Shared Online Media Archive (SOMA) Metadata Element Set
Version Author
Based on meeting held at CMA office in Sheffield
Based on comments on version 0.2, and the
Finalise vocabularies, deal with open issues,
Table of Contents
Introduction
  Purpose of this document
  Brief Glossary
  Summary
Comments and Open Issues
  Different Formats of a Resource
  Audience
  Extended Information
  Open Issues
  Data Changes
Elements and Refinements
  Refinement: Title.Alternative
  Element: Subject
  Element: Description
  Element: Publisher
  Refinement: Publisher.URI
  Refinement: Publisher.Logo
  Refinement: Date.Created
  Refinement: Date.Available
  Refinement: Date.Issued
  Refinement: Date.Modified
  Refinement: Format.Extent
  Refinement: Format.Medium
  Element: Identifier
  Element: Language
  Refinement: Relation.IsVersionOf, Relation.HasVersion
  Refinement: Relation.IsReplacedBy, Relation.Replaces
  Refinement: Relation.IsPartOf, Relation.HasPart
  Refinement: Relation.IsFormatOf, Relation.HasFormat
  Refinement: Coverage.Spatial
  Refinement: Coverage.Temporal
  Element: Rights
  Element: ExtendedInformation
  Refinement: ExtendedInformation.Scheme
References
Appendix A: Vocabularies
  SOMA Genres
  SOMA Roles
  SOMA Media Types
  SOMA Locations
  SOMA Topics
Appendix B: XML Representation
  Dublin Core in XML and RDF
  Organisation Names in Several Languages
  Extent in Seconds and Octets
Appendix C: About This Document
  Stakeholders
  Reviewing this Document
  Versions of the SOMA Metadata
Introduction
Purpose of this document
This document describes a draft metadata format for the exchange of metadata about multimedia files between members of the SOMA Group. It is based on Dublin Core 1.1 [1] as well as EBU
Brief Glossary
SOMA providers. Organisations who publish data using the SOMA metadata standard.
SOMA members. Members of the SOMA Group (who maintain and support the standard).
Resource. An item being represented by the metadata. Examples include: a radio programme, a film, a website, an excerpt from a radio programme. A resource may be available in several different formats. For example, an audio file may be available in MP3 or Real Audio format, a website may be available in Flash or HTML format.
When preparing this metadata set, the following assumptions have been made:
• The metadata will be used to create a shared media archive, searchable by the public and by people working at community radio / TV stations.
• The metadata set will be used to store information about a range of media files, including audio, video, websites, and learning materials.
• The audience may want to listen online or to download and rebroadcast.
• The data set should be kept as simple as possible.
• The data set should stay as close to EBU and DC as possible.
• SOMA providers may store more information about a programme locally for their own
• SOMA providers set their own level of access and security. For example, they may require users to log in to their site before they can download a file.
The metadata set is not intended for the following purposes:
• Information storage: SOMA providers will probably want to store more information locally, or to store information in a different format.
• Instant streaming of media files: organisations wishing to stream files found using the SOMA metadata set will need to download, then upload them onto their own site before they are able to stream them. They may find that, before downloading the file, they need to visit several pages, and/or create an account on the system of the
Comments and Open Issues
Different Formats of a Resource
Within the SOMA Metadata standard there are two acceptable ways of representing different formats of the same resource.
A. Repeat metadata, use IsFormatOf. Each format the resource is available in has
its own copy of the metadata set. The relationship between the metadata sets is indicated using Refinement 13.4 (IsFormatOf / HasFormat).
B. Link to a holding page. The SOMA provider creates a holding page with links to all
formats of the resource. There is one set of metadata, with the URI of the holding page as the identifier in Element 10. Element 9 (Format) is repeated for each format
What is meant by “a different format of the same resource”?
Files with the same content, but accessed using different software. For example,
Files with the same content, but of different quality. For example, mono / stereo,
Files with the same content, but made available on a different medium. For
example, an audio file / a transcript of an audio file, a video file / the soundtrack of a video file.
Audience
An Audience element has been added to Dublin Core. This information may be of interest to
people using the SOMA standard, but we have agreed not to include the element at this stage.
We will consider introducing this element in the future. We do not guarantee that the element will be introduced, as some members are concerned that this may encourage
censorship. We recommend that members record this information in their own systems
Extended Information
We will include a mechanism for storing extended information. This will be used when
organisations have private agreements to provide extra fields to be used by some other
initiative (for example, the proposed Stream on the Fly system). This is handled by the optional element: ExtendedInformation (see Elements and Refinements section for further
Open Issues
Subject
• Other controlled vocabularies. We are open to the idea of SOMA providers including information from other controlled vocabularies in this section. Guidelines for such use
• Include Creative Commons. When the Creative Commons standard becomes available, we will consider using it. A draft version of the Creative Commons licence in
• Listen / Download / View. We will look into including information about whether the file is available for download and for listen/view online. We would like the information to be available in a structured format, so that it is possible to search for, e.g. audio
Data Changes
• Appendix A: Vocabularies. All vocabulary lists have changed.
• Language. Alternate encoding scheme added (allow RFC 1766 [7]).
• Coverage.Temporal. Alternate encoding scheme added (allow W3C-Date and Time Format (DTF) [5]).
Elements and Refinements
Element: Title
Typically, a title will be a name by which the resource is formally
The title is tied to the archived item: for a series - use the series title,
for a programme - a programme title, for an item - an item title.
To differentiate between a series title and a programme title when
these are identical, recommended best practice is to use a date along
with the programme title. For example, "News" is a series title; "News 2000.11.12" is a programme title. Where there is no natural
date, put a number in front of the title to show order of the series,
e.g. “1. Janet Moves to London”. Where a programme has been broken into several parts, the part number should be listed here. For
example, “Report from Rio, Part 1”, “Report from Rio, Part 2”.
Implementation guidance: When users are entering data into the
original system, the user interface should encourage them to follow best practice. It is recommended they enter title and episode/part
number/date separately. These should then be combined following
the guidelines above either when the data is saved locally, or when the data is exported to the SOMA format.
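The combination step described in the implementation guidance can be sketched as follows. This is a minimal illustration, not part of the SOMA standard: the function name `build_soma_title` and its parameters are hypothetical, and it assumes the title, date, episode number and part number were entered as separate fields.

```python
from datetime import date
from typing import Optional


def build_soma_title(
    title: str,
    broadcast_date: Optional[date] = None,
    episode_number: Optional[int] = None,
    part_number: Optional[int] = None,
) -> str:
    """Combine separately entered fields into one title string (hypothetical helper)."""
    combined = title.strip()
    if broadcast_date is not None:
        # A natural date distinguishes a programme from its series: "News 2000.11.12".
        combined = f"{combined} {broadcast_date.strftime('%Y.%m.%d')}"
    elif episode_number is not None:
        # No natural date: put the number in front: "1. Janet Moves to London".
        combined = f"{episode_number}. {combined}"
    if part_number is not None:
        # A programme broken into parts lists the part: "Report from Rio, Part 1".
        combined = f"{combined}, Part {part_number}"
    return combined
```

A provider could call this either when saving the record locally or when exporting it to the SOMA format, as the guidance allows both.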
Refinement: Title.Alternative
Any form of the title used as a substitute or alternative to the formal
This qualifier can include title abbreviations as well as translations.
Only the main title may be translated. Abbreviations and alternative
titles should only be provided in the original language.
Element: Creator
An entity primarily responsible for making the content of the
Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
We recommend that names be written in the following order:
surname, first name. However, they can also be written according to
We recommend that organisations do not use abbreviated forms of
their name, unless this is the name they are most commonly known
The difference between Contributor and Creator is sometimes difficult
to decide. If in doubt, use Contributor. These two fields should be
considered equivalent for search purposes.
See Appendix B for notes about representing translations of
In plain XML, translations of organisation names should not be
Refinement: Creator.Role
Role the creator played in creating the resource. For example: producer, writer or editor.
There is no Creator.Role refinement in DC.
Variation from EBU: EBU uses EBU Reference Data Table: Roles in broadcasting [3] (not
Element: Subject
The topic of the content of the resource.
Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Persons as subjects are
also placed here. Genre of the content is placed under element Type.
Geographical locations and historical periods as subjects are placed under Coverage.
Where groups of organisations have similar interests, other controlled vocabularies may be used here.
Variation from EBU: Not using recommended encoding schemes
Element: Description
An account of the content of the resource.
Description may include but is not limited to: an abstract, a running
order, or a free-text account of the content.
Variation from DC: None, but refinements (table of contents, abstract) aren’t used.
Element: Publisher
Organisation that originally made the resource available to SOMA.
We recommend that organisations do not use abbreviated forms of
their name, unless this is the name they are most commonly known by (e.g. OXFAM).
May use RDF construct “seeAlso” to point to a FOAF file (see Appendix
Refinement: Publisher.URI
May be used for linking to the homepage of the partner hosting the
Variation from EBU: No such refinement in EBU.
Refinement: Publisher.Logo (Refinement 5.2)
May be used for linking to the homepage of partner hosting the file.
Variation from EBU: No such refinement in EBU.
Element: Contributor
An entity responsible for making contributions to the content of the
Examples of a Contributor include a person, an organisation, or a
service. Typically, the name of a Contributor should be used to
We recommend that names be written in the following order: surname, first name. However, they can also be written according to
We recommend that organisations do not use abbreviated forms of their name, unless this is the name they are most commonly known
The difference between Contributor and Creator is sometimes difficult to decide. If in doubt whether an entity is a Creator or a Contributor, use
the Contributor element. These two fields should be considered
See Appendix B for notes about representing translations of organisation names in RDF.
In plain XML, translations of organisation names should not be
Refinement: Contributor.Role
Role the contributor played in creating the resource. For example: producer, writer or editor.
There is no Contributor.Role refinement in DC.
Variation from EBU: EBU uses EBU Reference Data Table: Roles in broadcasting [3] (not
Element: Date
A date associated with an event in the life cycle of the resource.
If unqualified, assume same value as the Date.Created refinement.
Approximate dates should be explained in Description element.
Variation from DC: None, but not using the Date.Valid refinement.
Variation from EBU: None, but not using the Date.Digitized refinement.
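The rule that an unqualified Date takes the same value as Date.Created can be applied when a record is read. The sketch below is an assumption-laden illustration: the record is modelled as a plain dictionary keyed by element name, and `resolve_unqualified_date` is a hypothetical helper, not something the standard defines.

```python
from typing import Dict


def resolve_unqualified_date(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record in which an unqualified Date fills in Date.Created."""
    resolved = dict(record)
    # Only fill the refinement when it is missing; an explicit Date.Created wins.
    if "Date" in resolved and "Date.Created" not in resolved:
        resolved["Date.Created"] = resolved["Date"]
    return resolved
```

The original record is left untouched, so the unqualified value can still be exported exactly as it was received.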
Refinement: Date.Created
Date of creation of the content of the resource.
Approximate dates should be explained in the Description element.
Refinement: Date.Available
Date (often a range) that the resource will become or did become
Should show the date the publisher made the resource available. This will generally be the date the metadata was first published.
Refinement: Date.Issued
Date made available by original publisher. For example, the broadcasting date of a radio programme.
It is recommended best practice to use the element both for
recordings that are "born-digital" and for recordings that are digitised.
Refinement: Date.Modified
Date on which the resource or metadata was last changed.
If the resource or metadata has been modified since it was made
available, this must contain the date of the most recent time this occurred.
Element: Type
The nature or genre of the content of the resource.
To describe the physical or digital manifestation of the resource, use
Variation from DC: Using SOMA Genres in addition to DCMI Types.
Variation from EBU: EBU uses a greatly extended list of types.
Element: Format
The physical or digital manifestation of the resource.
Use SOMA Media Type for online resources and same as
For online resources, use SOMA Media Type (see Appendix A)
For offline resources, use controlled vocabulary: “offline”.
Variation from DC: Using an extended version of their Internet Media Type list.
Variation from EBU: Using an extended version of their Internet Media Type list.
Refinement: Format.Extent
Repeat for indicating extent in seconds and extent in octets (8-bit
Variation from EBU: Duration in seconds (not HHMMSS). Added extent in octets.
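Because SOMA records duration in seconds rather than HHMMSS, a provider holding EBU-style durations needs a conversion before export. The following is a minimal sketch, assuming the HHMMSS value is exactly six digits with no separators; the function names are hypothetical, and the labels "Seconds" and "Bytes" are taken from the plain XML example in Appendix B.

```python
from typing import List, Tuple


def hhmmss_to_seconds(value: str) -> int:
    """Convert a six-digit HHMMSS duration such as "013005" to whole seconds."""
    if len(value) != 6 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected six digits (HHMMSS), got {value!r}")
    hours, minutes, seconds = int(value[0:2]), int(value[2:4]), int(value[4:6])
    if minutes > 59 or seconds > 59:
        raise ValueError(f"minutes and seconds must be below 60, got {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def extent_values(duration_hhmmss: str, size_in_octets: int) -> List[Tuple[str, int]]:
    """Build the two repeated Format.Extent values: one in seconds, one in octets."""
    return [
        ("Seconds", hhmmss_to_seconds(duration_hhmmss)),
        ("Bytes", size_in_octets),
    ]
```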
Refinement: Format.Medium
The material or physical carrier of the resource.
When media is offline, field contains "offline". When media is online,
Controlled vocabulary: “offline” or “online”.
Element: Identifier
An unambiguous reference to the resource within a given context.
SOMA will use URI of the online resource. If the resource is offline or
unavailable, or if it is necessary to log in to some website to gain access to the resource, a unique URI to a relevant web page should
Variation from EBU: Not following their stated best practice (to use various ID schemes
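The choice of identifier described above can be expressed as a small decision function. This is only a sketch: the function and parameter names are hypothetical, and it assumes that the "relevant web page" URI is the value to use whenever the resource itself cannot be reached directly.

```python
from typing import Optional


def choose_identifier(
    resource_uri: Optional[str],
    is_online: bool,
    requires_login: bool,
    info_page_uri: str,
) -> str:
    """Pick the URI to publish as the Identifier (hypothetical helper)."""
    # Use the resource's own URI only when it is online and directly accessible.
    if resource_uri and is_online and not requires_login:
        return resource_uri
    # Offline, unavailable, or login-protected: fall back to a relevant web page.
    return info_page_uri
```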
Element: Language
A language of the intellectual content of the resource.
Repeated where the complete content of the resource may be
understood in several languages. For example, a French film with
English subtitles should have two language elements: French and English.
Each use of the x-lang encoding should be reported to the SOMA
Please use x-lang syntax (as in RFC 1766 [7]) for languages not
covered by ISO-639-2, and report use to the SOMA group. The group will maintain a public list of x-lang strings that have been used.
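A provider's export routine might check language values along the following lines. This is a sketch under stated assumptions: `ISO_639_2_SAMPLE` is a tiny illustrative subset rather than the full ISO 639-2 list, the function names are hypothetical, and the pattern follows the RFC 1766 rule that a private-use tag starts with "x" and that each subtag is one to eight letters.

```python
import re

# Illustrative subset only; a real system would load the complete ISO 639-2 list [8].
ISO_639_2_SAMPLE = {"eng", "fra", "deu", "spa", "nor"}

# RFC 1766: "x" marks a private-use tag; each subtag is 1 to 8 ASCII letters.
_X_LANG = re.compile(r"x(-[a-z]{1,8})+")


def language_value(code: str) -> str:
    """Return a normalised language value, or raise if it fits neither scheme."""
    normalised = code.strip().lower()
    if normalised in ISO_639_2_SAMPLE or _X_LANG.fullmatch(normalised):
        return normalised
    raise ValueError(f"not an ISO 639-2 code or x-lang tag: {code!r}")


def needs_reporting(value: str) -> bool:
    """x-lang values are the ones to report to the SOMA group."""
    return value.startswith("x-")
```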
Element: Relation
Variation from DC: Not using some refinements: IsRequiredBy, Requires,
Variation from EBU: Not using some refinements: IsRequiredBy, Requires,
Refinement: Relation.IsVersionOf, Relation.HasVersion
IsVersionOf: The described resource is a version, edition, or
adaptation of the referenced resource. Changes in version imply
substantive changes in content rather than differences in format.
HasVersion: The described resource has a version, edition, or
adaptation, namely, the referenced resource.
Used exclusively for language translations.
URI should ideally link to metadata of the related resource, otherwise
to the related resource or a page about the related resource.
Refinement: Relation.IsReplacedBy, Relation.Replaces
No (but can have IsReplacedBy and Replaces)
IsReplacedBy: The described resource is supplanted, displaced, or
Replaces: The described resource supplants, displaces, or supersedes
URI should ideally link to metadata of the related resource, otherwise
to the related resource or a page about the related resource.
Refinement: Relation.IsPartOf, Relation.HasPart
IsPartOf: The described resource is a physical or logical part of the referenced resource.
HasPart: The described resource includes the referenced resource
URI should ideally link to metadata of the related resource, otherwise
to the related resource or a page about the related resource.
Variation from EBU: Allow text, not just URI.
Refinement: Relation.IsFormatOf, Relation.HasFormat
IsFormatOf: The described resource is the same intellectual content of the referenced resource, but presented in another format.
HasFormat: The described resource pre-existed the referenced
resource, which is essentially the same intellectual content presented in another format.
Use for alternate formats, such as scripts or alternative media types
URI should ideally link to metadata of the related resource, otherwise to the related resource or a page about the related resource.
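The Relation refinements come in pairs, so a statement on one record implies a mirror statement on the related record. The sketch below shows one way a provider could derive that mirror statement. The standard does not require reciprocal entries; the `reciprocal` helper and the example URIs are hypothetical.

```python
from typing import Dict, Tuple

# The four refinement pairs used by SOMA, in one direction.
_PAIRS: Dict[str, str] = {
    "IsVersionOf": "HasVersion",
    "IsReplacedBy": "Replaces",
    "IsPartOf": "HasPart",
    "IsFormatOf": "HasFormat",
}

# Lookup table covering both directions of every pair.
INVERSE: Dict[str, str] = {**_PAIRS, **{v: k for k, v in _PAIRS.items()}}


def reciprocal(subject_uri: str, refinement: str, object_uri: str) -> Tuple[str, str, str]:
    """Return the mirror statement to record on the related resource."""
    return (object_uri, INVERSE[refinement], subject_uri)
```

For approach A in "Different Formats of a Resource", this would let the metadata set for each format point at the other with IsFormatOf and HasFormat respectively.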
Element: Coverage
The extent or scope of the content of the resource.
If unqualified, assume the value relates to spatial coverage.
Refinement: Coverage.Spatial
Spatial characteristics of the intellectual content of the resource.
For example, the geographical origin of folk music is placed here.
Text for city names, etc. Use Getty Thesaurus [9] to check spelling.
Variation from DC: Not using suggested controlled vocabulary.
Variation from EBU: Not using suggested controlled vocabulary.
Refinement: Coverage.Temporal
Time period(s) to which the intellectual content of the resource pertains.
For example, historical periods discussed in a history programme
Element: Rights
Information about rights held in and over the resource.
By "Rights" we here mean the rights to the programme (sound file
etc.) as a whole. Rights covering parts of the file (music, poetry, etc. included in the file) should be recorded locally by the institution. Here,
register the rights holder: for instance NRK, BBC, the production company responsible
for the programme, or the record company that owns the rights to a phonogram.
Instead of text, you may use a URI to point to a standardized
If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.
Element: ExtendedInformation
Additional data concerning the resource. May be used to store
proprietary, application-specific or organisation-specific information.
May be repeated with different schemes. This element should mainly
be used to store information that is not universally applicable but
should still be exchanged with the resource.
This element should be stored unchanged (as blob or similar) if the storing application cannot use it.
Variation from EBU: Does not exist in EBU.
Refinement: ExtendedInformation.Scheme
Yes, if ExtendedInformation element is present
The encoding scheme for the extended information element.
This is a string that uniquely identifies the scheme of the extended information data.
Variation from EBU: Does not exist in EBU.
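The requirement to store ExtendedInformation unchanged when the application cannot use it can be sketched as follows. This is an illustration under assumptions: the `ExtendedInformation` class, the `process_extended` function and the handler registry are hypothetical names, and the payload is modelled as an opaque string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ExtendedInformation:
    scheme: str   # string that uniquely identifies the scheme
    payload: str  # raw content, kept exactly as received


def process_extended(
    items: List[ExtendedInformation],
    handlers: Dict[str, Callable[[str], None]],
) -> List[ExtendedInformation]:
    """Hand known schemes to their handler; return every item unchanged for storage."""
    for item in items:
        handler = handlers.get(item.scheme)
        if handler is not None:
            handler(item.payload)
        # Unknown schemes are neither parsed nor dropped.
    return list(items)
```

Making the class frozen means a handler cannot alter the stored payload by accident, so the data can be passed on exactly as it arrived.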
References
[1] Dublin Core Metadata Element Set http://www.dublincore.org/documents/dces/ and
qualifiers http://dublincore.org/documents/dcmes-qualifiers/
[2] EBU Metadata for Radio Archives http://www.ebu.ch/tech_t3293.pdf
[3] EBU Roles in broadcasting (not published at time of writing)
[4] Dublin Core Metadata Initiative – Period Encoding Scheme
http://dublincore.org/documents/dcmi-period/
[5] W3C Note: Date and Time Formats http://www.w3.org/TR/NOTE-datetime
[6] Dublin Core Metadata Initiative – Type Vocabulary
http://dublincore.org/documents/dcmi-type-vocabulary/
[7] RFC 1766: Tags for the Identification of Languages, Internet Engineering Task Force
[8] ISO 639-2: Codes for the representation of names of languages – Part 2: Alpha-3
code (Registration Authority) http://lcweb.loc.gov/standards/iso639-2/
[9] Getty Thesaurus of Geographic Names http://www.getty.edu/research/tools/vocabulary/tgn/
[10] Creative Commons metadata specification http://www.creativecommons.org/metadata/spec
Appendix A: Vocabularies
SOMA Genres
Actuality
Advert / jingle / spot
Oral history / storytelling
Talk show / discussion
SOMA Roles
Artist
Author
Based on FIU Digital Library’s Metadata Creation Manual:
http://www.fiu.edu/~diglib/metadata/roles.html
SOMA Media Types
Real Audio: audio/x-pn-realaudio-plugin, audio/rn-realaudio
Ogg Vorbis: application/x-ogg
Real Video: video/vnd.rn-realvideo, video/x-pn-RealVideo-plugin
MPEG video
MPEG-2 video: video/mpeg-2
Macintosh Quicktime
Microsoft Video: video/x-msvideo
SMIL
SOMA Locations
Hierarchical system set out in the UN standard: http://www.un.org/Depts/unsd/methods/m49regin.htm
The countries of "Tibet", "Kosovo" and "Palestine" will be added to this classification system.
SOMA Topics
The top-level categories are intended as a suggestion of how to display the categories in a manageable form. They don't really imply containment (so things that are about "gender" are not necessarily about "development").
development
environment
human rights
information & media
politics
war & peace
Appendix B: XML Representation
Dublin Core in XML and RDF
Guidelines for implementing Dublin Core in XML: http://dublincore.org/documents/2002/07/23/dc-xml-guidelines/
Expressing Qualified Dublin Core in RDF / XML:
http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/
Expressing Simple Dublin Core in RDF/XML:
http://dublincore.org/documents/2001/11/28/dcmes-xml/
Organisation Names in Several Languages
In RDF/XML you could tie them together using a URI and a RDF Bag – see the example in section 2.2.2 Bag:
http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.amarc.org/">
    <dc:creator>
      <rdf:Bag>
        <rdf:li xml:lang="en">World Association of Community Radio Broadcasters</rdf:li>
        <rdf:li xml:lang="fr">Association Mondiale des radiodiffuseurs communautaires</rdf:li>
      </rdf:Bag>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
Organisation Details using FOAF Files
In RDF/XML you can use the “seeAlso” construct to point to an associated resource,
containing more information about an organisation. For example:
<dc:publisher rdfs:seeAlso="uriOfAnotherMetadataFile">Whoever</dc:publisher>
For publisher, we will allow people to use this construct to point to a Friend Of A Friend
(FOAF) file. For more details, see: http://xmlns.com/foaf/0.1/
Extent in Seconds and Octets
In RDF/XML, it should be represented something like:
<rdf:Description rdf:about="http://example.org/thing">
  <dcterms:extent>
    <foo:Seconds>
      <rdf:value>123</rdf:value>
    </foo:Seconds>
  </dcterms:extent>
  <dcterms:extent>
    <foo:Bytes>
      <rdf:value>123</rdf:value>
    </foo:Bytes>
  </dcterms:extent>
</rdf:Description>
Or following http://dublincore.org/documents/2002/04/14/dc-xml-guidelines/ something like:
<qualifieddc>
  <dcterms:extent scheme="Seconds">123</dcterms:extent>
  <dcterms:extent scheme="Bytes">123</dcterms:extent>
</qualifieddc>
Appendix C: About This Document
Stakeholders
This document should be reviewed and agreed by the following parties. Only one main contact is given for each organisation.
Name Organisation
Reviewing this Document
All outstanding issues will be resolved by 18 October 2002. Contact Suzi Wells
([email protected]) for more information, or with comments or questions.
Versions of the SOMA Metadata
Version numbers for the SOMA Metadata consist of three fields: the major version, the minor
version, and the update version. The differences among the fields are as follows:
Major. Significant changes to the metadata set.
Minor. Element additions or more significant changes to existing elements.
Update. Any other changes (e.g. small formatting changes, refinements of the use of
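The three-field version scheme lends itself to simple parsing and comparison. The sketch below is illustrative only: it assumes the fields are written dot-separated and that omitted trailing fields count as zero (this document itself refers to "version 0.2"); `SomaVersion` and `parse_version` are hypothetical names.

```python
from typing import NamedTuple


class SomaVersion(NamedTuple):
    major: int
    minor: int
    update: int


def parse_version(text: str) -> SomaVersion:
    """Parse "major.minor.update"; missing trailing fields default to 0."""
    parts = [int(p) for p in text.strip().split(".")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"not a SOMA version string: {text!r}")
    parts += [0] * (3 - len(parts))
    return SomaVersion(*parts)
```

Because `SomaVersion` is a tuple, versions compare field by field, so a consumer can test whether a record was produced under a newer major version than it supports.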