This section briefly summarizes the requirements for the syndication data
format. Most of these requirements are based on our pre-formed assumptions: that
we will use XML to encode the message, and that the data encoded in the
syndication message (independent of the content) will be sufficient to support
the use cases described in the use case document.
- The format MUST be XML. Most semantically rich Web data is expected to
be written in XML, so this choice follows naturally.
- The first version SHOULD be as simple as possible, but no simpler.
The syndication language should define the simplest possible grammar and
semantic rules, and should use the minimum possible set of tags and
attributes. We do not yet know how this language will
evolve, and we do not want to build in features that are not needed and that
could block its future evolution.
- The language design SHOULD be extensible. As much as possible, the
language syntax should allow for likely extensions, but in a way that
is as backward-compatible as possible. For example, we would like
first-generation processors to be able to process second-generation
documents intelligently.
- The language design SHOULD be human-readable. We want the syntax to
be easy to read, so that it can be widely understood and adopted.
- The language design MAY use the RDF XML syntax. This is to be
determined, but initially we will mock up the design using RDF-free syntax,
just to think things through. In any event, even if the final language does
not use RDF, we MUST define an RDF-based schema document that defines the
semantics of the syndication language.
- The language design MAY use XML namespace declarations. Even if the
language itself does not use namespace declarations, such declarations may be
used to qualify names for XML content included in a syndicated message.
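The extensibility requirement above can be illustrated with a minimal sketch,
written here in Python. It assumes that a first-generation processor handles a
second-generation document by reading the elements it knows and skipping the
rest. The element names (message, part, title, summary, priority) are
hypothetical and are not defined by these requirements; the skipping rule is
one possible design, not a decision recorded in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical first-generation vocabulary; the requirements define no tag names.
KNOWN_V1_TAGS = {"title", "summary"}

# Hypothetical second-generation document: <priority> is unknown to version 1.
V2_DOCUMENT = """
<message>
  <part>
    <title>Quarterly report</title>
    <summary>Figures for Q3.</summary>
    <priority>high</priority>
  </part>
</message>
"""


def read_part_v1(part):
    """Collect the child elements a version 1 processor understands.

    Unknown children are recorded and skipped rather than treated as errors,
    which is the assumed forward-compatibility rule in this sketch.
    """
    known = {}
    skipped = []
    for child in part:
        if child.tag in KNOWN_V1_TAGS:
            known[child.tag] = (child.text or "").strip()
        else:
            skipped.append(child.tag)
    return known, skipped


root = ET.fromstring(V2_DOCUMENT)
known, skipped = read_part_v1(root.find("part"))
print(known)
print(skipped)
```

A processor written this way keeps working when later versions add elements,
at the cost of silently ignoring data it does not understand.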
This section describes technical requirements of the syndicated message
format -- the types of data the format must be able to encode, and how it must
expose information to the user of the document. These requirements are rough
drafts, but they should give a fair idea of the intention.
- The message syntax MUST support multiple data parts within each message
(so that a single message can send more than one piece of syndicated data).
- The message syntax MUST support the same metadata set for each data part
in the message (to make it easy to manage syndicated data parts).
- Note that this means that a syndication message can contain a
syndication message as a message part, leading to a hierarchical message
store. In this case, the specification MUST define how metadata properties
are inherited by a syndication message that is inside another syndication
message.
- Each message part MUST identify the MIME type, encoding type (if any) and
(optionally) human-readable language (if any) of the data content.
- It MUST be possible to identify each message part via a URI that, when
dereferenced, returns a 'most recent' version of the message, as defined by
the server delivering the data.
- It SHOULD be possible to identify each message part by a URI that, when
dereferenced, returns precisely the same data as that found in the message
containing this URI. This is then a URI that is equivalent to a globally
unique identifier for that message part.
- It SHOULD be possible to identify each message part by a globally unique
identifier that is unique for any unique instance of a message. [Should
this be a UUID independent of where the data comes from....]
- Each message part SHOULD be able to identify any web-accessible API to the
'service' (if such a service exists) that provides access to the data
repository from which this message was created.
- Each message part SHOULD be able to identify the legal organization that
is the source of the message part.
- Each message part SHOULD be able to identify the legal organization that
is the owner of the data content.
- Each message part SHOULD be able to indicate the date at which the data
content was created.
- Each message part SHOULD be able to indicate the date at which the data
content was last modified.
- Each message part SHOULD be able to identify the 'owner' or creator of the
data content.
- Each message part SHOULD be able to identify copyright and usage
restrictions that apply to the data content.
- Each message part SHOULD be able to provide PICS ratings relevant to the
data content.
- Each message part SHOULD be able to identify human-readable contact
information (e.g., email address) relevant to the data content.
- Each message part SHOULD be able to provide a text-only summary
relevant to the data content. Multiple human-language summaries for each data
content part SHOULD be supported.
- Each message part SHOULD be able to provide keyword index data relevant to
the data content, and should also reference a resource describing the meaning
of the keyword data. Multiple human-language summaries for each data content
part SHOULD be supported.
- Each message part SHOULD be able to provide category index data relevant
to the data content, and should also reference a resource describing the
meaning of the category data. Multiple human-language summaries for each data
content part SHOULD be supported.
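The requirements for nested messages and a uniform metadata set can be
sketched as a small data structure, again in Python. The requirements state
that the specification MUST define how metadata is inherited but do not say
what the rule is. This sketch assumes one candidate rule: a part inherits every
property of its enclosing message and may override any of them. The class name
Part, the function names, and the property names (source, language, mime-type)
are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Part:
    """A message part; a part that has child parts acts as a nested message."""
    metadata: Dict[str, str] = field(default_factory=dict)
    content: Optional[str] = None
    parts: List["Part"] = field(default_factory=list)


def effective_metadata(part: Part, inherited: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge inherited properties with the part's own (assumed rule: own values win)."""
    merged = dict(inherited or {})
    merged.update(part.metadata)
    return merged


def walk(part: Part, inherited: Optional[Dict[str, str]] = None,
         depth: int = 0) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (depth, effective metadata) for a part and all parts nested in it."""
    meta = effective_metadata(part, inherited)
    yield depth, meta
    for child in part.parts:
        yield from walk(child, meta, depth + 1)


# Hypothetical example: an outer message carrying two parts.
outer = Part(
    metadata={"source": "Example Org", "language": "en"},
    parts=[
        Part(metadata={"mime-type": "text/plain"}, content="hello"),
        Part(metadata={"language": "fr", "mime-type": "text/html"},
             content="<p>bonjour</p>"),
    ],
)

resolved = list(walk(outer))
for depth, meta in resolved:
    print("  " * depth + str(meta))
```

Under this assumed rule, the first part keeps the outer language "en" while the
second overrides it with "fr"; both inherit the outer source. Other rules (for
example, no inheritance, or inheritance of selected properties only) would
satisfy the requirement equally well.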