The message has to start with some sort of root element, so I've chosen the
name syndicationMessage -- which seems to pretty well summarize what
we're talking about. This element contains a syndicated data message, and
consequently can contain five broadly defined types of information:
- metadata about the service that aggregated the message
- metadata relevant to the syndication process
- metadata about each piece of data content in the message. This will have
both human language-independent and language-dependent parts.
- metadata about possible relationships between the different data content
parts in the message
- the data content parts themselves
This document describes the specific types of information that are most
desired.
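To make the five categories concrete, here is a minimal sketch of an empty
message skeleton built with Python's standard library. Only the root name
syndicationMessage comes from the text above; the five child element names are
hypothetical placeholders, one per category, and not a proposed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical child element names, one per category of information.
# Only the root name "syndicationMessage" is taken from the text.
SECTION_NAMES = [
    "aggregatorMetadata",   # metadata about the aggregating service
    "syndicationMetadata",  # metadata relevant to the syndication process
    "contentMetadata",      # metadata about each piece of data content
    "relationalMetadata",   # relationships between the content parts
    "content",              # the data content parts themselves
]


def build_skeleton():
    """Return an empty syndicationMessage element with one child per category."""
    root = ET.Element("syndicationMessage")
    for name in SECTION_NAMES:
        ET.SubElement(root, name)
    return root


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

A real message would presumably repeat the content and content-metadata parts
once per data item; the skeleton only shows the top-level layout.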
Note that a syndication message should carry only the information needed to
manage a collection of syndicated data and/or data feeds, and nothing else. In
other words, it should not say anything about the data content of the message,
or about the service that provided/delivered the data, beyond what is needed to
manage the syndication process. Any additional information of that kind would
be either:
- contained inside metadata markup, and namespace-qualified to define its
special role, or
- expressed using RDF, again with appropriate namespace qualifiers.
1. Metadata about the Aggregator
The following information about the aggregator (source of the message) is
often required by the consumer.
- Creation date. The date/time at which the message was assembled for
delivery. Note that this can be different from the actual time/date at which
the consumer receives the message!
- Resource URI. The URI from which the message was or can be
retrieved. This URI may reference different variants of the resource,
depending on the time the request is made. This URI is sufficient on its own
to define a resource -- no additional data (e.g., POSTed form data) is needed.
- Unique identifier. A unique identifier for the resource, different
from any other possible resource or any other possible version of this
resource. This lets the consumer distinguish between different variants
(versions, etc.) of the resource.
- Unique resource URI. A URI that explicitly references this unique
resource: dereferencing this URI would retrieve the same data message, with
the same unique identifier. This URI is sufficient on its own to define this
resource -- no additional data (e.g., POSTed form data) is needed.
- Identity of aggregator. The name and other descriptive information
about the aggregator.
- Aggregator contact URI. A URI (maybe a mailto: URL) for contacting
the aggregator.
- Aggregator information URI. A URI for obtaining further information
about the aggregator.
- Aggregator interface. A reference to, and brief description of, any
Web-accessible software interface that can be used to access the aggregator's
services. This information could include:
- URI to interface. A URI referencing the interface.
- Interface type. The type of the interface (XML-RPC, SOAP, etc.).
- Informational URI. A URI referencing a resource providing a
description of the interface.
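The aggregator properties listed above could be modelled along the following
lines. This is only a sketch: the class names, field names, XML element names,
the ISO 8601 date format and the example.org URIs are all assumptions made for
illustration, not part of any defined format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregatorInterface:
    uri: str                        # URI to interface
    interface_type: str             # e.g. "XML-RPC" or "SOAP"
    info_uri: Optional[str] = None  # informational URI


@dataclass
class AggregatorMetadata:
    creation_date: str              # assumed to be an ISO 8601 timestamp
    resource_uri: str
    unique_id: str
    unique_resource_uri: str
    name: str                       # identity of aggregator
    contact_uri: Optional[str] = None
    info_uri: Optional[str] = None
    interface: Optional[AggregatorInterface] = None

    def to_element(self) -> ET.Element:
        """Serialize to XML using hypothetical element names."""
        el = ET.Element("aggregator")
        simple_fields = [
            ("creationDate", self.creation_date),
            ("resourceURI", self.resource_uri),
            ("uniqueID", self.unique_id),
            ("uniqueResourceURI", self.unique_resource_uri),
            ("name", self.name),
            ("contactURI", self.contact_uri),
            ("infoURI", self.info_uri),
        ]
        for tag, value in simple_fields:
            if value is not None:  # optional properties are simply omitted
                ET.SubElement(el, tag).text = value
        if self.interface is not None:
            iface = ET.SubElement(
                el, "interface", {"type": self.interface.interface_type}
            )
            ET.SubElement(iface, "uri").text = self.interface.uri
            if self.interface.info_uri is not None:
                ET.SubElement(iface, "infoURI").text = self.interface.info_uri
        return el


example = AggregatorMetadata(
    creation_date="2000-01-15T09:30:00Z",
    resource_uri="http://example.org/feeds/news",
    unique_id="news-20000115-0930",
    unique_resource_uri="http://example.org/feeds/news/20000115-0930",
    name="Example Aggregator",
    contact_uri="mailto:feeds@example.org",
    interface=AggregatorInterface("http://example.org/rpc", "XML-RPC"),
)
aggregator_element = example.to_element()
print(ET.tostring(aggregator_element, encoding="unicode"))
```

Note that the resource URI and the unique resource URI are kept as separate
fields: the first may return a newer variant later on, while the second should
always return this exact message.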
2. Metadata and the Syndication Process
The syndication process itself needs certain pieces of information so that it
knows how and when it can use the data content of the message. The following
properties summarize the 'lowest common denominator' information typically
needed to manage syndicated data:
- Start date. The date and time after which the consumer is free to
use the data content.
- Stop date. The date and time after which the consumer should stop
using the data content.
- Update interval. The time interval after which the aggregator is
likely to have produced an 'updated' version of the data content.
- Distribution rules. Each aggregator may have a more detailed set of
distribution rules that need to be understood. This set of rules could either
be included in the message, or referenced (via a URI) from the message.
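As a rough illustration of how a consumer might act on the start date, stop
date and update interval, consider the sketch below. The function names are
hypothetical, and the boundary handling (usable at the start instant, not
usable at the stop instant) is an assumption; the properties themselves do not
pin that down.

```python
from datetime import datetime, timedelta, timezone


def is_usable(now, start, stop):
    """True if the data content may be used at the moment `now`.

    Assumption: the window is half-open, i.e. start <= now < stop.
    """
    return start <= now < stop


def next_check(retrieved, update_interval):
    """Earliest time at which an updated version is likely to exist."""
    return retrieved + update_interval


start = datetime(2000, 1, 15, 9, 0, tzinfo=timezone.utc)
stop = datetime(2000, 1, 22, 9, 0, tzinfo=timezone.utc)
interval = timedelta(hours=6)

print(is_usable(datetime(2000, 1, 16, tzinfo=timezone.utc), start, stop))
print(next_check(start, interval).isoformat())
```

Distribution rules are deliberately left out: they are aggregator-specific, so
a generic check like this can only cover the three time-based properties.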
3a. Language-Independent Metadata for the Data Content
The syndication process will also need to know certain things about the data
content, and/or the creator of the data content (which may be different from the
aggregator). This information would include:
- Creation date. When the data was created.
- Last-modified date. When the data was last modified.
- PICS rating. Content rating information, if any.
- Creator info. Information about the person, organization, or other
entity that is the creator/owner of the data, including name, URI reference,
contact info, etc.
- Copyright info. Copyright information relevant to the data.
- Human language. The human readable language of the data.
3b. Language-Dependent Metadata for the Data Content
The syndication process also needs to know certain language-dependent things
about the data content. The following items are the 'lowest common denominator'
items that are common to most current syndication processes:
- Human language. The human language for each collection of
language-dependent metadata.
- Title. A title to use for the data content.
- Description. A brief text-only description of the resource.
- Keywords. A set of keywords/key phrases relevant to the data
content.
- Categories. A set of hierarchical category specifications relevant
to the data content.
Each message could have more than one set of language-dependent metadata, so
that the syndication tool could appropriately manage the data in systems using
different human languages.
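A syndication tool faced with several language-dependent metadata sets has to
pick one. The sketch below shows one possible selection rule; the dictionary
layout, the function name and the fallback order (exact tag, then primary
language, then a default, then the first set) are all assumptions of mine.

```python
def select_metadata(metadata_sets, preferred, fallback="en"):
    """Pick the metadata set that best matches the preferred language.

    Each set is a dict with at least a "lang" key holding a language tag.
    Returns None if no metadata sets are available at all.
    """
    by_lang = {m["lang"].lower(): m for m in metadata_sets}
    # Try the exact tag, then its primary language subtag, then the fallback.
    for lang in (preferred, preferred.split("-")[0], fallback):
        match = by_lang.get(lang.lower())
        if match is not None:
            return match
    # Last resort: whatever the message lists first.
    return metadata_sets[0] if metadata_sets else None


metadata_sets = [
    {"lang": "en", "title": "Weather report", "keywords": ["weather"]},
    {"lang": "fr", "title": "Bulletin météo", "keywords": ["météo"]},
]

print(select_metadata(metadata_sets, "fr-CA")["title"])
print(select_metadata(metadata_sets, "de")["title"])
```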
4. Relational Metadata
It is easy to imagine a message containing related pieces of data: for
example a Web page and the associated image files, a collection of XML documents
representing different language variants of the same text content, or a sequence
of news items related to the same general topic. It would be useful if the
syndication mechanism allowed for an easy way of including such relational
information.
There are several efforts underway to provide a framework for such metadata
(site maps, etc.). These all generally use RDF (expressed in XML) to encode the
information. There is also no set of common, simple relationships that could
easily be chosen for a 'basic' syndication format. It may thus make sense to
define a syndication message format that 'allows' for such metadata, but that
does not define any specific mechanism for providing the relational metadata.
5. The Data Itself
We need to know a fair bit about the data content itself, such as:
- Data type. What is the type of the data (MIME type)?
- Encoding. If the data is encoded/compressed, how is this done (gzip
with base64, base64 only, etc.)?
- Primary type. If the data is XML, what is the 'main' data type for
it (XHTML, MathML, VoxML, etc.)? The data could have many different
namespace-declared types in it, but which one is 'most' important?
- Namespace declarations. If the data is bare XML, then we'll need to
know the namespace prefixes used in it, so that the data can be successfully
processed.
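To show why the encoding property matters, here is a small sketch of how a
consumer might recover the raw bytes of a content part. The encoding labels
"none", "base64" and "gzip+base64" are made-up names for the cases mentioned
above, and treating unencoded content as UTF-8 text is also an assumption.

```python
import base64
import gzip


def decode_content(payload: str, encoding: str) -> bytes:
    """Return the raw bytes of a content part given its declared encoding."""
    if encoding == "none":
        return payload.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(payload)
    if encoding == "gzip+base64":
        # The data was gzipped first and then base64-encoded, so undo
        # the two steps in reverse order.
        return gzip.decompress(base64.b64decode(payload))
    raise ValueError(f"unsupported encoding: {encoding!r}")


original = b"<p>hello</p>"
packed = base64.b64encode(gzip.compress(original)).decode("ascii")
print(decode_content(packed, "gzip+base64"))
```

Unknown labels raise an error rather than being guessed at, since silently
passing through undecoded data would corrupt whatever the consumer stores.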