Syndication Language -- Some issues and possible resolutions

The message has to start with some sort of root element, so I've chosen the name syndicationMessage -- which seems to summarize pretty well what we're talking about. This element contains a syndicated data message, and consequently can contain five broadly defined types of information:

metadata about the service that aggregated the message
metadata relevant to the syndication process
metadata about each piece of data content in the message. This will have both human language-independent and language-dependent parts.
metadata about possible relationships between the different data content parts in the message
the data content parts themselves
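The five parts listed above could be laid out roughly as follows. This is only a sketch: the root name syndicationMessage comes from this document, but the five child element names are placeholders I made up for illustration, not a proposed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical child element names, one per type of information listed above.
PART_NAMES = (
    "aggregatorMetadata",    # metadata about the aggregating service
    "syndicationMetadata",   # metadata relevant to the syndication process
    "contentMetadata",       # metadata about each piece of data content
    "relationMetadata",      # relationships between data content parts
    "content",               # the data content parts themselves
)


def build_skeleton():
    """Return an empty syndicationMessage element with one child per part."""
    root = ET.Element("syndicationMessage")
    for name in PART_NAMES:
        ET.SubElement(root, name)
    return root


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```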

This document describes the specific types of information that are most desired.

Note that the information provided should be only what is needed to manage a collection of syndicated data and/or data feeds, and nothing else. In other words, a syndication message should not carry any information about its data content, or about the service that provided/delivered the data, that is not needed to manage the syndication process. Additional information required to fulfill those roles would be either:

Contained inside metadata markup and namespace-qualified to define its special role, or
Expressed using RDF, again with appropriate namespace qualifiers.

1. Metadata about the Aggregator

The following information about the aggregator (source of the message) is often required by the consumer.

Creation date. The date/time at which the message was assembled for delivery. Note that this can be different from the actual date/time at which the consumer receives the message.
Resource URI. The URI from which the message was or can be retrieved. This URI may reference different variants of the resource, depending on the time the request is made. This URI is on its own sufficient to define a resource -- no additional data (e.g., POSTed form data) is needed.
Unique identifier. A unique identifier for the resource, different from that of any other possible resource or any other possible version of this resource. This lets the consumer distinguish between different variants (versions, etc.) of the resource.
Unique resource URI. A URI that explicitly references this unique resource: dereferencing this URI would retrieve the same data message, with the same unique identifier. This URI is on its own sufficient to define this resource -- no additional data (e.g., POSTed data) is needed.
Identity of aggregator. The name and other descriptive information about the aggregator.
Aggregator contact URI. A URI (maybe a mailto: URL) for contacting the aggregator.
Aggregator information URI. A URI for obtaining further information about the aggregator.
Aggregator interface. A reference to, and brief description of, any Web-accessible software interface that can be used to access the aggregator's services. This information could include:
- URI to interface. A URI referencing the interface.
- Interface type. The type of the interface (XML-RPC, SOAP, etc.).
- Informational URI. A URI referencing a resource providing a description of the interface.
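To illustrate why the resource URI and the unique identifier are kept separate, here is a rough sketch of how a consumer might notice that a resource URI has started returning a new variant. The function name record_if_new, the in-memory dictionary and the example URI are all assumptions made for illustration; nothing here is part of the proposed format.

```python
def record_if_new(seen, resource_uri, unique_id):
    """Remember unique_id under resource_uri.

    Returns True if this variant has not been seen before for this
    resource URI, and False if it is a repeat delivery of a known variant.
    """
    known_ids = seen.setdefault(resource_uri, set())
    if unique_id in known_ids:
        return False
    known_ids.add(unique_id)
    return True


# Hypothetical usage: the same resource URI yields two different variants.
seen = {}
first = record_if_new(seen, "http://example.org/feed", "variant-1")
repeat = record_if_new(seen, "http://example.org/feed", "variant-1")
second = record_if_new(seen, "http://example.org/feed", "variant-2")
```

The resource URI is the dictionary key because it may point at different variants over time; the unique identifier is what tells those variants apart.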

2. Metadata and the Syndication Process

The syndication process itself needs certain pieces of information so that it knows how and when it can use the data content of the message. The following properties summarize the 'lowest common denominator' information typically needed to manage syndicated data:

Start date. The date and time after which the consumer is free to use the data content.
Stop date. The date and time after which the consumer should stop using the data content.
Update interval. The time interval after which the aggregator is likely to have produced an 'updated' version of the data content.
Distribution rules. Each aggregator may have a more detailed set of distribution rules that need to be understood. This set of rules could either be included in the message, or referenced (via a URI) from the message.
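A consumer could apply the start date, stop date and update interval along the following lines. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the start date is treated as inclusive and the stop date as exclusive, and distribution rules are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SyndicationWindow:
    start: datetime             # consumer may use the content from here on
    stop: datetime              # consumer should stop using the content here
    update_interval: timedelta  # expected time until an updated version exists

    def usable_at(self, now):
        """True if the content may be used at the given moment."""
        return self.start <= now < self.stop

    def next_check(self, fetched_at):
        """Earliest time at which an updated version is likely to exist."""
        return fetched_at + self.update_interval


# Hypothetical values, chosen only to exercise the sketch.
utc = timezone.utc
window = SyndicationWindow(
    start=datetime(2000, 10, 2, tzinfo=utc),
    stop=datetime(2000, 10, 9, tzinfo=utc),
    update_interval=timedelta(hours=6),
)
inside = window.usable_at(datetime(2000, 10, 5, tzinfo=utc))
expired = window.usable_at(datetime(2000, 10, 10, tzinfo=utc))
recheck = window.next_check(datetime(2000, 10, 2, 12, tzinfo=utc))
```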

3a. Language-Independent Metadata for the Data Content

The syndication process will also need to know certain things about the data content, and/or the creator of the data content (which may be different from the aggregator). This information would include:

Creation date. When the data was created.
Last-modified date. When the data was last modified.
PICS rating. Content rating information, if any.
Creator info. Information about the person, organization or other entity that is the creator/owner of the data, including name, URI reference, contact info, etc.
Copyright info. Copyright information relevant to the data.
Human language. The human readable language of the data.

3b. Language-Dependent Metadata for the Data Content

The syndication process also needs to know certain language-dependent things about the data content. The following items are the 'lowest common denominator' items that are common to most current syndication processes:

Human language. The human language for each collection of language-dependent metadata.
Title. A title to use for the data content.
Description. A brief text-only description of the resource.
Keywords. A set of keywords/key phrases relevant to the data content.
Categories. A set of hierarchical category specifications relevant to the data content.

Each message could have more than one set of language-dependent metadata, so that the syndication tool could appropriately manage the data in systems using different human languages.
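One way a syndication tool might choose among several sets of language-dependent metadata is sketched below. The class name, field names, fallback order (exact tag, then primary language, then a default, then the first set) and the sample titles are all assumptions for illustration; the notes above do not prescribe any selection rule.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizedMetadata:
    language: str                                    # e.g. "en" or "fr-CA"
    title: str
    description: str = ""
    keywords: list = field(default_factory=list)
    categories: list = field(default_factory=list)   # e.g. "News/Weather"


def pick_metadata(sets, preferred, fallback="en"):
    """Pick the metadata set that best matches the preferred language."""
    by_language = {m.language.lower(): m for m in sets}
    wanted = preferred.lower()
    # Try the exact tag, then its primary language, then the fallback.
    for tag in (wanted, wanted.split("-")[0], fallback.lower()):
        if tag in by_language:
            return by_language[tag]
    # Nothing matched: use the first set, if the message has any at all.
    return sets[0] if sets else None


# Hypothetical sample data.
sets = [
    LocalizedMetadata("en", "Weather report"),
    LocalizedMetadata("fr", "Bulletin météo"),
]
french = pick_metadata(sets, "fr-CA")
default = pick_metadata(sets, "de")
```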

4. Relational Metadata

It is easy to imagine a message containing related pieces of data: for example a Web page and the associated image files, a collection of XML documents representing different language variants of the same text content, or a sequence of news items related to the same general topic. It would be useful if the syndication mechanism allowed for an easy way of including such relational information.

There are several efforts underway to provide a framework for such metadata (site maps, etc.). These all generally use RDF (expressed in XML) to encode the information. There is also no set of common, simple relationships that could easily be chosen for a 'basic' syndication format. It may thus make sense to define a syndication message format that 'allows' for such metadata, but that does not define any specific mechanism for providing the relational metadata.

5. The Data Itself

We need to know a fair bit about the data content itself, such as:

Data type. What is the type of the data (MIME type)?
Encoding. If the data is encoded/compressed, how is this done (gzip with base64, base64 only, etc.)?
Primary type. If the data is XML, what is the 'main' data type for it (XHTML, MathML, VoxML, etc.)? The data could have many different namespace-declared types in it, but which one is 'most' important?
Namespace declarations. If the data is bare XML, then we'll need to know the namespace prefixes used in it, so that the data can be successfully processed.
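To show why the encoding needs to be stated explicitly, here is a rough sketch of a consumer unpacking a data content part. The encoding labels "none", "base64" and "gzip+base64" are hypothetical names I picked for the cases mentioned above; the notes do not define how the encoding would actually be labelled.

```python
import base64
import gzip


def decode_content(payload, encoding):
    """Return the raw bytes of a data content part.

    payload is the text carried in the message; encoding is one of the
    hypothetical labels "none", "base64" or "gzip+base64".
    """
    if encoding == "none":
        return payload.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(payload)
    if encoding == "gzip+base64":
        # The data was gzipped first and then base64-encoded, so undo
        # the two steps in reverse order.
        return gzip.decompress(base64.b64decode(payload))
    raise ValueError("unknown encoding: " + encoding)


# Round trip with made-up content.
original = b"<p>hello</p>"
packed = base64.b64encode(gzip.compress(original)).decode("ascii")
unpacked = decode_content(packed, "gzip+base64")
```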

Design Principles

This URL: http://www.iangraham.org/projects/news/issues-0.html

Created: 24 September, 2000
Last Update: 2 October, 2000

Author(s): Ian Graham

Overall Message Design

1. Metadata about the Aggregator

2. Metadata and the Syndication Process

3a. Language-Independent Metadata for the Data Content

3b. Language-Dependent Metadata for the Data Content

4. Relational Metadata

5. The Data Itself

Questions
