Syndication Language -- Some issues and possible resolutions

In this model, we assume that the data content of a syndicationMessage is inside a data element. However, the model also assumes that software reading a syndicated data message can successfully process it without any knowledge of the content inside data. This means that the data element attributes, or the metadata content of the parent syndicationMessage element, must provide enough information about data for the processor to determine how to handle that content (e.g., ignore it, unpack/decode it, direct it to another application for processing, etc.). For this to be possible, the consumer must know at least the following things about the data content:

1. Its MIME type.
2. The compression algorithm (if any) used to compress the data prior to including it in the message.
3. The encoding algorithm (if any) used to encode the data (compressed or not) before including it as content.
4. If the content is XML, the primary namespace associated with the content.
5. If the data content is unencoded, uncompressed XML, zero or more namespace declarations defining the namespace prefixes used by content inside data.

Note that item 4 is not the same thing as an XML namespace declaration, but is instead really a type identifier for identifying subtypes of XML data. This is needed because the content of the element may be compressed or encoded XML data that the syndication system knows nothing about, other than that it is consistent with a specific namespace-defined XML dialect. Thus the syndication system simply uses this namespace identifier to determine where the decoded, decompressed content of the data element should be sent.
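
The routing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the handler functions, the HANDLERS registry, and select_handler are hypothetical names that do not appear in the model itself. The point is that the handler is chosen from the metadata alone, without looking at the payload.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical handlers; a real system would pass the payload to an application.
Handler = Callable[[bytes], str]


def handle_heml(payload: bytes) -> str:
    return "heml-processor"


def handle_generic_xml(payload: bytes) -> str:
    return "generic-xml-processor"


# Hypothetical registry keyed by (MIME type, primary namespace).
HANDLERS: Dict[Tuple[str, Optional[str]], Handler] = {
    ("text/xml", "http://www.heml.org/schemas/heml1.0"): handle_heml,
    ("text/xml", None): handle_generic_xml,
}


def select_handler(content_type: str, namespace: Optional[str]) -> Optional[Handler]:
    """Pick a handler from the metadata only; the payload is never inspected."""
    # Drop MIME parameters such as ";version=4" before the lookup.
    base_type = content_type.split(";", 1)[0].strip().lower()
    # Prefer an exact (type, namespace) match, then fall back to the type alone.
    return HANDLERS.get((base_type, namespace)) or HANDLERS.get((base_type, None))
```

A result of None would correspond to the "ignore it" case: the processor has no destination for that kind of content.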

What about 'real' XML inside data?

Well, that too is possible. In that case, however, the processing tool assembling the message would need to ensure that markup inside data is appropriately namespace-qualified, and would need to add appropriate namespace prefix declarations to the data element. Note that the primary namespace identifier (4) is still useful, however, since it lets the application know the primary content of the element.

Note that the application assembling the message would need to ensure that the charset for the data content is the same as that of the overall syndicated data message. This may mean doing a charset-to-charset transformation of the original data content.
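
A charset-to-charset transformation of this kind might look like the following minimal Python sketch. The function name transcode and the default target of UTF-8 are assumptions made for illustration; the model does not prescribe a particular message charset.

```python
def transcode(content: bytes, source_charset: str, message_charset: str = "utf-8") -> bytes:
    """Re-encode content from its original charset into the message charset."""
    # Decode to abstract characters first, then encode for the enclosing message.
    text = content.decode(source_charset)
    # Raises UnicodeEncodeError if the message charset cannot represent a character.
    return text.encode(message_charset)
```

For example, transcode(original_bytes, "iso-8859-1") would convert Latin-1 content for inclusion in a UTF-8 message.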

Markup Model

It consequently makes sense to define the following attributes for the data element:

content-type -- the MIME type of the (possibly decoded and decompressed) element content. This attribute is mandatory.
content-encoding -- the mechanism used to encode the data content. Possible values would be "gzip", "compress", and "deflate" (the zlib format), as described in Section 3.5 of the HTTP 1.1 specification (RFC 2616). Note that, if data is content-encoded, it must also be content-escaped. This attribute is mandatory only if the content is compressed/encoded in some way.
content-escaping -- the escaping mechanism used to make non-XML text data "safe" as PCDATA content. Likely values would be "base64" and "uuencode" for binary (or other) data. This attribute would be mandatory if the data is content encoded via a mechanism yielding characters disallowed in XML element content.
An alternate value might be "cdata", which indicates that the content is text data wrapped in a CDATA section. In this case, the data must be preprocessed to 'escape' the sequence ]]> to be ]]&gt; and the sequence ]]& to be ]]&amp;.
namespace -- the 'primary' XML namespace relevant to the content. This is relevant only if the content is XML data (either raw or encoded), for which the content-type values will be "text/xml" or "application/xml". However, this would also be relevant to any future XML data types that have their own content-type.
xmlns:xxx -- Zero or more namespace declarations relevant to the content of the document. These attributes are relevant only if the content is well-formed XML and if the content has been modified to appropriately qualify element and attribute names. They are required only if the data element content contains unencoded XML.
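
To make the interplay of these attributes concrete, here is a minimal Python sketch of how a consumer might recover the original bytes from a data element: undo the escaping first, then the encoding. The function name decode_data_element is hypothetical, the "cdata" branch assumes the escaping rule reads as "]]>" becoming "]]&gt;" and "]]&" becoming "]]&amp;", and "compress" and "uuencode" are left unsupported because this sketch uses only the Python standard library.

```python
import base64
import gzip
import zlib
import xml.etree.ElementTree as ET


def decode_data_element(element: ET.Element) -> bytes:
    """Return the original content bytes of a parsed data element."""
    escaping = element.get("content-escaping")
    encoding = element.get("content-encoding")
    text = element.text or ""

    # Step 1: undo the escaping that made the content safe as PCDATA.
    if escaping == "base64":
        payload = base64.b64decode(text)  # line breaks in the text are ignored
    elif escaping == "cdata":
        # Assumed reverse of the cdata pre-processing rule.
        text = text.replace("]]&gt;", "]]>").replace("]]&amp;", "]]&")
        payload = text.encode("utf-8")
    elif escaping is None:
        payload = text.encode("utf-8")
    else:
        raise ValueError(f"unsupported content-escaping: {escaping}")

    # Step 2: undo the compression/encoding, if any.
    if encoding == "gzip":
        payload = gzip.decompress(payload)
    elif encoding in ("deflate", "zlib"):
        payload = zlib.decompress(payload)
    elif encoding is not None:
        raise ValueError(f"unsupported content-encoding: {encoding}")
    return payload
```

The content-type attribute is deliberately not consulted here: it describes the bytes that come out of this function, not the steps needed to obtain them.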

Here are some simple examples illustrating how this would work in the real world.

1. Contains unescaped PostScript data

<data content-type="application/postscript">
%!PS-Adobe-2.0
%%BeginProlog
%%BeginResource ShowcaseResource
1 setlinejoin
....
</data>

2. Contains encoded HTML content

<data content-type="text/html;version=4"
   content-encoding="gzip"
   content-escaping="base64">
PlGODlhGAGgAPEAAP/////ZRaCgoAZZDCH+PUNvcHlyaWdodCAoQykgMTk5q
AQDVK32yilpdjladlsfg1116ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
and so on -- BASE64 encoded gzipped data.......
</data>
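
On the producing side, an element like the one in example 2 could be assembled roughly as follows. This is a sketch only: package_data is a hypothetical helper, the 60-character line width is an arbitrary choice, and the attribute layout simply mirrors the example.

```python
import base64
import gzip
import textwrap
from xml.sax.saxutils import quoteattr


def package_data(content: bytes, content_type: str) -> str:
    """Gzip the content, base64-escape it, and wrap it in a data element."""
    compressed = gzip.compress(content)
    escaped = base64.b64encode(compressed).decode("ascii")
    # Break the base64 text into short lines, as in the example.
    body = "\n".join(textwrap.wrap(escaped, 60))
    return (
        f"<data content-type={quoteattr(content_type)}\n"
        '   content-encoding="gzip"\n'
        '   content-escaping="base64">\n'
        f"{body}\n"
        "</data>"
    )
```

Base64 output never contains "<", ">" or "&", so the body needs no further escaping as PCDATA.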

3. Contains 'raw' HTML content

<data content-type="text/html;version=4"
   content-escaping="cdata">
<![CDATA[
   <div align="right">
     <img src="image.gif" title="Hi Mommy!" >
     <p>This is a paragraph of useless text .... 
     <p>Here is another </p>
   </div> ]]>
</data>
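
The pre-processing needed for "cdata" escaping can be sketched as below, assuming the rule is that "]]>" becomes "]]&gt;" and "]]&" becomes "]]&amp;". The function names are hypothetical. The order of the replacements matters: "]]&" is escaped first so that a literal "]]&gt;" already present in the input remains distinguishable after escaping.

```python
def escape_for_cdata(text: str) -> str:
    """Make text safe to place inside a CDATA section."""
    # Escape "]]&" first so an input that already contains "]]&gt;" survives a round trip.
    text = text.replace("]]&", "]]&amp;")
    return text.replace("]]>", "]]&gt;")


def unescape_from_cdata(text: str) -> str:
    """Reverse escape_for_cdata on the consuming side."""
    text = text.replace("]]&gt;", "]]>")
    return text.replace("]]&amp;", "]]&")


def wrap_in_cdata(text: str) -> str:
    """Return the escaped text wrapped in CDATA delimiters."""
    return "<![CDATA[" + escape_for_cdata(text) + "]]>"
```

After escaping, the only "]]>" in the wrapped result is the one that closes the CDATA section.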

4. Contains XML data

<data content-type="text/xml"
      namespace="http://www.heml.org/schemas/heml1.0"
      xmlns:h="http://www.heml.org/schemas/heml1.0"
      xmlns:dc="dublin core URL">
  <h:event id="Att1915">
    <h:label xml:lang="en">Attack on Gallipoli</h:label>
    <h:date h:calendar="Gregorian" h:era="AD" xml:lang="en">
      <h:year>1915</h:year>
      <h:month>08</h:month>
      <h:day>9</h:day>
    </h:date>
    <h:location>Gallipoli Peninsula</h:location>
    <h:origin xml:link="http://www.lib.byu.edu/~rdh/wwi/1915/gallpoli.html"/>
  </h:event>
</data>
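
As a rough illustration of how the namespace attribute relates to the prefix declarations in an example like this, the following Python sketch reads the declared primary namespace and collects the namespaces actually used by the elements inside data. The function inspect_namespaces and the shortened SAMPLE message are assumptions made for the illustration; the model does not require a consumer to perform this comparison.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set, Tuple

# Shortened, hypothetical variant of the example message.
SAMPLE = """<data content-type="text/xml"
      namespace="http://www.heml.org/schemas/heml1.0"
      xmlns:h="http://www.heml.org/schemas/heml1.0">
  <h:event id="Att1915">
    <h:label xml:lang="en">Attack on Gallipoli</h:label>
  </h:event>
</data>"""


def inspect_namespaces(data_xml: str) -> Tuple[Optional[str], Set[str]]:
    """Return the declared primary namespace and the namespaces used inside data."""
    data = ET.fromstring(data_xml)
    declared = data.get("namespace")
    used: Set[str] = set()
    for element in data.iter():
        if element is data:
            continue
        # ElementTree expands qualified names to the form "{uri}local".
        if element.tag.startswith("{"):
            used.add(element.tag[1:].split("}", 1)[0])
    return declared, used


declared, used = inspect_namespaces(SAMPLE)
print(declared in used)
```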

5. Contains Encoded XML Data

<data content-type="text/xml"
     namespace="http://www.heml.org/schemas/heml1.0"
     content-encoding="gzip"
     content-escaping="base64">
PlGODlhGAGgAPEAAP/////ZRaCgoAZZDCH+PUNvcHlyaWdodCAoQykgMTk5q
AQDVK32yilpdjladlsfg1116ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
and so on -- BASE64 encoded gzipped data, in this case XML..
</data>

Packaging Data Content

This URL: http://www.iangraham.org/projects/news/issues-3.html

Created: 28 August, 2000
Last Update: 2 October, 2000

Author(s): Ian Graham

Data Element and Content

What about 'real' XML inside data?

Other Text Content (HTML, SGML or raw text) Inside data?

Markup Model

Data Element Content Examples
