RSS 1.0 Validator
A Schematron Schema for RSS 1.0
Introduction
DTD-based validation does not provide the kind of
flexibility required in many applications. A DTD is limited in the structures
that it can specify, and cannot validate element content, e.g. to determine
whether it has the correct length or format. A further limitation is that a
validating parser, when encountering a validation error, typically emits
only an error message, and often a cryptic one.
Schematron
is a schema language that allows a document to be validated by testing
it against a set of patterns (XPath expressions). Schematron validation
rules allow the author to specify a helpful error message which will be
provided to the user if an error is encountered.
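To illustrate the idea, the sketch below mimics in Python what a single Schematron rule does: select context nodes, evaluate a test on each, and report the author's message when the test fails. It uses only the standard library's xml.etree.ElementTree, whose XPath support is limited compared with XSLT. The function name check_rule and the sample rule are hypothetical and are not taken from the RSS Validator schema.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 documents.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

def check_rule(root, context, test, message):
    """Hypothetical Schematron-like rule: for every element matched by the
    `context` path, evaluate `test` and collect `message` when it fails."""
    failures = []
    for node in root.findall(context, NS):
        if not test(node):
            failures.append(message)
    return failures

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <link>http://example.org/</link>
  </channel>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
# Roughly a rule with context "rss:channel" and assertion test "rss:title".
errors = check_rule(
    root,
    ".//rss:channel",
    lambda ch: ch.find("rss:title", NS) is not None,
    "A channel element must contain a title element.",
)
print(errors)  # ['A channel element must contain a title element.']
```

The message is written for the person reading the report, which is the point made above: the rule author, not the parser, decides what the user is told.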
The RSS Validator is a Schematron schema for RSS, the XML vocabulary used to
syndicate web content in applications such as My.Yahoo, My.Netscape,
My.Userland and Meerkat.
The details of the RSS vocabulary can be found at the RSS Info weblog and in
the RSS 1.0 specification.
User Guide
A Schematron schema is used to generate an XSLT
stylesheet which is then used to apply the validation rules expressed in
the schema to a source document.
If you are interested in further developing the
RSS Validator schema, follow the Developer User
Guide. If you simply want to validate some RSS documents
using the schema, follow the Author User Guide.
Developer User Guide
You will need the following components:
We'll assume for the following examples that you can
invoke your XSLT processor using a batch file or shell script as follows:
xt input-document stylesheet output-document
If you have downloaded the schematron-basic implementation
(together with the skeleton it requires), you can
generate the validating stylesheet as follows:
xt rss_validator.xml sch-basic.xsl validator.xsl
You can then run the validator against an RSS document
as follows:
xt rss_doc.xml validator.xsl [report.txt]
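The two steps above (compile the schema into a stylesheet, then apply that stylesheet to a document) can be scripted. The following is a minimal sketch, assuming an xt command (the batch file or shell script described above) is on the PATH; the helper names compile_command, validate_command and run_pipeline are hypothetical.

```python
import shutil
import subprocess

def compile_command(schema="rss_validator.xml", skeleton="sch-basic.xsl",
                    out="validator.xsl"):
    # Step 1: turn the Schematron schema into a validating XSLT stylesheet.
    return ["xt", schema, skeleton, out]

def validate_command(document, validator="validator.xsl", report=None):
    # Step 2: apply the generated stylesheet to an RSS document.
    cmd = ["xt", document, validator]
    if report is not None:
        # Optional output file; assumed to go to standard output if omitted.
        cmd.append(report)
    return cmd

def run_pipeline(document, report=None):
    """Run both steps, assuming an `xt` wrapper is available on the PATH."""
    if shutil.which("xt") is None:
        raise RuntimeError("no 'xt' command found on the PATH")
    subprocess.run(compile_command(), check=True)
    subprocess.run(validate_command(document, report=report), check=True)
```

Keeping the command construction separate from execution makes it easy to reuse the second step on its own, for example with a prebuilt validator_text.xsl: validate_command("rss_doc.xml", "validator_text.xsl", "report.txt").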
If you spot any errors in the validator or add
any new rules, please contact me.
I'd welcome any feedback or contributions.
Author User Guide
You will need the following components:
We'll assume for the following examples that you can
invoke your XSLT processor using a batch file or shell script as follows:
xt input-document stylesheet output-document
You can then download the validator stylesheet
and run it against an RSS document as follows:
xt rss_doc.xml validator_text.xsl report.txt
The stylesheet produces a plain-text report. However, if you want
an HTML version of the report, take a look at schematron-report.
Experimental Online Validation Service
An online validation service for RSS 1.0 is
available, using this Schematron schema. The service makes use of an online
XSLT transformation service hosted by the W3C. That service is still an
early experiment, so the online RSS 1.0 validator should also be
considered experimental.
A more permanent home for the online validator should be announced shortly.
Development Notes
Schema Structure
To make the RSS Schematron Schema manageable, the validation rules have
been separated into individual modules. Each module contains a number of
self-contained rules which apply to a specific aspect of RSS validation.
Each module file has a .sch suffix and is imported into the
main Schematron schema using XML entity references. RSS itself consists
of a number of modules; validation rules specific to each of these modules
are defined in files using the following naming convention:
module_id.sch
For example, Dublin Core validation rules are specified in module_dc.sch.
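As an illustration of the entity-based inclusion and the naming convention, the sketch below generates the DOCTYPE declarations a main schema could use to pull in its module files. The entity names, the default root element name schema, and the helper functions are hypothetical; the actual rss_validator.xml may declare its modules differently.

```python
def module_filename(module_id):
    # Naming convention described above: module_<id>.sch
    return f"module_{module_id}.sch"

def doctype_for_modules(module_ids, root="schema"):
    """Return an internal DTD subset declaring one external entity per
    module, plus the entity references that pull each module into the
    main schema document."""
    decls = []
    refs = []
    for module_id in module_ids:
        name = f"module_{module_id}"  # hypothetical entity name
        decls.append(f'  <!ENTITY {name} SYSTEM "{module_filename(module_id)}">')
        refs.append(f"&{name};")
    doctype = f"<!DOCTYPE {root} [\n" + "\n".join(decls) + "\n]>"
    return doctype, refs

doctype, refs = doctype_for_modules(["dc"])
print(doctype)
print(refs)
```

When the main schema is parsed, each reference such as &module_dc; is replaced by the contents of the corresponding file, so the rules appear to the Schematron implementation as if they had been written inline.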
The schema currently includes rules to check the length of text contained
within the core RSS elements. These field length restrictions are the same
as those applied by the My.Netscape
application, and are mainly required for backwards compatibility reasons.
As these restrictions are likely to change, these validation rules
are defined in a separate Schematron file, field_lengths.sch.
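A field length check of this kind can be sketched in Python as follows. The limits in MAX_LENGTHS are illustrative placeholders, not values copied from field_lengths.sch, and for simplicity the sketch applies one limit per element name regardless of parent (channel, item, image or textinput), which a real rule set may not do.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"

# Illustrative limits only; the authoritative values live in field_lengths.sch.
MAX_LENGTHS = {"title": 100, "link": 500, "description": 500}

def check_field_lengths(root, limits=MAX_LENGTHS):
    """Report RSS 1.0 core elements whose text exceeds a maximum length."""
    problems = []
    for name, limit in limits.items():
        # ElementTree stores namespaced tags as "{uri}localname".
        for el in root.iter(f"{{{RSS}}}{name}"):
            length = len((el.text or "").strip())
            if length > limit:
                problems.append(
                    f"{name} is {length} characters long; the maximum is {limit}."
                )
    return problems
```

Keeping the limits in one table mirrors the reason given above for a separate file: if the restrictions change, only that table needs editing.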
Namespace Checking
Each RSS module is identified by a separate namespace. For example, the
Dublin Core module namespace is:
http://purl.org/dc/elements/1.1/
The schema will report elements from unknown namespaces,
to highlight to the user that these elements have not undergone validation.
Namespace checking is strict and is carried out by the XSLT engine, rather
than by string comparisons within the validation rules. In other
words, it is the responsibility of the XSLT engine to determine whether
two namespace URIs compare as equal.
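The same principle can be seen outside XSLT. In this Python sketch the XML parser, not the rule, resolves each prefix to its namespace URI, so a document that binds Dublin Core to an unusual prefix still passes, while a URI that differs even by a trailing slash is reported. The KNOWN_NAMESPACES set and the function name are hypothetical; the schema's own list of recognised modules may differ.

```python
import xml.etree.ElementTree as ET

# Namespaces this sketch treats as known (hypothetical list).
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # RDF
    "http://purl.org/rss/1.0/",                     # RSS 1.0 core
    "http://purl.org/dc/elements/1.1/",             # Dublin Core module
}

def unknown_namespace_elements(root, known=KNOWN_NAMESPACES):
    """Return (namespace URI, local name) for each element outside `known`.
    ElementTree expands every tag to '{uri}local', so the comparison is on
    the full namespace URI, never on the prefix used in the document."""
    found = []
    for el in root.iter():
        if el.tag.startswith("{"):
            uri, local = el.tag[1:].split("}", 1)
        else:
            uri, local = "", el.tag  # element in no namespace
        if uri not in known:
            found.append((uri, local))
    return found
```

Elements returned by this function have simply not been checked; reporting them, rather than rejecting them, matches the intent described above.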
Schematron Implementation
The zip file distribution includes a modified version of Schematron
Report (by David Carlisle), which uses a modified version of xmlverbatim
(by Oliver Becker) to include a syntax-highlighted version of the original
XML in the final HTML report.
This modified Schematron implementation requires the use of SAXON, as
XT does not properly implement the namespace axis.
Download
Version History
-
1.1, 19th November 2000
-
Added checks to ensure rdf:about uniqueness on channel,
item, image and textinput elements. Added content checks to note that rdf:about
for an image should match its url element, and likewise for textinput and
its link element.
-
Added structural validation checks for the DC module.
-
Tidied up documentation to include some additional
notes on the schema structure, and added link to online validator.
-
Announced to RSS-DEV mailing list for feedback/testing.
-
1.0b, 3rd November 2000
-
Unreleased version. Amended schema to bring it in line
with the latest revision of RSS 1.0, mainly incorporating the additional
RDF elements for the item 'table of contents'. Added placeholder for the
Dublin Core validation module. Tidied up the namespace handling.
-
1.0, 3rd September 2000
-
Produced the initial version of this web page and posted
the first version of the validator.
TODO List
-
Assign identifiers to each potential failure, and
provide a catalogue which fully describes failures and their remedies.
This could possibly involve an extended Schematron implementation which
adds the appropriate cross-references in the validation report.
-
Move away from ad hoc testing and identify a number
of test cases which will rigorously test the schema. This will be increasingly
important as new RSS modules are added.
-
Wrap the validator within a Java servlet for deployment
on the web. (Note: the SAXON servlet would probably be sufficient.)
-
Deploy the online validator to a 'proper' home, at
a persistent URL.
-
Determine whether additional content checks should
be added to the DC module validation
-
Add support for the syndication module
-
Determine status of field checks, and the RSS091 module
-
Add checks for rdf:parseType attributes where these
would be required by an RDF parser
-
Currently date-time formats are not checked. Common
usage suggests these conform to the format specified in RFC 822.
Potentially a Java extension function could be used to perform this check more
efficiently than in XSLT. It's unclear how this would be achieved in Schematron.
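As a rough idea of what such an extension function would need to do, the Python sketch below uses the standard library's email.utils.parsedate_tz as a stand-in for the Java extension mentioned above. This is a substitute technique for illustration, not something the schema currently does, and the function name is hypothetical.

```python
from email.utils import parsedate_tz

def looks_like_rfc822_date(text):
    """Loose syntactic check that `text` parses as an RFC 822 date-time.
    parsedate_tz returns None when it cannot parse the string; it does not
    range-check fields such as the day of the month."""
    try:
        return parsedate_tz(text.strip()) is not None
    except (TypeError, ValueError, IndexError):
        # Defensive: treat any parser failure as "not an RFC 822 date".
        return False
```

Because the parser is lenient, a check like this would catch obviously malformed values (such as an ISO-style "2000-11-19") but would not guarantee strict conformance.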
Page Maintained by Leigh Dodds. Last Updated 19 November 2000