RSS Validator
A Schematron Schema for RSS
Introduction
DTD-based validation does not provide the flexibility required by many
applications. A DTD is limited in the structures that it can specify, and it
cannot validate the contents of elements, for example to check that they have
the correct length or format. A further limitation is that a validating parser
that encounters a validation error typically just emits an error message, which
is often cryptic.
Schematron is a schema language that validates a document by testing it
against a set of patterns built from XPath expressions. Schematron validation
rules let the author specify a helpful error message, which is reported to the
user when an error is encountered.
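To make the rule-plus-message model concrete, here is a minimal Python sketch of how a Schematron-style rule is evaluated. It is not Schematron itself: a real schema writes each rule as a rule element with an XPath context and assert elements whose test attribute is XPath, whereas this sketch substitutes ElementTree paths and Python predicates. The Rule and validate names and the two example rules are hypothetical and are not taken from the RSS Validator schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    context: str  # ElementTree path selecting the nodes the rule applies to
    test: Callable[[ET.Element, Dict[ET.Element, ET.Element]], bool]
    message: str  # human-readable text reported when the test fails


def validate(xml_text: str, rules: List[Rule]) -> List[str]:
    """Apply each rule to every node its context selects; collect failure messages."""
    root = ET.fromstring(xml_text)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    report = []
    for rule in rules:
        for node in root.iterfind(rule.context):
            if not rule.test(node, parents):
                report.append(rule.message)
    return report


# Hypothetical example rules, each pairing a test with a helpful message.
RULES = [
    Rule(".//item",
         lambda n, parents: parents.get(n) is not None and parents[n].tag == "channel",
         "An item element must appear inside a channel element."),
    Rule(".//item",
         lambda n, parents: n.find("title") is not None,
         "Each item should have a title."),
]

doc = ("<rss version='0.91'><channel><title>Example</title>"
       "<item><link>http://example.com/</link></item></channel>"
       "<item><title>Stray</title></item></rss>")

for message in validate(doc, RULES):
    print(message)
```

The first item lacks a title and the second sits outside the channel, so each rule reports exactly one message, in the order the rules are listed.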
The RSS Validator is a Schematron schema for RSS ('Rich Site Summary'), the
language used to syndicate web content in applications such as My.Yahoo,
My.Netscape, My.Userland and Meerkat.
The RSS Validator schema provides all the power of validating against the
RSS 0.91 DTD, along with a much richer set of validation rules. These rules
cover all the constraints that cannot be expressed in the RSS DTD.
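The following Python sketch illustrates two kinds of constraint that a DTD cannot express: a maximum text length for a field, and a restriction of an element's content to a fixed set of values (the named days allowed in a skipDays/day element). The limits in MAX_LENGTHS are assumptions modelled on the RSS 0.91 field lengths and may not match the schema's own values; check_content is a hypothetical helper, not part of the RSS Validator schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed limits, modelled on the RSS 0.91 field lengths; the schema's values may differ.
MAX_LENGTHS = {"title": 100, "description": 500, "link": 500}
DAY_NAMES = {"Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"}


def check_content(xml_text: str) -> List[str]:
    """Report content constraints that a DTD alone cannot enforce."""
    root = ET.fromstring(xml_text)
    problems = []
    # Length limits: a DTD can say an element holds text, but not how much.
    for tag, limit in MAX_LENGTHS.items():
        for el in root.iter(tag):
            text = el.text or ""
            if len(text) > limit:
                problems.append(f"<{tag}> is {len(text)} characters; the limit is {limit}.")
    # Enumerated text content: each day must be a named day of the week
    # (surrounding whitespace is ignored in this sketch).
    for day in root.iter("day"):
        if (day.text or "").strip() not in DAY_NAMES:
            problems.append(f"<day> must name a day of the week, not {day.text!r}.")
    return problems


doc = ("<rss version='0.91'><channel><title>" + "x" * 120 + "</title>"
       "<skipDays><day>Funday</day><day>Monday</day></skipDays>"
       "</channel></rss>")

for problem in check_content(doc):
    print(problem)
```

Both documents would be accepted by a DTD that only declares the element structure; only content-level checks of this kind catch the over-long title and the invalid day name.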
User Guide
A Schematron schema is used to generate an XSLT stylesheet, which is then
used to apply the validation rules to the input document.
If you want to develop the RSS Validator schema further, follow the Developer
User Guide. If you simply want to validate some RSS documents using the
schema, follow the Author User Guide.
Developer User Guide
You will need the following components:
We'll assume for the following examples that you can
invoke your XSLT processor using a batch file or shell script as follows:
xt input-document stylesheet output-document
If you have downloaded the schematron-basic implementation (along with the
skeleton stylesheet that it requires), you can generate the validating
stylesheet as follows:
xt rss_validator.xml sch-basic.xsl validator.xsl
You can then run the validator against an RSS document
as follows:
xt rss_doc.xml validator.xsl [report.txt]
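The two xt invocations can also be scripted. This Python sketch builds the command lines for both steps; it assumes an xt batch file or shell script on the PATH that takes an input document, a stylesheet and an optional output document, and it uses the file names from this guide plus an assumed RSS input file called rss_doc.xml. The functions xt_command, validation_pipeline and run are hypothetical names.

```python
import shutil
import subprocess
from typing import List, Optional


def xt_command(source: str, stylesheet: str, output: Optional[str] = None) -> List[str]:
    """Build an argument list for the 'xt input stylesheet [output]' convention."""
    cmd = ["xt", source, stylesheet]
    if output is not None:
        cmd.append(output)
    return cmd


def validation_pipeline(schema: str, rss_doc: str,
                        report: Optional[str] = None) -> List[List[str]]:
    return [
        # Step 1: compile the Schematron schema into a validating stylesheet.
        xt_command(schema, "sch-basic.xsl", "validator.xsl"),
        # Step 2: apply the generated stylesheet to the RSS document.
        xt_command(rss_doc, "validator.xsl", report),
    ]


def run(commands: List[List[str]]) -> None:
    """Execute each command in order; assumes an 'xt' wrapper is on the PATH."""
    if shutil.which("xt") is None:
        raise RuntimeError("no 'xt' wrapper found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in validation_pipeline("rss_validator.xml", "rss_doc.xml", "report.txt"):
    print(" ".join(cmd))
```

Calling run(validation_pipeline(...)) would execute the two steps in order and stop at the first failure, because check=True makes subprocess.run raise CalledProcessError on a non-zero exit status.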
If you spot any errors in the validator, or add any new rules, please contact
me. I'd welcome any feedback or contributions.
Author User Guide
You will need the following components:
We'll assume for the following examples that you can
invoke your XSLT processor using a batch file or shell script as follows:
xt input-document stylesheet output-document
You can then download the validator stylesheet
and run it against an RSS document as follows:
xt rss_doc.xml validator_text.xsl report.txt
The stylesheet produces a plain-text report. If you would prefer an HTML
version of the report, take a look at schematron-report.
Download
The current version of the RSS Validator schema is 1.0.
Version History
- 1.0 Final, 13th July 2000: Tidied up line lengths to make the schema a bit
more readable. Amended the message text of some assert elements to better fit
the intended use. Tidied up the structure to use clearer groupings. Altered
the reference for field lengths to point to Dave Winer's document [2] rather
than the Netscape [3] instructions. Added a check to ensure that item elements
only occur within channel elements.
- 1.0 beta, 4th July 2000: Added some additional content.
- 1.0 alpha, 2nd July 2000: Added comments, a to-do list and references.
Altered day element validation to explicitly check for named days, as
described in Dave Winer's reference.
TODO List
- Assign identifiers to each potential failure, and provide a catalogue that
fully describes failures and their remedies.
- Date-time formats are not currently checked. Common usage suggests that they
conform to the format specified in RFC 822. A Java extension function could
potentially perform this check more efficiently than XSLT, but it's unclear
how this would be achieved in Schematron.
- Confirm whether the field length limitations reflect common usage. For
example, restricting URL lengths seems unnecessary, although the limits have
some obvious value if an RSS feed is going to be stored in a database.
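As a sketch of the date check described in the TODO list, the following Python function tests whether a string has the shape of an RFC 822 date-time. It is written in Python rather than as the Java extension function the TODO item suggests, and it is deliberately lenient: it also accepts four-digit years (common in RSS feeds and allowed by RFC 1123), and it does not check that a weekday name agrees with the date or that the day exists in the given month. is_rfc822_date is a hypothetical name.

```python
import re

# RFC 822 date-time: [day ","] date time, where date is "DD Mon YY" and
# time is "HH:MM[:SS] zone". Four-digit years are accepted as well.
RFC822 = re.compile(
    r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"
    r"(\d{1,2}) (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (?:\d{2}|\d{4}) "
    r"(\d{2}):(\d{2})(?::(\d{2}))? "
    # Zones: named US zones, UT/GMT, single-letter military zones (no J), or +/-HHMM.
    r"(?:UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|PDT|[A-IK-Z]|[+-]\d{4})$"
)


def is_rfc822_date(value: str) -> bool:
    """Return True if value is shaped like an RFC 822 date-time with sane field ranges."""
    m = RFC822.match(value.strip())
    if m is None:
        return False
    day, hour, minute = int(m.group(1)), int(m.group(2)), int(m.group(3))
    second = int(m.group(4) or 0)
    # Range checks the regex cannot express; 60 allows for a leap second.
    return 1 <= day <= 31 and hour <= 23 and minute <= 59 and second <= 60


for sample in ["Thu, 13 Jul 2000 09:30:00 GMT", "13 Jul 00 09:30 +0100",
               "2000-07-13T09:30:00Z", "Thu, 13 Jul 2000 25:00 GMT"]:
    print(sample, is_rfc822_date(sample))
```

An ISO 8601 timestamp is rejected because it does not have the RFC 822 shape, and an hour of 25 is rejected by the range check even though it matches the pattern.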
References
The following web pages proved extremely useful whilst
writing this schema:
Page maintained by Leigh Dodds. Last updated 13th July 2000.