SML Coloring Book. 3rd Edition

This is the 3rd Edition of the SML Coloring Book. Previous editions are still available:

1st Edition - early version
2nd Edition - more complete XHTML based example

The contents of these coloring books are the product of ongoing discussions on the SML-DEV mailing list.

In this edition we look at the coloring pipeline. The following document is taken from end to end, with the processing at each step illustrated. This edition's sample document was derived from here. Please forgive the lack of a business-oriented example; the same concepts still apply.

Example document

<rhyme type="nursery">
<title>Peter Piper Picked A Peck Of Pickled Peppers</title>
<!-- This is taken from Mother Goose -->
<verse>
<line>Peter Piper picked a peck of pickled peppers.</line>
<line>A peck of pickled peppers Peter Piper picked.</line>
<line>If Peter Piper picked a peck of pickled peppers,</line>
<line>Where's the peck of pickled peppers Peter Piper picked?</line>
</verse>
</rhyme>

Parsing

The parsing process in the pipeline is necessarily syntax specific: it has to be able to parse the document properly. Because a parser is syntax specific, it can easily add coloring information to the data as it is parsed. This is the first point of entry for color into the parsed data. As we'll see later, this color information may be supplemented or rewritten by later steps.

The color information in a document is implied by the syntax. The range of colors is defined by the range of different syntactic constructs in the specific language. For example, here's a partial list of colors for SGML (courtesy of Clark Evans):

emm - element with mandatory begin and mandatory end tags
emf - element with mandatory begin and forbidden end tags
eoo - element with optional begin and optional end tags
emo - element with mandatory begin and optional end tags
av - attribute with value
an - attribute without value

Here are some sample colors for XML (not intended to be complete):

element - an element
attribute - an attribute
cdata - cdata
comment - comment

It's important to note here that the SML data model requires a node to have a name. However, certain syntactic constructs, such as the SGML/XML comment, do not have a name. The parser must therefore add names to these unnamed nodes. One current suggestion is to reserve the SML namespace and use qualified names to identify these nodes, e.g. sml:comment, sml:text.

So what does the sample document look like when parsed? First, let me introduce a syntax which has been used on the list. It's useful because it gets away from the angle brackets of XML and avoids thinking about elements and attributes, leaving just the data items.

Note that the following is NOT the SML serialisation syntax. It's just a means of expressing the data model which doesn't presume any particular language or syntax; it's only meant to capture the internal state of the pipeline.

The syntax is defined as follows:

For nodes with values : {"element-name", "color", "value"}
For nodes with children : {"element-name", "color", [ ...child definitions... ]}

A node cannot have both a value and children. This is a key part of the SML model.
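To make the rule concrete, here is a minimal Python sketch of the data model. The Node class and the dump helper are names I've made up for illustration; they are not part of any SML proposal. An empty color string is assumed to mean "not yet colored".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One data item: a name, a color, and either a value or children."""
    name: str
    color: str = ""                 # assumption: "" means not yet colored
    value: Optional[str] = None     # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The model forbids a node from carrying both a value and children.
        if self.value is not None and self.children:
            raise ValueError(f"node {self.name!r} has both a value and children")


def dump(node: Node, indent: int = 0) -> str:
    """Render a node in the brace notation defined above."""
    pad = " " * indent
    if node.value is not None:
        return f'{pad}{{"{node.name}", "{node.color}", "{node.value}"}}'
    lines = [f'{pad}{{"{node.name}", "{node.color}",', f"{pad}   ["]
    lines += [dump(child, indent + 6) for child in node.children]
    lines += [f"{pad}   ]", f"{pad}}}"]
    return "\n".join(lines)


title = Node("title", "element", children=[
    Node("sml:text", "text", value="Peter Piper Picked A Peck Of Pickled Peppers"),
])
print(dump(title))
```

Constructing a Node with both a value and children raises a ValueError, which is one simple way of enforcing the constraint at the point where the data is built.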

I've picked out the different colors in the HTML to make them explicit. Elements are dark blue, attributes are green, and comments are in orange.

Example Document (Parsed and Colored)


{"rhyme", "element",
   [
      {"type", "attribute",
         [
            {"sml:text", "text", "nursery"}
         ]
      }
      {"title", "element",
         [
            {"sml:text", "text", "Peter Piper Picked A Peck Of Pickled Peppers"}
         ]
      }
      {"sml:comment", "comment",
         [
            {"sml:text", "text", "This is taken from Mother Goose"}
         ]
      }
      {"verse", "element",
         [
            {"line", "element",
               [
                  {"sml:text", "text", "Peter Piper picked a peck of pickled peppers."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "A peck of pickled peppers Peter Piper picked."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "If Peter Piper picked a peck of pickled peppers,"}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "Where's the peck of pickled peppers Peter Piper picked?"}
               ]
            }

         ]
      }
   ]
}
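As a rough sketch of how a parser might produce this structure, the following Python uses the standard library's xml.dom.minidom (my choice, not something the list has settled on). Each node is held as a (name, color, value-or-children) tuple, and the function name parse_colored is hypothetical. Comments and text receive the reserved sml: names suggested above. Only a shortened copy of the rhyme is parsed, to keep the example small.

```python
from xml.dom import minidom
from xml.dom.minidom import Node as DomNode


def text_node(value):
    """Build a leaf node for character data, using the reserved sml:text name."""
    return ("sml:text", "text", value)


def parse_colored(dom_node):
    """Convert a DOM element or comment into a (name, color, children) tuple."""
    if dom_node.nodeType == DomNode.COMMENT_NODE:
        # Comments have no name in XML, so the parser supplies one.
        return ("sml:comment", "comment", [text_node(dom_node.data.strip())])
    children = []
    # Attributes become ordinary children, distinguished only by their color.
    for attr_name, attr_value in dom_node.attributes.items():
        children.append((attr_name, "attribute", [text_node(attr_value)]))
    for child in dom_node.childNodes:
        if child.nodeType == DomNode.TEXT_NODE:
            if child.data.strip():      # skip layout whitespace between tags
                children.append(text_node(child.data))
        elif child.nodeType in (DomNode.ELEMENT_NODE, DomNode.COMMENT_NODE):
            children.append(parse_colored(child))
    return (dom_node.tagName, "element", children)


SOURCE = """<rhyme type="nursery">
<title>Peter Piper Picked A Peck Of Pickled Peppers</title>
<!-- This is taken from Mother Goose -->
</rhyme>"""

model = parse_colored(minidom.parseString(SOURCE).documentElement)
for child in model[2]:
    print(child[0], child[1])
```

This sketch ignores CDATA sections and processing instructions; a fuller parser would need a color for each of those as well.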

Processing

The processing layer is essentially your application. It can do what it likes here. The application can be 'color-blind', i.e. just work on the data model without using the coloring information, or it can use the color if it wishes.

Let's assume for the purposes of this example that our application is an editor. We're editing the document to add the source of the rhyme, which was previously noted in a comment. Here's the newly edited document.


{"rhyme", "element",
   [
      {"type", "attribute",
         [
            {"sml:text", "text", "nursery"}
         ]
      }
      {"title", "element",
         [
            {"sml:text", "text", "Peter Piper Picked A Peck Of Pickled Peppers"}
         ]
      }
      {"source", "",
         [
            {"sml:text", "text", "Mother Goose"}
         ]
      }
      {"verse", "element",
         [
            {"line", "element",
               [
                  {"sml:text", "text", "Peter Piper picked a peck of pickled peppers."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "A peck of pickled peppers Peter Piper picked."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "If Peter Piper picked a peck of pickled peppers,"}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "Where's the peck of pickled peppers Peter Piper picked?"}
               ]
            }

         ]
      }
   ]
}

You'll notice that the orange comment has been removed, and a new node added. This node has the name 'source', and as you can see does not have any color at present. This will be added in the Painting stage.
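The edit itself might look something like the following Python sketch. Nodes are (name, color, value-or-children) tuples, and replace_child is a hypothetical helper, not a defined SML operation. Note that it only ever looks at node names, so the application stays color-blind, and the new node is created with an empty color.

```python
def replace_child(node, old_name, replacement):
    """Return a copy of the tree with every child named old_name replaced."""
    name, color, content = node
    if isinstance(content, str):        # leaf node: nothing to replace
        return node
    return (name, color, [
        replacement if child[0] == old_name
        else replace_child(child, old_name, replacement)
        for child in content
    ])


parsed = ("rhyme", "element", [
    ("type", "attribute", [("sml:text", "text", "nursery")]),
    ("sml:comment", "comment",
     [("sml:text", "text", "This is taken from Mother Goose")]),
])

# The new node gets an empty color; choosing one is left to the Painter.
source = ("source", "", [("sml:text", "text", "Mother Goose")])
edited = replace_child(parsed, "sml:comment", source)
print(edited[2][1])
```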

Validation

The validation layer is optional, just as it is with XML. However, an advantage of the SML pipeline is that validation can be separated into two types:

Schema validation
Syntax validation

The first type, schema validation, validates the data in terms of its document type, e.g. for HTML this might be to ensure that an 'a' element only has a single 'href' child. For our example, we might need to confirm that a node called 'source' is allowed.

Schema validation is color-blind. It doesn't need to know whether the href child will ultimately be an attribute or an element; it just needs to validate that the child is present and that there is at most one. Express your schema in terms of the color-blind SML model, and this schema can be applied across many syntaxes.
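Here is a small Python sketch of a color-blind check for the "at most one" half of that rule. The schema language is still a work item, so the max_occurs dictionary (parent name, then child name, then limit) is purely an assumption of mine, as is the function name schema_errors. Nodes are (name, color, value-or-children) tuples.

```python
from collections import Counter


def schema_errors(node, max_occurs):
    """Check child counts against per-parent limits, ignoring color entirely."""
    name, _color, content = node        # the color is deliberately unused
    if isinstance(content, str):
        return []
    errors = []
    counts = Counter(child[0] for child in content)
    for child_name, limit in max_occurs.get(name, {}).items():
        if counts[child_name] > limit:
            errors.append(
                f"{name}: {counts[child_name]} '{child_name}' children, "
                f"at most {limit} allowed"
            )
    for child in content:
        errors.extend(schema_errors(child, max_occurs))
    return errors


LIMITS = {"a": {"href": 1}}             # hypothetical schema fragment
url = [("sml:text", "text", "http://example.org/")]

as_attribute = ("a", "element", [("href", "attribute", url)])
as_element = ("a", "element", [("href", "element", url)])
duplicated = ("a", "element", [("href", "attribute", url),
                               ("href", "element", url)])

print(schema_errors(as_attribute, LIMITS))
print(schema_errors(as_element, LIMITS))
print(schema_errors(duplicated, LIMITS))
```

The first two documents pass regardless of how href is colored, while the third fails even though its two href children have different colors.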

The second form of validation, syntax validation, cannot be color-blind. Syntax validation has to ensure that the data conforms to the syntactic constraints of the particular language, e.g. SGML for HTML, XML for XHTML. An example of a syntactic constraint in XML is that an element may only have a single attribute with a given name. Syntactic validation must be carried out after coloring.

Note : There are work items here to define the schema language(s) for both schema and syntactic validation, and also to provide a utility or mechanism to map from existing document types into these languages.
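By way of contrast, here is a Python sketch of the XML constraint just mentioned. Unlike schema validation it has to read the color of each child. The function name xml_syntax_errors is hypothetical, nodes are (name, color, value-or-children) tuples, and only this one constraint is checked.

```python
def xml_syntax_errors(node):
    """Report elements carrying two attribute-colored children of the same name."""
    name, _color, content = node
    if isinstance(content, str):
        return []
    errors = []
    seen = set()
    for child in content:
        if child[1] == "attribute":     # color matters at this stage
            if child[0] in seen:
                errors.append(
                    f"element '{name}' has more than one attribute "
                    f"named '{child[0]}'"
                )
            seen.add(child[0])
        errors.extend(xml_syntax_errors(child))
    return errors


url = [("sml:text", "text", "http://example.org/")]

two_attributes = ("a", "element", [("href", "attribute", url),
                                   ("href", "attribute", url)])
mixed_colors = ("a", "element", [("href", "attribute", url),
                                 ("href", "element", url)])

print(xml_syntax_errors(two_attributes))
print(xml_syntax_errors(mixed_colors))
```

The second document is acceptable XML syntax, since only one of its href children is an attribute. Whether it is acceptable for the document type is a separate question for schema validation.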

Painting

Painting is the process of adding color to the data, and is the first step towards serialisation. As we noted at the parsing stage, coloring is syntax specific. Therefore we must tell the Painter, which is a generic component, how to add color to the model.

Such a 'Color Scheme' provides the glue between the data from the application and the Printer (which is the next stage). The Painter has no involvement in validation, although the data model which it processes should ideally conform to the document schema. Oren Ben-Kiki summarised this quite nicely:

"...the painter accepts a data model which satisfies the language schema and paints it so that the colored data model satisfies the syntax schema."

The Painter will apply the Color Scheme by using the names of the nodes in the data. For each document type there will have to be a Color Scheme supplied for each target syntax, e.g. if we wanted to serialise our RhymingML as both XML and SGML then we may have to supply two different Color Schemes: RhymingML-XML-Coloring and RhymingML-SGML-Coloring.

Note : XML is a subset of SGML - so can its color schemes be a subset of the SGML schemes?

Note : We have another work item here to define the exact format for defining Color Schemes.

For the purposes of this example, I've assumed that a Color Scheme is expressed as follows:

<!ELEMENT colorScheme (paint)*>
<!ELEMENT paint       (name, color)>
<!ELEMENT name        (#PCDATA)>
<!ELEMENT color       (#PCDATA)>

Here's an XML-based color scheme for our RhymingML. Again, I've applied color to make the different items stand out:

<colorScheme>
<paint><name>rhyme</name><color>element</color></paint>
<paint><name>type</name><color>attribute</color></paint>
<paint><name>source</name><color>attribute</color></paint>
<paint><name>title</name><color>element</color></paint>
<paint><name>verse</name><color>element</color></paint>
<paint><name>line</name><color>element</color></paint>
</colorScheme>

The Painter can now take the data we produced from the editing step, and color it appropriately:


{"rhyme", "element",
   [
      {"type", "attribute",
         [
            {"sml:text", "text", "nursery"}
         ]
      }
      {"title", "element",
         [
            {"sml:text", "text", "Peter Piper Picked A Peck Of Pickled Peppers"}
         ]
      }
      {"source", "attribute",
         [
            {"sml:text", "text", "Mother Goose"}
         ]
      }
      {"verse", "element",
         [
            {"line", "element",
               [
                  {"sml:text", "text", "Peter Piper picked a peck of pickled peppers."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "A peck of pickled peppers Peter Piper picked."}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "If Peter Piper picked a peck of pickled peppers,"}
               ]
            }
            {"line", "element",
               [
                  {"sml:text", "text", "Where's the peck of pickled peppers Peter Piper picked?"}
               ]
            }

         ]
      }
   ]
}

You can see that the 'source' node that we introduced has been properly colored according to the Color Scheme. Prior to this it was colorless.
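A generic Painter along these lines could be sketched in Python as follows. It reads the colorScheme format assumed above with the standard library's xml.etree.ElementTree. The names load_color_scheme and paint are hypothetical, and I've assumed that a node whose name is missing from the scheme (such as sml:text) simply keeps whatever color it already has. Nodes are (name, color, value-or-children) tuples.

```python
import xml.etree.ElementTree as ET

COLOR_SCHEME = """<colorScheme>
<paint><name>rhyme</name><color>element</color></paint>
<paint><name>type</name><color>attribute</color></paint>
<paint><name>source</name><color>attribute</color></paint>
<paint><name>title</name><color>element</color></paint>
<paint><name>verse</name><color>element</color></paint>
<paint><name>line</name><color>element</color></paint>
</colorScheme>"""


def load_color_scheme(xml_text):
    """Read a colorScheme document into a {node name: color} dictionary."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("color") for p in root.findall("paint")}


def paint(node, scheme):
    """Return a copy of the tree colored by node name."""
    name, color, content = node
    # Assumption: names absent from the scheme keep their existing color.
    color = scheme.get(name, color)
    if isinstance(content, str):
        return (name, color, content)
    return (name, color, [paint(child, scheme) for child in content])


scheme = load_color_scheme(COLOR_SCHEME)
uncolored = ("source", "", [("sml:text", "text", "Mother Goose")])
print(paint(uncolored, scheme))
```

Because the Painter only consults the dictionary, swapping in a different Color Scheme for another target syntax would need no change to the code.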

Printing

It's the job of the Printer to take a properly colored data model and serialise it. The Printer is the complement of a Parser and is also syntax specific. Therefore there will be XMLPrinters, SGMLPrinters, etc. A Printer should rely only on coloring information to decide how to produce the serialised syntax; it knows nothing of the specific document type.

Here's the output of the above, correctly colored data model after it's been fed through an XMLPrinter. Again the coloring information is picked out, here demonstrating how the color is implied by the syntax.


<rhyme type="nursery" source="Mother Goose">
<title>Peter Piper Picked A Peck Of Pickled Peppers</title>
<verse>
<line>Peter Piper picked a peck of pickled peppers.</line>
<line>A peck of pickled peppers Peter Piper picked.</line>
<line>If Peter Piper picked a peck of pickled peppers,</line>
<line>Where's the peck of pickled peppers Peter Piper picked?</line>
</verse>
</rhyme>
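For completeness, here is a minimal Python sketch of what an XMLPrinter might do. It decides everything from the color of each node and never from its name. The function names print_xml and text_of are hypothetical, nodes are (name, color, value-or-children) tuples, and the line breaks between elements shown above are not reproduced. Escaping uses xml.sax.saxutils from the standard library.

```python
from xml.sax.saxutils import escape, quoteattr


def text_of(node):
    """Concatenate the values of the text-colored children of a node."""
    return "".join(child[2] for child in node[2] if child[1] == "text")


def print_xml(node):
    """Serialise a colored node as XML, driven only by its color."""
    name, color, content = node
    if color == "text":
        return escape(content)
    if color == "comment":
        return f"<!-- {text_of(node)} -->"
    if color != "element":
        raise ValueError(f"cannot print node {name!r} with color {color!r}")
    # Attribute-colored children go inside the start tag, the rest in the body.
    attrs = "".join(
        f" {child[0]}={quoteattr(text_of(child))}"
        for child in content if child[1] == "attribute"
    )
    body = "".join(print_xml(child) for child in content if child[1] != "attribute")
    return f"<{name}{attrs}>{body}</{name}>"


painted = ("rhyme", "element", [
    ("type", "attribute", [("sml:text", "text", "nursery")]),
    ("title", "element", [("sml:text", "text", "Peter Piper")]),
    ("source", "attribute", [("sml:text", "text", "Mother Goose")]),
])
print(print_xml(painted))
```

Note that the 'source' node ends up in the start tag purely because the Painter colored it as an attribute; had the Color Scheme said 'element', the same Printer would have written it as a child element instead.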