This page contains some notes on how to configure the Slug crawler.
Slug requires a configuration file that specifies a number of settings describing how the crawler will operate. Collectively these settings are known as a profile.
These settings include details such as:
The Slug distribution includes a sample config file, config.rdf, that demonstrates how to configure all of the current components.
The configuration file is expressed as RDF/XML. A given configuration file
may contain entries for more than one profile.
Therefore, when running the scutter, one must provide the identifier of a
Scutter described in the configuration. The identifier is specified with the -id
parameter; see Running the Scutter.
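Selecting one profile out of a multi-profile file can be sketched in Python using only the standard library. This is a minimal sketch, not Slug's own code. It assumes, hypothetically, that the value passed to -id is compared against the rdf:about attribute of a typed slug:Scutter node. The function name find_scutter and the profile names "default" and "nightly" are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace, plus the Slug config namespace documented on this page.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SLUG_NS = "http://purl.org/NET/schemas/slug/config/"


def find_scutter(config_xml: str, scutter_id: str) -> ET.Element:
    """Return the slug:Scutter element whose rdf:about equals scutter_id.

    ASSUMPTION (hypothetical): the -id value is matched against the
    rdf:about attribute of a typed <slug:Scutter> node.
    """
    root = ET.fromstring(config_xml)
    # ElementTree names qualified elements and attributes as "{namespace}local".
    for node in root.iter(f"{{{SLUG_NS}}}Scutter"):
        if node.get(f"{{{RDF_NS}}}about") == scutter_id:
            return node
    raise KeyError(f"no slug:Scutter with identifier {scutter_id!r}")


# A hypothetical configuration describing two crawler profiles.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:slug="{SLUG_NS}">
  <slug:Scutter rdf:about="default"/>
  <slug:Scutter rdf:about="nightly"/>
</rdf:RDF>"""

if __name__ == "__main__":
    print(find_scutter(SAMPLE, "nightly").get(f"{{{RDF_NS}}}about"))  # nightly
```

Failing loudly on an unknown identifier, rather than silently falling back to the first profile, mirrors the requirement that the identifier name a Scutter actually described in the configuration.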
The complete schema for the Scutter configuration is available in
etc/schema/config.rdfs in the distribution. It is also available online.
The namespace URI is http://purl.org/NET/schemas/slug/config/, and the preferred namespace prefix is slug.
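In RDF/XML, a qualified name such as slug:Scutter stands for the URI formed by concatenating the namespace URI with the local name. The short Python sketch below makes that expansion explicit. The helper expand and the PREFIXES table are hypothetical illustrations, not part of Slug. The prefix itself is only a convention: any prefix bound to the same namespace URI names the same terms.

```python
# Prefix table: "slug" is the preferred prefix documented above; "rdf" is standard.
# The table and helper below are hypothetical illustrations, not Slug code.
PREFIXES = {
    "slug": "http://purl.org/NET/schemas/slug/config/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'slug:Scutter' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    # The namespace URI ends in '/', so plain concatenation yields the term URI.
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("slug:Scutter"))  # http://purl.org/NET/schemas/slug/config/Scutter
```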
The following sections describe some of the key classes and relationships.
The slug:Scutter class describes an individual crawler. A given
configuration file may describe more than one crawler.
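Because RDF/XML allows the same statement to be written in more than one way, a crawler may be declared either as a typed node (<slug:Scutter rdf:about="...">) or as an rdf:Description carrying an explicit rdf:type. The sketch below lists every crawler described in a file under either form. It is a simplification under stated assumptions: it uses the standard-library XML parser rather than an RDF parser, treats rdf:about as the crawler identifier, and ignores rdf:ID and blank nodes. The helper name list_scutters and the identifiers in the sample are hypothetical. A full RDF parser such as rdflib would handle every syntactic variant.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SLUG_NS = "http://purl.org/NET/schemas/slug/config/"
SCUTTER_URI = SLUG_NS + "Scutter"


def list_scutters(config_xml: str) -> list[str]:
    """Return the rdf:about of every resource typed as slug:Scutter.

    ASSUMPTION (hypothetical): rdf:about carries the crawler identifier;
    nodes without rdf:about (rdf:ID, blank nodes) are skipped.
    """
    root = ET.fromstring(config_xml)
    about = f"{{{RDF_NS}}}about"
    found = []
    for node in root.iter():  # document order
        # Form 1: typed node, e.g. <slug:Scutter rdf:about="...">.
        typed = node.tag == f"{{{SLUG_NS}}}Scutter"
        # Form 2: any node with a direct <rdf:type rdf:resource="...Scutter"/> child.
        if not typed:
            typed = any(
                child.tag == f"{{{RDF_NS}}}type"
                and child.get(f"{{{RDF_NS}}}resource") == SCUTTER_URI
                for child in node
            )
        if typed and node.get(about) is not None:
            found.append(node.get(about))
    # Drop repeats (a resource may be described twice) while keeping order.
    return list(dict.fromkeys(found))


SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:slug="{SLUG_NS}">
  <slug:Scutter rdf:about="default"/>
  <rdf:Description rdf:about="archiver">
    <rdf:type rdf:resource="{SCUTTER_URI}"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(list_scutters(SAMPLE))  # ['default', 'archiver']
```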
For now, see the sample config.rdf for example configurations.