Installation is quite straightforward. All required libraries and code are provided in the distribution for each release, so simply unpack the archive into a suitable location. It unpacks into a directory called `slug` containing all code and files. The rest of this documentation uses `$SLUG_HOME` (or `%SLUG_HOME%` on Windows) to refer to the location of the scutter. You'll probably also want to add this directory to your PATH.

The build file (`$SLUG_HOME/build.xml`) also includes a few helpful tools, such as building the javadocs, etc. See the tools documentation for notes on those.
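As a minimal sketch of that setup on a Unix-like system (the install path `$HOME/slug` is an assumption; use wherever you actually unpacked the distribution):

```shell
# Assumed install location -- adjust to where you unpacked the release.
export SLUG_HOME=$HOME/slug

# Optional: add the scutter directory to PATH so the launch scripts
# can be run without a full path.
export PATH=$PATH:$SLUG_HOME
```

Adding these lines to your shell profile (e.g. `~/.bashrc`) makes the setting persistent across sessions.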
The Slug distribution includes shell scripts for running a scutter. Run `$SLUG_HOME/slug.sh` or `slug.bat` depending on your platform. These scripts configure the required Java `CLASSPATH`.

The scripts accept the following parameters:
| Parameter | Purpose | Required? |
|---|---|---|
| `-config` | Path to a Slug configuration file | Yes |
| `-id` | Identifier for a scutter profile defined in the above configuration file | Yes |
| `-plan` | Path to a "scutter plan", i.e. an RDF document identifying the list of initial URLs to be crawled. The distribution includes a simple example, `sample-plan.rdf` | No; supply this, `-freshen`, or both |
| `-freshen` | Indicates whether the scutter should add all previously found URLs to its initial crawl plan. Used to "freshen" already discovered data | No; supply this, `-plan`, or both |
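Putting the parameters together, a typical invocation might look like the following. This is an illustrative sketch only: the configuration file name `config.rdf` and the profile id `default` are assumptions, and should match whatever your own configuration file defines.

```shell
# Start a crawl from the bundled sample plan, using a hypothetical
# profile "default" defined in a hypothetical config.rdf:
$SLUG_HOME/slug.sh -config $SLUG_HOME/config.rdf -id default \
    -plan $SLUG_HOME/sample-plan.rdf

# Or re-crawl everything the scutter has previously discovered:
$SLUG_HOME/slug.sh -config $SLUG_HOME/config.rdf -id default -freshen
```

Note that `-plan` and `-freshen` may also be combined, which seeds the crawl from the plan and freshens previously discovered URLs in the same run.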