Slug: Install and Run

Installation

Installation is straightforward. The distribution for each release includes all required libraries and code, so simply:

  1. Download the desired release. Save the file locally first, as some users have reported problems opening it directly in a browser.
  2. Unzip the distribution, e.g. in your home directory. You'll end up with a new sub-directory, slug, containing all code and files.
  3. You may want to create a new environment variable, $SLUG_HOME or %SLUG_HOME%, that refers to this directory. You'll probably also want to add the directory to your PATH.
  4. Ensure you have Java 1.5 installed. You'll need this to run the scutter.
  5. It's also recommended that you install Apache Ant. You'll definitely need this if you want to contribute to the project, locally customise the code, or build it yourself. The provided Ant script ($SLUG_HOME/build.xml) also includes a few helpful tools, such as building the javadocs. See the tools documentation for notes on that.
  6. That's it, you're ready to configure and run a scutter.
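
On a Unix-like system, steps 3 to 5 might look roughly like the sketch below. It assumes the distribution was unzipped into your home directory, so $HOME/slug is an assumed location rather than a fixed one. The Ant line only lists whatever targets build.xml defines, and it is skipped if Ant or the build file is missing.

```shell
# Assumed unpack location; change this if you unzipped the release elsewhere.
SLUG_HOME="$HOME/slug"
export SLUG_HOME

# Append the Slug directory to PATH for this shell session.
PATH="$PATH:$SLUG_HOME"
export PATH

# Confirm that a Java runtime is available (the docs ask for Java 1.5).
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "java not found on PATH; install Java before running the scutter" >&2
fi

# Optional: if Ant is installed, list the targets defined in the build script.
if command -v ant >/dev/null 2>&1 && [ -f "$SLUG_HOME/build.xml" ]; then
  ant -f "$SLUG_HOME/build.xml" -projecthelp
fi
```

To make the settings permanent, the two export lines can go in your shell start-up file. On Windows, the equivalent is to set %SLUG_HOME% and PATH through the system environment settings.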

Running the Scutter

The Slug distribution includes shell scripts for running a scutter. Run $SLUG_HOME/slug.sh or slug.bat depending on your platform. These scripts configure the required Java CLASSPATH.

These scripts accept the following parameters. -config and -id are always required, and at least one of -plan or -freshen must be supplied:

Parameter Purpose                                               Required?
-config   Path to a Slug configuration file.                    Yes
-id       Identifier for the scutter profile, as defined in     Yes
          the above configuration file.
-plan     Path to a "scutter plan", i.e. an RDF document        No; supply this,
          identifying the list of initial URLs to be crawled.   -freshen, or both
          The distribution includes a simple example,
          sample-plan.rdf.
-freshen  Indicates whether the scutter should add all          No; supply this,
          previously found URLs to its initial crawler plan.    -plan, or both
          Used to "freshen" already discovered data.
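
As an illustration of the parameters above, a first crawl from the sample plan could be started along these lines. The config file name (config.rdf), the profile identifier (default) and the location of sample-plan.rdf inside $SLUG_HOME are all assumptions made for this sketch, so substitute the values from your own setup.

```shell
# Fall back to an assumed install location if SLUG_HOME is not already set.
SLUG_HOME="${SLUG_HOME:-$HOME/slug}"

CONFIG="$SLUG_HOME/config.rdf"      # hypothetical configuration file name
PROFILE="default"                   # hypothetical profile id from that config
PLAN="$SLUG_HOME/sample-plan.rdf"   # sample plan; its location is assumed

# Collect the arguments once so they are easy to inspect or extend.
set -- -config "$CONFIG" -id "$PROFILE" -plan "$PLAN"

# Run the scutter only if the launch script is actually present.
if [ -x "$SLUG_HOME/slug.sh" ]; then
  "$SLUG_HOME/slug.sh" "$@"
else
  echo "slug.sh not found (or not executable) under $SLUG_HOME" >&2
fi
```

On Windows the same parameters are passed to slug.bat. For a later run that revisits previously discovered URLs, -freshen can be supplied instead of, or alongside, -plan.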


Image courtesy of Elroy Serrao.