Slug: A Semantic Web Crawler

Slug is a web crawler (or Scutter) designed for harvesting semantic web content. Implemented in Java using the Jena API, Slug provides a modular framework in which the retrieval, processing and storage of harvested content can each be configured.

The framework provides an RDF vocabulary for describing crawler configurations and collects metadata about crawling activity. This metadata supports reporting and analysis of crawling progress, and the HTTP caching data it stores makes repeat retrieval more efficient.
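
As a rough illustration of how stored HTTP caching data can make retrieval cheaper, the sketch below turns the ETag and Last-Modified values remembered from an earlier fetch into conditional request headers, and treats a 304 (Not Modified) response as a signal to skip reprocessing. This is not Slug's actual code: the class ConditionalFetchSketch, the CacheInfo holder and both method names are hypothetical, and only standard JDK classes are used.

```java
import java.net.HttpURLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Slug API.
public class ConditionalFetchSketch {

    /** Caching data remembered from a previous fetch of one URL. */
    public static final class CacheInfo {
        public final String etag;         // ETag response header value, or null
        public final String lastModified; // Last-Modified response header value, or null

        public CacheInfo(String etag, String lastModified) {
            this.etag = etag;
            this.lastModified = lastModified;
        }
    }

    /**
     * Builds the conditional headers for a repeat request.
     * Returns an empty map when nothing is known about the URL.
     */
    public static Map<String, String> conditionalHeaders(CacheInfo previous) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (previous == null) {
            return headers;
        }
        if (previous.etag != null) {
            headers.put("If-None-Match", previous.etag);
        }
        if (previous.lastModified != null) {
            headers.put("If-Modified-Since", previous.lastModified);
        }
        return headers;
    }

    /** A 304 response means the stored copy is still current. */
    public static boolean needsReprocessing(int statusCode) {
        return statusCode != HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

A crawler would copy the returned headers onto its outgoing request (for example with HttpURLConnection.setRequestProperty) and only parse and store the response body when needsReprocessing returns true.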

Documentation Index

Download the latest release
Read how to Install and Run the Scutter
Slug configuration files and options
Slug supports profiles to allow for different crawling strategies
See how the Scutter organizes its memory
Read how to use Slug to carry out some specific tasks
Consult the API documentation for additional notes on each component
Planned future developments are outlined in the roadmap

Image courtesy of Elroy Serrao.