Websucker

Agent for sucking the Web

Features

Create and activate a virtual environment:

python -m virtualenv ./venv
source ./venv/bin/activate

Install the package:

pip install https://git.kemt.fei.tuke.sk/dano/websucker-pip/archive/master.zip

If you have Cassandra installed, first initialize the database schema with the cqlsh command. The schema is in the schema.sql file.
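A minimal sketch of the schema step, assuming schema.sql has been copied from the repository into the current directory and that cqlsh is on the PATH. The SCHEMA_FILE variable and the guard around the call are illustrative additions, not part of websucker:

```shell
# Assumption: schema.sql sits in the current directory.
SCHEMA_FILE="schema.sql"

if [ -f "$SCHEMA_FILE" ] && command -v cqlsh >/dev/null 2>&1; then
    # -f makes cqlsh execute the statements in the file and then exit
    cqlsh -f "$SCHEMA_FILE"
else
    echo "cqlsh or $SCHEMA_FILE not found; skipping schema setup" >&2
fi
```

Without further arguments cqlsh connects to its default host and port. If Cassandra runs elsewhere, the host and port can be appended as positional arguments after the options.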

Configure the database connection with environment variables (needed if Cassandra runs on another machine):

export CASSANDRA_HOST=localhost

export CASSANDRA_PORT=9142

Show the available options:

websuck --help

Save the list of domains to a file, for example:

echo www.sme.sk > domains.txt
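To list several domains, one option is to write them one per line. This is a sketch that assumes websuck reads one domain per line, which the text above does not state explicitly. www.example.com is a placeholder:

```shell
# Assumption: the file holds one domain per line.
# www.example.com is a placeholder; replace it with a real domain.
printf '%s\n' www.sme.sk www.example.com > domains.txt

# Check the result before crawling
cat domains.txt
```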

Visit the domains listed in the file:

websuck --visit file domains.txt

websuck --visit unvisited 100