Websucker Crawler Agent

Websucker

Agent for Sucking the Web

Features

  • Crawling of best domains
  • Crawling of unvisited domains
  • Text mining
  • Evaluation of domains
  • Daily report
  • Database Summary

Requirements

  • Python 3
  • A running Cassandra 3.11 instance
  • Optional: Beanstalkd for the work queue
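
Before installing, it can help to confirm that the required services are reachable. The following is a minimal sketch using only the Python standard library; it is not part of Websucker. It assumes both services run on localhost with their default ports (9042 for the Cassandra native transport, 11300 for Beanstalkd), and the helper name port_is_open is hypothetical. Adjust host and ports to match your setup.

```python
import socket

# Default ports; change these if your services are configured differently.
CASSANDRA_PORT = 9042    # Cassandra native transport (CQL) default
BEANSTALKD_PORT = 11300  # Beanstalkd default


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: services run on the local machine
    print("Cassandra reachable:", port_is_open(host, CASSANDRA_PORT))
    print("Beanstalkd reachable (optional):", port_is_open(host, BEANSTALKD_PORT))
```

An open port only shows that something is listening there; it does not verify the Cassandra version or that the service is healthy.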

Installation

Create and activate a virtual environment:

python -m virtualenv ./venv
source ./venv/bin/activate

Install the package:

pip install https://git.kemt.fei.tuke.sk/dano/websucker-pip/archive/master.zip

Usage

websuck --help