# Websucker

An agent for sucking the Web.

## Features

- Crawling of best domains
- Crawling of unvisited domains
- Text mining
- Evaluation of domains
- Daily report
- Database Summary

## Requirements

- Python 3
- A running Cassandra 3.11 instance
- Optionally, Beanstalkd for the work queue

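Before installing, it can help to confirm that the required services are reachable. The following is a minimal sketch, not part of Websucker itself: it assumes Cassandra and Beanstalkd run on `localhost` with their default ports (9042 for Cassandra's native transport, 11300 for Beanstalkd). The names `SERVICES`, `is_reachable` and `check_services` are hypothetical; adjust hosts and ports to your setup.

```python
import socket

# Assumed local setup with default ports; change these to match your deployment.
SERVICES = {
    "Cassandra": ("localhost", 9042),
    "Beanstalkd (optional)": ("localhost", 11300),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

An open port only shows that something is listening there; it does not verify the Cassandra version or that a keyspace exists.
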
## Installation

Create and activate a virtual environment:

    python -m virtualenv ./venv
    source ./venv/bin/activate

Install the package:

    pip install https://git.kemt.fei.tuke.sk/dano/websucker-pip/archive/master.zip

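After installation, a quick sanity check can confirm that the virtual environment is active and that the `websuck` command is on the `PATH`. This is a hedged sketch using only the Python standard library; the helper names `in_virtualenv` and `find_command` are hypothetical and not provided by Websucker.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True if the interpreter runs inside a venv or virtualenv."""
    # Modern venv/virtualenv set sys.base_prefix to the base interpreter;
    # older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base or hasattr(sys, "real_prefix")


def find_command(name: str = "websuck"):
    """Return the full path of the command, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    path = find_command()
    print("websuck found at:", path if path else "not found")
```

If `websuck` is not found, the most likely causes are that the virtual environment is not activated in the current shell or that the `pip install` step failed.
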
## Usage

    websuck --help