dano/websucker-pip

Websucker Crawler Agent

Daniel Hladek 8d4a873005 zz		2020-05-20 09:22:19 +02:00

Websucker

Agent for Sucking the Web

Features

Crawling of best domains
Crawling of unvisited domains
Text mining
Evaluation of domains
Daily report
Database Summary

Requirements

Python 3
A running Cassandra 3.11 instance
Beanstalkd (optional), used as the work queue
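
Before installing, it can help to confirm that the required services are reachable. The following is a minimal sketch, not part of Websucker itself: it assumes Cassandra listens on its default native transport port 9042 and Beanstalkd on its default port 11300, both on localhost. The function names and the host are illustrative and should be adjusted to your setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


# Assumed defaults: 9042 for Cassandra, 11300 for Beanstalkd.
# "localhost" is a placeholder for wherever the services actually run.
SERVICES = {
    "Cassandra": ("localhost", 9042),
    "Beanstalkd (optional)": ("localhost", 11300),
}


def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

An open port only shows that something is listening there; it does not verify the Cassandra version or that the database is ready for queries.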

Installation

Create and activate a virtual environment:

python -m virtualenv ./venv
source ./venv/bin/activate

Install package:

pip install https://git.kemt.fei.tuke.sk/dano/websucker-pip/archive/master.zip

Usage

websuck --help
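
If the command is not found, the virtual environment is probably not active in the current shell. The sketch below is one way to check this from Python. It relies only on the `websuck` command name shown above; the helper names `find_command` and `show_help` are hypothetical and not part of the package.

```python
import shutil
import subprocess


def find_command(name: str = "websuck"):
    """Return the full path of the command on PATH, or None if it is missing."""
    return shutil.which(name)


def show_help(name: str = "websuck"):
    """Run '<name> --help' and return its output, or None if the command is missing."""
    path = find_command(name)
    if path is None:
        return None
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    # Some tools print help to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr


if __name__ == "__main__":
    text = show_help()
    if text is None:
        print("websuck not found on PATH; is the virtual environment activated?")
    else:
        print(text)
```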