# Hate Speech Bot Documentation
## 🧠 Application Overview
The Hate Speech Bot is a complete system for automated hate speech detection in **Telegram** group chats. It works as a monitoring and analysis tool that:
- **Intercepts and analyzes user messages** in group chats automatically
- Records the **user and the message in a database** when hate speech is detected
- Displays violators in a **real-time web interface**
### System Components:
1. **Telegram Bot**: Intercepts messages and uses a fine-tuned language model to classify whether each message contains hate speech.
2. **MySQL Database**: Stores user information, messages, and incident timestamps.
3. **Web Application (Flask)**: Displays a real-time table of violations, updated every 5 seconds.
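
The flow between the three components can be sketched as follows. This is a minimal illustration, not the project's code: `keyword_classifier` is a placeholder for the fine-tuned model, the `violations` table and its columns are hypothetical, and SQLite's in-memory database stands in for MySQL so that the snippet runs without a server. The web application would then read rows from the same table to build its view.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Hypothetical schema; the project's actual MySQL tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS violations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    message TEXT NOT NULL,
    detected_at TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the stand-in database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def keyword_classifier(text: str) -> bool:
    """Placeholder for the fine-tuned model: flags text containing a marker word."""
    return "hateful" in text.lower()


def handle_message(
    conn: sqlite3.Connection,
    username: str,
    text: str,
    is_hate_speech: Callable[[str], bool] = keyword_classifier,
) -> bool:
    """Classify one message; store it and return True only if it was flagged."""
    if not is_hate_speech(text):
        return False
    conn.execute(
        "INSERT INTO violations (username, message, detected_at) VALUES (?, ?, ?)",
        (username, text, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return True
```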
---
## 🧱 List of Used Docker Containers
| Container | Description |
|-------------------------------------|-------------|
| `tetianamohorian/hate-speech-bot` | Contains the Telegram bot, classification model (e.g. fine-tuned BERT), and Flask frontend |
| `mysql:8` | Relational database for storing violator records |
---
## ☸️ List of Kubernetes Objects
| Object | Description |
|---------------------------|-------------|
| `Namespace: botspace` | Isolated space for all app-related objects |
| `Deployment: bot-deployment` | Deploys the container with the bot |
| `Deployment: flask-web` | Deploys the Flask web server |
| `StatefulSet: mysql` | Ensures persistent and consistent MySQL operation |
| `PersistentVolume` | Stores database data outside the pod |
| `PersistentVolumeClaim` | Requests disk space for the database |
| `Service: mysql` | Internal service for accessing the database |
| `Service: flask-service` | Exposes the Flask web server; reached from the host via `kubectl port-forward` |
| `ConfigMap: init-sql` | Contains initial SQL schema scripts |
| `Secret: bot-secret` | Secure storage of sensitive data like the Telegram token |
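
As an illustration of how the bot container might consume `bot-secret` and locate the `mysql` Service, the sketch below reads its settings from environment variables. The variable names (`TELEGRAM_TOKEN`, `MYSQL_HOST`, `MYSQL_PORT`) and the `BotConfig` class are assumptions made for this example; the actual manifests may expose the Secret differently, for instance as a mounted file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    telegram_token: str
    mysql_host: str
    mysql_port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Build the configuration from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    token = env.get("TELEGRAM_TOKEN")
    if not token:
        # Fail early instead of starting a bot that cannot authenticate.
        raise RuntimeError(
            "TELEGRAM_TOKEN is not set; check that the Secret is exposed to the container"
        )
    return BotConfig(
        telegram_token=token,
        # Inside the namespace, the Service name "mysql" resolves via cluster DNS.
        mysql_host=env.get("MYSQL_HOST", "mysql"),
        # 3306 is the default MySQL port.
        mysql_port=int(env.get("MYSQL_PORT", "3306")),
    )
```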
---
## 🌐 Virtual Networks and Storage
- All pods communicate over the cluster's internal network within the `botspace` namespace, which limits external exposure of the data
- The `PersistentVolume` attached to MySQL preserves the data across pod restarts
---
## ⚙️ Container Configuration
- **Flask Web**: Runs on port `5000` inside the cluster, accessible via `flask-service`. The page refreshes its data via JavaScript (AJAX `fetch`) every 5 seconds
- **Telegram Bot**: Runs in a separate thread, classifies messages using a PyTorch model or HuggingFace Transformers, and logs results into the database
- **MySQL**: Initialized with an `initContainer` that runs a script from the `ConfigMap`
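
Running the bot in its own thread, as described above, could look roughly like the following sketch. It uses only the standard library: a `queue.Queue` stands in for incoming Telegram updates and `is_hate_speech` for the model call. All names are hypothetical and the real bot's structure may differ.

```python
import queue
import threading
from typing import Callable, List, Tuple


def bot_worker(
    inbox: "queue.Queue[Tuple[str, str]]",
    flagged: List[Tuple[str, str]],
    is_hate_speech: Callable[[str], bool],
    stop: threading.Event,
) -> None:
    """Consume (user, text) pairs until asked to stop; keep the flagged ones."""
    while not stop.is_set():
        try:
            user, text = inbox.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing to process yet; re-check the stop flag
        if is_hate_speech(text):
            flagged.append((user, text))
        inbox.task_done()


def start_bot_thread(inbox, flagged, is_hate_speech):
    """Start the worker in a daemon thread so it does not block the main program."""
    stop = threading.Event()
    thread = threading.Thread(
        target=bot_worker,
        args=(inbox, flagged, is_hate_speech, stop),
        daemon=True,
    )
    thread.start()
    return thread, stop
```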
---
## 🚀 Usage Guide
### 1. Prepare the Application:
```bash
./prepare-app.sh
```
- Builds the Docker image
- Pushes it to Docker Hub
---
### 2. Start the Application:
```bash
./start-app.sh
```
- Creates Kubernetes objects
- Starts port-forwarding for the Flask web app on `localhost:8888`
---
### 3. Stop Port-Forwarding:
```bash
pkill -f "kubectl port-forward"
```
- Terminates temporary access via port-forward
---
### 4. Remove the Application:
```bash
./stop-app.sh
```
- Deletes the entire `botspace` namespace, including the database and services
---
## 🌍 Accessing the Web Interface
After launching the app, open a browser:
```
http://localhost:8888
```
- A table of violators will be displayed
- Data updates automatically every 5 seconds
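
If the page does not load right away, the port-forward may still be starting. A small helper like the one below, which is not part of the project and uses only the Python standard library, can wait until `localhost:8888` accepts TCP connections before you open the browser. The function name and its default timings are arbitrary choices for this sketch.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call once ./start-app.sh has been run:
# wait_for_port("localhost", 8888)
```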
---
## 📦 Technologies Used
- **Python, Flask, PyTorch / Transformers**: text analysis and classification
- **Docker, Docker Hub**: containerization
- **Kubernetes**: orchestration, scaling, and network isolation
- **MySQL**: relational data storage
- **JavaScript (AJAX)**: real-time frontend updates