Added the neccessary info

This commit is contained in:
Artur Hyrenko 2026-02-05 15:28:46 +00:00
parent 3bb4286763
commit d0db4121ee

View File

@ -1,98 +1,271 @@
============================================================================================================================================================
1. LLM_test.py Generovanie odpovedí modelov
# Diploma Thesis Repository — SFT + DPO Training, Translation, and Safety Evaluation (SK/EN)
Tento skript je prvý krok celého procesu.
Spúšťaš ho, keď chceš nechať model (Gemma, LLaMA alebo Qwen) odpovedať na rôzne “nepríjemné” datasety (harmful prompts). Podľa toho sa potom hodnotí jeho bezpečnosť.
This repository contains the scripts I used in my thesis experiments to:
1.1 Ako to funguje
- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
- train QLoRA adapters (SFT and DPO) for selected base models,
- generate model responses on safety datasets,
- evaluate responses with **Llama Guard 3**.
Pri spustení si script od teba vypýta:
Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
aký model chceš použiť,
---
aké GPU (ak máš),
## Repository structure
aký dataset chceš otestovať,
The scripts are grouped by purpose:
koľko promptov chceš spracovať.
```text
preparation/
prepar_dat_pku_dpo.py
Dataset si script načíta automaticky.
program/
copymaster.py
Llama_test_trained.py
LLM_test.py
response_evaluate.py
Ak je gated, vypýta si HF token.
Training/
convert_dpo_sft.py
training_dpo_sft_llama.py
training_dpo_sft_mistral_sk.py
Model každému promptu vygeneruje odpoveď a script kontroluje, či to náhodou nebola odpoveď v štýle “nemôžem odpovedať, som AI”. Toto sa počíta ako refusal.
translate/
translate_do-not_answer.py
translate_PKF.py
Translate_sk_to_eng.py
```
Výsledky idú do priečinka outputs/<timestamp>-model-dataset/.
---
Vo vnútri nájdeš:
## Requirements
responses.json odpovede v strojovom formáte,
- Python **3.8+**
- CUDA-capable GPU(s) for translation/training
- Typical packages:
- `torch`, `transformers`, `datasets`
- `peft`, `trl`, `accelerate`, `bitsandbytes`
- `tqdm`
responses.txt všetky prompty a odpovede pre ľudí,
Install (example):
summary.txt súhrn odmietnutí podľa kategórií.
============================================================================================================================================================
```bash
pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
```
2. copymaster.py Triedenie výstupov
Notes:
- `translate/translate_PKF.py` is written for a **2GPU** setup.
- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
Keď už máš hromadu priečinkov v outputs/, potrebuješ to nejako zoradiť, aby každý model mal svoje miesto. O to sa stará copymaster.py.
---
2.1 Čo robí:
## Suggested workflow (high level)
Opýta sa, ktorý model chceš spracovať.
A typical run looks like this:
Prejde všetky priečinky v outputs.
1) **Translate PKU** to Slovak (optional, if you need SK training data)
2) **Prepare** SFT + DPO datasets on disk
3) **Train** adapters (SFT/DPO)
4) **Generate** responses (base vs adapters)
5) **Translate** responses SK → EN (for Llama Guard)
6) **Evaluate** outputs with Llama Guard 3
Všetko, čo obsahuje v názve “gemma”, “llama” alebo “qwen”, podľa toho čo si vybral, skopíruje do:
/response/<model>/
You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.
Každý JSON dostane svoje číslo: 1.json, 2.json, 3.json…
---
============================================================================================================================================================
## Scripts (what they do + how to run)
3. response_evaluate.py Hodnotenie bezpečnosti
Run commands below from the **repository root**.
Toto je hlavný a najväčší skript.
Robí reálne hodnotenie, či sú prompty a odpovede modelov bezpečné alebo nie.
---
Používaš ho na porovnávanie modelov medzi sebou.
### `preparation/prepar_dat_pku_dpo.py`
**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
- an **SFT** dataset (plain text prompts/completions),
- a **DPO** dataset (prompt/chosen/rejected).
3.1 Čo robí:
**Run:**
```bash
python3 preparation/prepar_dat_pku_dpo.py
```
Pýta si od teba, ktorú sadu z response/ chceš hodnotiť (llama, gemma, qwen).
**What to edit first:**
- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
- output paths (`SFT_OUT`, `DPO_OUT`)
Nechá ťa vybrať GPU alebo CPU.
---
Načíta si Llama Guard 38B.
### `Training/convert_dpo_sft.py`
**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
- SFT: `./data/pku_sft.jsonl`
- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)
Každý prompt aj odpoveď vyhodnotí zvlášť:
It shows a small interactive menu (SFT / DPO / BOTH).
prompt → je bezpečný / nebezpečný
**Run:**
```bash
python3 Training/convert_dpo_sft.py
```
odpoveď → model odpovedal bezpečne / nebezpečne
---
Okrem Guardu používa aj tvoje vlastné heuristiky:
### `translate/translate_PKF.py`
**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2GPU** multiprocessing setup.
ak prompt obsahuje “sex”, “dirty joke” atď. → označí ho rovno ako unsafe,
**Run:**
```bash
python3 translate/translate_PKF.py
```
ak odpoveď obsahuje odmietnutie → automaticky safe.
**Resume / merge only:**
```bash
python3 translate/translate_PKF.py --resume
```
Každý hodnotený záznam uloží do samostatného JSON.
**What to edit first:**
- `NLLB_PATH` (local NLLB model directory)
- output directory constants inside the script
Po spracovaní celého priečinka vytvorí:
---
summary.json pre každý vstupný súbor,
### `translate/translate_do-not_answer.py`
**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.
summary_all.json pre celý model.
**Run (defaults are usable as-is):**
```bash
python3 translate/translate_do-not_answer.py
```
3.2 Výsledkom je úplná štatistika:
**Useful options:**
```bash
python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py --base_dir /home/hyrenko/Diploma/datasets --out_name do_not_answer_sk --model /home/hyrenko/Diploma/models/nllb-200-1.3B --translate_fields question,risk_area
```
koľko promptov bolo unsafe,
---
koľko odpovedí bolo unsafe,
### `Training/training_dpo_sft_llama.py`
**Purpose:** Unified training script for **Llama (e.g. llama3.18b)**:
- SFT (QLoRA + masked loss)
- DPO (TRL DPOTrainer)
koľko párov bolo naraz unsafe,
It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.
porovnanie modelov podľa bezpečnosti.
============================================================================================================================================================
**Run:**
```bash
python3 Training/training_dpo_sft_llama.py
```
If `accelerate` is not configured on your machine yet:
```bash
accelerate config
```
---
### `Training/training_dpo_sft_mistral_sk.py`
**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
- `sft`
- `dpo`
- `both`
This script uses subcommands and supports CLI overrides (see `--help`).
**Help:**
```bash
python3 Training/training_dpo_sft_mistral_sk.py --help
```
**Multi-GPU examples:**
```bash
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
```
---
### `program/LLM_test.py`
**Purpose:** Interactive generator for model responses. Lets you pick:
- model source (base / adapters),
- GPU,
- dataset,
- generation limits.
Writes a `responses.json` under your configured outputs directory.
**Run:**
```bash
python3 program/LLM_test.py
```
---
### `program/Llama_test_trained.py`
**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.
**Run:**
```bash
python3 program/Llama_test_trained.py --help
```
**Example:**
```bash
python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
```
---
### `translate/Translate_sk_to_eng.py`
**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).
**Run:**
```bash
python3 translate/Translate_sk_to_eng.py
```
---
### `program/copymaster.py`
**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).
**Run:**
```bash
python3 program/copymaster.py
```
**What to edit first:**
- `OUTPUTS_DIR` (source runs folder)
- `DEST_DIR` (destination base folder)
---
### `program/response_evaluate.py`
**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
- per-item evaluation JSON,
- summary stats (under `OUTPUT_ROOT`, configured in the script).
Inputs are expected in folders like:
- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)
**Run:**
```bash
python3 program/response_evaluate.py
```
**What to edit first:**
- `MODEL_PATH` (local Llama Guard 3 model folder)
- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)
---
## Tips
- If something “cant find file/path”, check the **constants at the top** of the script first.
- Keep outputs named consistently (`responses.json`) — several scripts rely on that.
- For multi-GPU training, make sure your CUDA devices are visible and `torchrun/accelerate` sees them.