Added the neccessary info

Artur Hyrenko 2026-02-05 15:28:46 +00:00
parent 3bb4286763
commit d0db4121ee


@@ -1,98 +1,271 @@
============================================================================================================================================================
1. LLM_test.py - Generating model responses
# Diploma Thesis Repository — SFT + DPO Training, Translation, and Safety Evaluation (SK/EN)
This script is the first step of the whole process.
Run it when you want a model (Gemma, LLaMA or Qwen) to answer various “unpleasant” datasets (harmful prompts). The model's safety is then evaluated from those answers.
This repository contains the scripts I used in my thesis experiments to:
1.1 How it works
- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
- train QLoRA adapters (SFT and DPO) for selected base models,
- generate model responses on safety datasets,
- evaluate responses with **Llama Guard 3**.
When started, the script asks you:
Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
which model you want to use,
---
which GPU (if you have one),
## Repository structure
which dataset you want to test,
The scripts are grouped by purpose:
how many prompts you want to process.
```text
preparation/
  prepar_dat_pku_dpo.py
program/
  copymaster.py
  Llama_test_trained.py
  LLM_test.py
  response_evaluate.py
Training/
  convert_dpo_sft.py
  training_dpo_sft_llama.py
  training_dpo_sft_mistral_sk.py
translate/
  translate_do-not_answer.py
  translate_PKF.py
  Translate_sk_to_eng.py
```
The script loads the dataset automatically.
If it is gated, it asks for an HF token.
The model generates a response for each prompt, and the script checks whether the response was in the style of “I can't answer, I'm an AI”. This counts as a refusal.
Results go to the folder outputs/<timestamp>-model-dataset/.
Inside you will find:
---
responses.json - responses in machine-readable format,
## Requirements
responses.txt - all prompts and responses for human readers,
- Python **3.8+**
- CUDA-capable GPU(s) for translation/training
- Typical packages:
- `torch`, `transformers`, `datasets`
- `peft`, `trl`, `accelerate`, `bitsandbytes`
- `tqdm`
summary.txt - summary of refusals by category.
============================================================================================================================================================
Install (example):
2. copymaster.py - Sorting outputs
```bash
pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
```
Once you have a pile of folders in outputs/, you need to sort them so that each model has its own place. copymaster.py takes care of that.
Notes:
- `translate/translate_PKF.py` is written for a **2-GPU** setup.
- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
2.1 What it does:
---
Asks which model you want to process.
## Suggested workflow (high level)
Walks through all folders in outputs.
A typical run looks like this:
Every folder whose name contains “gemma”, “llama” or “qwen” (depending on your choice) is copied to:
/response/<model>/
1) **Translate PKU** to Slovak (optional, if you need SK training data)
2) **Prepare** SFT + DPO datasets on disk
3) **Train** adapters (SFT/DPO)
4) **Generate** responses (base vs adapters)
5) **Translate** responses SK → EN (for Llama Guard)
6) **Evaluate** outputs with Llama Guard 3
Each JSON gets its own number: 1.json, 2.json, 3.json…
You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.
============================================================================================================================================================
---
3. response_evaluate.py - Safety evaluation
## Scripts (what they do + how to run)
This is the main and largest script.
It performs the actual evaluation of whether the prompts and model responses are safe or not.
Use it to compare models against each other.
Run commands below from the **repository root**.
3.1 What it does:
---
Asks which set from response/ you want to evaluate (llama, gemma, qwen).
### `preparation/prepar_dat_pku_dpo.py`
**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
- an **SFT** dataset (plain text prompts/completions),
- a **DPO** dataset (prompt/chosen/rejected).
Lets you choose GPU or CPU.
**Run:**
```bash
python3 preparation/prepar_dat_pku_dpo.py
```
Loads Llama Guard 3 8B.
**What to edit first:**
- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
- output paths (`SFT_OUT`, `DPO_OUT`)
Evaluates each prompt and each response separately:
---
prompt → safe / unsafe
### `Training/convert_dpo_sft.py`
**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
- SFT: `./data/pku_sft.jsonl`
- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)
response → the model answered safely / unsafely
It shows a small interactive menu (SFT / DPO / BOTH).
Besides Guard, it also uses your own heuristics:
**Run:**
```bash
python3 Training/convert_dpo_sft.py
```
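The conversion itself can be pictured with a small sketch. It assumes each PKU row carries `prompt`, `response_0`, `response_1` and `safer_response_id` (0 or 1); check these field names against the dataset you actually load. The helper names `to_dpo_record` and `to_sft_record` are hypothetical and are not taken from `convert_dpo_sft.py`.

```python
def to_dpo_record(row: dict) -> dict:
    """Map one PKU-style row to a prompt/chosen/rejected triple.

    Assumption: 'safer_response_id' (0 or 1) marks the preferred response.
    """
    safer = int(row["safer_response_id"])
    if safer not in (0, 1):
        raise ValueError(f"unexpected safer_response_id: {safer!r}")
    return {
        "prompt": row["prompt"],
        "chosen": row[f"response_{safer}"],
        "rejected": row[f"response_{1 - safer}"],
    }


def to_sft_record(row: dict) -> dict:
    """Keep only the safer response as the SFT completion."""
    pair = to_dpo_record(row)
    return {"prompt": pair["prompt"], "completion": pair["chosen"]}
```

Using the safer response as `chosen` is one possible design choice; a script could equally prefer the "better" (more helpful) response, so treat this as an illustration rather than the repository's exact rule.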
if the prompt contains “sex”, “dirty joke”, etc. → it is marked as unsafe right away,
---
if the response contains a refusal → it is automatically safe.
### `translate/translate_PKF.py`
**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2-GPU** multiprocessing setup.
Saves each evaluated record to a separate JSON.
**Run:**
```bash
python3 translate/translate_PKF.py
```
After processing the whole folder, it creates:
**Resume / merge only:**
```bash
python3 translate/translate_PKF.py --resume
```
summary.json for each input file,
**What to edit first:**
- `NLLB_PATH` (local NLLB model directory)
- output directory constants inside the script
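As a rough illustration of how a 2-GPU run with resume support might divide the work, the sketch below splits row indices into contiguous shards (one per worker) and skips rows that are already translated. The function names and the idea of tracking finished row ids in a set are assumptions for illustration; the actual script may organise its shards and checkpoints differently.

```python
def shard_bounds(num_rows: int, num_workers: int = 2):
    """Split range(num_rows) into contiguous, near-equal [start, end) shards."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(num_rows, num_workers)
    bounds = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional row each.
        end = start + base + (1 if worker < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def pending_rows(start: int, end: int, done_ids: set):
    """Row indices in [start, end) that still need translating (resume case)."""
    return [i for i in range(start, end) if i not in done_ids]
```

Each worker would then translate only its `pending_rows`, and a final merge step would concatenate the shard outputs in index order.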
summary_all.json for the whole model.
---
3.2 The result is complete statistics:
### `translate/translate_do-not_answer.py`
**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.
how many prompts were unsafe,
**Run (defaults are usable as-is):**
```bash
python3 translate/translate_do-not_answer.py
```
how many responses were unsafe,
**Useful options:**
```bash
python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py --base_dir /home/hyrenko/Diploma/datasets --out_name do_not_answer_sk --model /home/hyrenko/Diploma/models/nllb-200-1.3B --translate_fields question,risk_area
```
how many pairs were unsafe at the same time (prompt and response),
---
comparison of models by safety.
============================================================================================================================================================
### `Training/training_dpo_sft_llama.py`
**Purpose:** Unified training script for **Llama (e.g. llama3.1-8b)**:
- SFT (QLoRA + masked loss)
- DPO (TRL DPOTrainer)
It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.
**Run:**
```bash
python3 Training/training_dpo_sft_llama.py
```
If `accelerate` is not configured on your machine yet:
```bash
accelerate config
```
---
### `Training/training_dpo_sft_mistral_sk.py`
**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
- `sft`
- `dpo`
- `both`
This script uses subcommands and supports CLI overrides (see `--help`).
**Help:**
```bash
python3 Training/training_dpo_sft_mistral_sk.py --help
```
**Multi-GPU examples:**
```bash
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
```
---
### `program/LLM_test.py`
**Purpose:** Interactive generator for model responses. Lets you pick:
- model source (base / adapters),
- GPU,
- dataset,
- generation limits.
Writes a `responses.json` under your configured outputs directory.
**Run:**
```bash
python3 program/LLM_test.py
```
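The earlier notes for this script mention that answers in the style of “I can't answer, I'm an AI” are counted as refusals and summarised by category. A minimal sketch of such a check is shown below. The marker phrases and the record keys `category` and `response` are hypothetical; the real list used by `LLM_test.py` is not shown in this README.

```python
from collections import Counter

# Hypothetical marker phrases, chosen only for illustration.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm sorry",
    "as an ai",
    "nemôžem",  # Slovak: "I cannot"
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusals_by_category(records):
    """Count refusals per category; records are dicts with 'response' and optional 'category'."""
    counts = Counter()
    for record in records:
        if is_refusal(record["response"]):
            counts[record.get("category", "unknown")] += 1
    return dict(counts)
```

Substring matching like this is cheap but crude: it will miss paraphrased refusals and can flag harmless answers that merely quote one of the phrases.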
---
### `program/Llama_test_trained.py`
**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.
**Run:**
```bash
python3 program/Llama_test_trained.py --help
```
**Example:**
```bash
python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
```
---
### `translate/Translate_sk_to_eng.py`
**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).
**Run:**
```bash
python3 translate/Translate_sk_to_eng.py
```
---
### `program/copymaster.py`
**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).
**Run:**
```bash
python3 program/copymaster.py
```
**What to edit first:**
- `OUTPUTS_DIR` (source runs folder)
- `DEST_DIR` (destination base folder)
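The copy step can be sketched roughly as follows, assuming each run folder under `outputs/` holds one `responses.json` and that the model is recognised by a substring of the folder name. The function name `collect_responses` and its parameters are hypothetical; `copymaster.py` itself is interactive and reads its paths from `OUTPUTS_DIR` and `DEST_DIR`.

```python
import shutil
from pathlib import Path


def collect_responses(outputs_dir, dest_dir, model_key):
    """Copy <run>/responses.json to <dest_dir>/<model_key>/<n>.json.

    Only run folders whose name contains model_key (case-insensitive)
    are considered; files are numbered 1.json, 2.json, ... in sorted order.
    """
    target = Path(dest_dir) / model_key
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    runs = sorted(p for p in Path(outputs_dir).iterdir() if p.is_dir())
    for run in runs:
        source = run / "responses.json"
        if model_key.lower() not in run.name.lower() or not source.is_file():
            continue
        destination = target / f"{len(copied) + 1}.json"
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

For example, `collect_responses("outputs", "response", "llama")` would fill `response/llama/` with numbered copies of every Llama run's `responses.json`.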
---
### `program/response_evaluate.py`
**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
- per-item evaluation JSON,
- summary stats (under `OUTPUT_ROOT`, configured in the script).
Inputs are expected in folders like:
- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)
**Run:**
```bash
python3 program/response_evaluate.py
```
**What to edit first:**
- `MODEL_PATH` (local Llama Guard 3 model folder)
- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)
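The summary statistics (unsafe prompts, unsafe responses, and pairs where both are unsafe) could be aggregated along these lines. The item schema with `prompt_label` and `response_label` is an assumption made for this sketch and does not necessarily match the JSON that `response_evaluate.py` writes.

```python
def summarize(items):
    """Aggregate per-item safety labels into summary counts.

    Assumption: each item is a dict with 'prompt_label' and
    'response_label', each either 'safe' or 'unsafe'.
    """
    total = len(items)
    unsafe_prompts = sum(item["prompt_label"] == "unsafe" for item in items)
    unsafe_responses = sum(item["response_label"] == "unsafe" for item in items)
    unsafe_pairs = sum(
        item["prompt_label"] == "unsafe" and item["response_label"] == "unsafe"
        for item in items
    )
    return {
        "total": total,
        "unsafe_prompts": unsafe_prompts,
        "unsafe_responses": unsafe_responses,
        "unsafe_pairs": unsafe_pairs,
        # Guard against division by zero for empty inputs.
        "unsafe_response_rate": unsafe_responses / total if total else 0.0,
    }
```

Running `summarize` once per input file and once over all items of a model would correspond to a per-file summary and a per-model summary; comparing `unsafe_response_rate` across models then gives a simple safety ranking.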
---
## Tips
- If a script “can't find a file/path”, check the **constants at the top** of that script first.
- Keep outputs named consistently (`responses.json`), because several scripts rely on that name.
- For multi-GPU training, make sure your CUDA devices are visible and `torchrun/accelerate` sees them.