Added the necessary info
parent 3bb4286763
commit d0db4121ee
281 read_me.md
@@ -1,98 +1,271 @@
-============================================================================================================================================================
-1. LLM_test.py – Generating model responses
-
-This script is the first step of the whole process.
-You run it when you want a model (Gemma, LLaMA, or Qwen) to answer various “unpleasant” datasets (harmful prompts). Its safety is then evaluated based on those answers.
-
-1.1 How it works
-
-When started, the script asks you:
+# Diploma Thesis Repository: SFT + DPO Training, Translation, and Safety Evaluation (SK/EN)
+
+This repository contains the scripts I used in my thesis experiments to:
+
+- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
+- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
+- train QLoRA adapters (SFT and DPO) for selected base models,
+- generate model responses on safety datasets,
+- evaluate responses with **Llama Guard 3**.
+
+Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
+
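The README notes that every script hard-codes its paths as constants near the top. As a rough illustration of how those constants could be derived from a single root instead of being edited file by file, consider the sketch below. The `DIPLOMA_ROOT` environment variable and the `resolve_paths` helper are hypothetical and are not part of the repository; only the default root and the three folder names come from the README.

```python
import os
from pathlib import Path

# Default root named in the README; everything else here is an assumption.
DEFAULT_ROOT = "/home/hyrenko/Diploma"


def resolve_paths(env=None):
    """Return the models/datasets/outputs folders under one configurable root.

    `env` is a mapping used instead of os.environ, which keeps the helper testable.
    DIPLOMA_ROOT is a hypothetical variable name, not something the scripts read.
    """
    env = os.environ if env is None else env
    root = Path(env.get("DIPLOMA_ROOT", DEFAULT_ROOT))
    return {name: root / name for name in ("models", "datasets", "outputs")}
```

With no override the helper returns the layout described in the README; setting the variable moves all three folders at once.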
-which model you want to use,
-which GPU (if you have one),
-which dataset you want to test,
-how many prompts you want to process.
-
-The script loads the dataset automatically.
-If it is gated, it asks for an HF token.
-
-The model generates a response to each prompt, and the script checks whether the response was along the lines of “I can't answer, I'm an AI”. This counts as a refusal.
-
-The results go to the folder outputs/<timestamp>-model-dataset/.
+---
+
+## Repository structure
+
+The scripts are grouped by purpose:
+
+```text
+preparation/
+  prepar_dat_pku_dpo.py
+
+program/
+  copymaster.py
+  Llama_test_trained.py
+  LLM_test.py
+  response_evaluate.py
+
+Training/
+  convert_dpo_sft.py
+  training_dpo_sft_llama.py
+  training_dpo_sft_mistral_sk.py
+
+translate/
+  translate_do-not_answer.py
+  translate_PKF.py
+  Translate_sk_to_eng.py
+```
+
-Inside you will find:
-responses.json – responses in machine-readable format,
-responses.txt – all prompts and responses for human readers,
-summary.txt – a summary of refusals by category.
-============================================================================================================================================================
-2. copymaster.py – Sorting the outputs
-
-Once you have a pile of folders in outputs/, you need to sort them somehow so that each model has its own place. copymaster.py takes care of that.
+---
+
+## Requirements
+
+- Python **3.8+**
+- CUDA-capable GPU(s) for translation/training
+- Typical packages:
+  - `torch`, `transformers`, `datasets`
+  - `peft`, `trl`, `accelerate`, `bitsandbytes`
+  - `tqdm`
+
+Install (example):
+
+```bash
+pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
+```
+
+Notes:
+- `translate/translate_PKF.py` is written for a **2-GPU** setup.
+- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
+
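Before starting a long run it can help to confirm that the listed packages are importable. The sketch below is one possible pre-flight check and is not part of the repository; the helper names are made up for illustration, and the package list simply mirrors the requirements above.

```python
import importlib.util
import sys

# Package names copied from the requirements list in the README.
REQUIRED = [
    "torch", "transformers", "datasets",
    "peft", "trl", "accelerate", "bitsandbytes",
    "tqdm",
]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version=(3, 8)):
    """True when the running interpreter meets the minimum version from the README."""
    return sys.version_info >= min_version
```

An empty list from `missing_packages()` only means the packages are installed; it says nothing about whether a CUDA device is visible.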
-2.1 What it does:
-
-It asks which model you want to process.
-It goes through all folders in outputs.
-Everything whose name contains “gemma”, “llama”, or “qwen”, depending on what you chose, is copied to:
-/response/<model>/
-
-Each JSON gets its own number: 1.json, 2.json, 3.json…
+---
+
+## Suggested workflow (high level)
+
+A typical run looks like this:
+
+1) **Translate PKU** to Slovak (optional, if you need SK training data)
+2) **Prepare** SFT + DPO datasets on disk
+3) **Train** adapters (SFT/DPO)
+4) **Generate** responses (base vs adapters)
+5) **Translate** responses SK → EN (for Llama Guard)
+6) **Evaluate** outputs with Llama Guard 3
+
+You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.
+
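The six workflow steps can be written down as an ordered list of commands. The sketch below only prints a plan and does not launch anything, because several scripts are interactive. The mapping of steps to scripts is a reading of the per-script descriptions in this README, not something the author states explicitly, and the `plan` helper is hypothetical.

```python
import shlex

# Step -> command mapping inferred from the script descriptions (an assumption).
SK_PIPELINE = [
    ("translate PKU to Slovak (optional)", ["python3", "translate/translate_PKF.py"]),
    ("prepare SFT + DPO datasets", ["python3", "preparation/prepar_dat_pku_dpo.py"]),
    ("train adapters", ["python3", "Training/training_dpo_sft_llama.py"]),
    ("generate responses", ["python3", "program/LLM_test.py"]),
    ("translate responses SK -> EN", ["python3", "translate/Translate_sk_to_eng.py"]),
    ("evaluate with Llama Guard 3", ["python3", "program/response_evaluate.py"]),
]


def plan(english_only=False):
    """Return numbered, human-readable steps; nothing is executed."""
    steps = list(SK_PIPELINE)
    if english_only:
        # Skip the Slovak translation and prepare data straight from the English PKU set.
        english_prep = ("prepare SFT + DPO datasets (English PKU)",
                        ["python3", "Training/convert_dpo_sft.py"])
        steps = [english_prep] + steps[2:]
    return [f"{i}. {label}: {shlex.join(cmd)}" for i, (label, cmd) in enumerate(steps, 1)]
```

Calling `print("\n".join(plan()))` from the repository root gives a checklist that can be followed by hand.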
-============================================================================================================================================================
-3. response_evaluate.py – Safety evaluation
-
-This is the main and largest script.
-It performs the actual evaluation of whether the prompts and model responses are safe or not.
-
-You use it to compare models with each other.
+---
+
+## Scripts (what they do + how to run)
+
+Run the commands below from the **repository root**.
+
-3.1 What it does:
-
-It asks you which set from response/ you want to evaluate (llama, gemma, qwen).
-It lets you choose GPU or CPU.
-It loads Llama Guard 3–8B.
+---
+
+### `preparation/prepar_dat_pku_dpo.py`
+**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
+- an **SFT** dataset (plain text prompts/completions),
+- a **DPO** dataset (prompt/chosen/rejected).
+
+**Run:**
+```bash
+python3 preparation/prepar_dat_pku_dpo.py
+```
+
+**What to edit first:**
+- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
+- output paths (`SFT_OUT`, `DPO_OUT`)
+
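To make the two output shapes concrete, here is a minimal sketch of turning one PKU-SafeRLHF row into a DPO triple and an SFT pair. It assumes the published column names of the dataset (`prompt`, `response_0`, `response_1`, `safer_response_id`) and assumes that the safer response becomes `chosen`; the actual selection rule used by the script is not documented here, so treat both helpers as hypothetical.

```python
def to_dpo(row):
    """Build a prompt/chosen/rejected record, preferring the safer response.

    Assumption: `safer_response_id` (0 or 1) decides which response is chosen.
    """
    safer = int(row["safer_response_id"])
    return {
        "prompt": row["prompt"],
        "chosen": row[f"response_{safer}"],
        "rejected": row[f"response_{1 - safer}"],
    }


def to_sft(row):
    """Build a plain prompt/completion record from the same row."""
    pair = to_dpo(row)
    return {"prompt": pair["prompt"], "completion": pair["chosen"]}
```

For a row whose `safer_response_id` is 1, `response_1` ends up as `chosen` and as the SFT completion, while `response_0` becomes `rejected`.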
-It evaluates each prompt and each response separately:
-prompt → is it safe / unsafe
-response → the model answered safely / unsafely
-
-Besides Guard, it also uses your own heuristics:
+---
+
+### `Training/convert_dpo_sft.py`
+**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
+- SFT: `./data/pku_sft.jsonl`
+- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)
+
+It shows a small interactive menu (SFT / DPO / BOTH).
+
+**Run:**
+```bash
+python3 Training/convert_dpo_sft.py
+```
+
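The SFT output is a JSONL file, meaning one JSON object per line. The sketch below shows that format with two small helpers; the record keys (`prompt`, `completion`) are an assumption for illustration, since the exact schema of `pku_sft.jsonl` is not spelled out in this README.

```python
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one object per line, newline-terminated.

    ensure_ascii=False keeps Slovak diacritics readable in the file.
    """
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Writing the file is then a matter of `Path("data/pku_sft.jsonl").write_text(to_jsonl(records), encoding="utf-8")`.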
-if the prompt contains “sex”, “dirty joke”, etc. → it is marked as unsafe right away,
-if the response contains a refusal → automatically safe.
-
-It saves each evaluated record to a separate JSON.
-
-After processing the whole folder, it creates:
-summary.json for each input file,
+---
+
+### `translate/translate_PKF.py`
+**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2-GPU** multiprocessing setup.
+
+**Run:**
+```bash
+python3 translate/translate_PKF.py
+```
+
+**Resume / merge only:**
+```bash
+python3 translate/translate_PKF.py --resume
+```
+
+**What to edit first:**
+- `NLLB_PATH` (local NLLB model directory)
+- output directory constants inside the script
+
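A two-GPU translation job with a resume mode has to split the rows between workers, remember which rows are already done, and merge the partial results in the original order. The sketch below illustrates that bookkeeping only. How `translate_PKF.py` actually shards and checkpoints its work is not described here, so the round-robin split and all three helper names are assumptions.

```python
def shard_indices(n_items, n_workers=2):
    """Round-robin split: worker w gets rows w, w + n_workers, w + 2 * n_workers, ..."""
    return [list(range(w, n_items, n_workers)) for w in range(n_workers)]


def pending(indices, done):
    """Rows a worker still has to translate when resuming after an interruption."""
    return [i for i in indices if i not in done]


def merge_shards(shards):
    """Combine per-worker {row_index: translation} dicts into one ordered list."""
    merged = {}
    for shard in shards:
        merged.update(shard)
    return [merged[i] for i in sorted(merged)]
```

Each worker would run the NLLB model on its own GPU over `pending(...)` and save its dict; the merge step needs no GPU at all, which is why a merge-only resume is cheap.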
-summary_all.json for the whole model.
-
-3.2 The result is complete statistics:
-how many prompts were unsafe,
-how many responses were unsafe,
+---
+
+### `translate/translate_do-not_answer.py`
+**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.
+
+**Run (defaults are usable as-is):**
+```bash
+python3 translate/translate_do-not_answer.py
+```
+
+**Useful options:**
+```bash
+python3 translate/translate_do-not_answer.py --help
+python3 translate/translate_do-not_answer.py --base_dir /home/hyrenko/Diploma/datasets --out_name do_not_answer_sk --model /home/hyrenko/Diploma/models/nllb-200-1.3B --translate_fields question,risk_area
+```
+
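The `--translate_fields question,risk_area` option takes a comma-separated list. A minimal sketch of how such flags could be parsed with `argparse` follows. The four flag names are taken from the command shown in the README; the defaults (other than `question`, which the README states) and the `parse_fields` helper are assumptions, and the real script may handle them differently.

```python
import argparse


def parse_fields(value):
    """Turn 'question,risk_area' into ['question', 'risk_area'], ignoring stray spaces."""
    return [field.strip() for field in value.split(",") if field.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Translate do-not-answer with NLLB (sketch).")
    # Defaults below are placeholders; the script's real defaults are not documented here.
    parser.add_argument("--base_dir", default=None, help="folder that holds datasets")
    parser.add_argument("--out_name", default=None, help="name of the translated dataset")
    parser.add_argument("--model", default=None, help="local NLLB model directory")
    # argparse applies `type` to string defaults, so the default also becomes a list.
    parser.add_argument("--translate_fields", type=parse_fields, default="question",
                        help="comma-separated dataset columns to translate")
    return parser
```

Parsing `["--translate_fields", "question,risk_area"]` yields a two-element list, and omitting the flag yields `["question"]`.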
-how many pairs were unsafe at the same time,
-a comparison of the models by safety.
-============================================================================================================================================================
+---
+
+### `Training/training_dpo_sft_llama.py`
+**Purpose:** Unified training script for **Llama (e.g. llama3.1-8b)**:
+- SFT (QLoRA + masked loss)
+- DPO (TRL DPOTrainer)
+
+It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.
+
+**Run:**
+```bash
+python3 Training/training_dpo_sft_llama.py
+```
+
+If `accelerate` is not configured on your machine yet:
+```bash
+accelerate config
+```
+
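The two objectives named in this section can be illustrated without any deep learning library. The sketch below shows (a) label masking for SFT, where prompt tokens get the ignore index so that only completion tokens contribute to the loss, and (b) the standard DPO loss for a single preference pair computed from summed log-probabilities. This is a didactic sketch, not the training script's code; the function names are made up, and `beta=0.1` is only an illustrative value.

```python
import math

IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def build_labels(prompt_ids, completion_ids):
    """Masked SFT loss: the model sees prompt + completion, but is scored on the completion only."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    margin = (policy - reference) log-prob on chosen minus the same on rejected.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy equals the reference the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response relative to the rejected one.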
+---
+
+### `Training/training_dpo_sft_mistral_sk.py`
+**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
+- `sft`
+- `dpo`
+- `both`
+
+This script uses subcommands and supports CLI overrides (see `--help`).
+
+**Help:**
+```bash
+python3 Training/training_dpo_sft_mistral_sk.py --help
+```
+
+**Multi-GPU examples:**
+```bash
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
+```
+
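A subcommand interface like `sft` / `dpo` / `both` is typically built with `argparse` subparsers. The sketch below shows the general pattern only. It assumes that `both` means SFT followed by DPO, and it deliberately leaves out the CLI overrides, because their names are not listed in this README.

```python
import argparse

# Assumed expansion of each subcommand into training stages.
STAGES = {"sft": ["sft"], "dpo": ["dpo"], "both": ["sft", "dpo"]}


def build_parser():
    parser = argparse.ArgumentParser(prog="training_dpo_sft_mistral_sk.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in STAGES:
        sub.add_parser(name, help=f"run the {name} stage(s)")
    return parser


def stages_for(argv):
    """Map a command line such as ['both'] to the ordered list of stages to run."""
    args = build_parser().parse_args(argv)
    return STAGES[args.command]
```

Under `torchrun`, every process would parse the same arguments and therefore agree on which stages to run.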
+---
+
+### `program/LLM_test.py`
+**Purpose:** Interactive generator for model responses. Lets you pick:
+- model source (base / adapters),
+- GPU,
+- dataset,
+- generation limits.
+
+Writes a `responses.json` under your configured outputs directory.
+
+**Run:**
+```bash
+python3 program/LLM_test.py
+```
+
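The previous version of this README described a refusal check: answers in the style of "I can't answer, I'm an AI" were counted as refusals. A keyword heuristic of that kind might look like the sketch below. The marker phrases and the record layout are hypothetical; the real script's phrase list and the exact `responses.json` schema are not shown in this document.

```python
# Hypothetical marker phrases; the script's actual list is not documented here.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")


def is_refusal(response):
    """Return True when the response contains one of the refusal markers."""
    text = response.lower().replace("’", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def make_record(prompt, response):
    """One entry of a responses file (field names are assumptions)."""
    return {"prompt": prompt, "response": response, "refusal": is_refusal(response)}
```

Substring matching is crude: it misses refusals phrased differently and can flag a compliant answer that happens to quote one of the phrases, which is one reason a separate evaluator is run afterwards.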
+---
+
+### `program/Llama_test_trained.py`
+**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.
+
+**Run:**
+```bash
+python3 program/Llama_test_trained.py --help
+```
+
+**Example:**
+```bash
+python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
+```
+
+---
+
+### `translate/Translate_sk_to_eng.py`
+**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
+It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).
+
+**Run:**
+```bash
+python3 translate/Translate_sk_to_eng.py
+```
+
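Scanning a runs folder and writing results into a parallel tree comes down to two path operations: find every `responses.json`, then rebuild its relative path under the output root. The sketch below shows that idea with `pathlib`. Whether the script mirrors the folder structure exactly this way is an assumption, and both helper names are made up for illustration.

```python
from pathlib import Path


def find_responses(runs_root):
    """All responses.json files below runs_root, in a stable order."""
    return sorted(Path(runs_root).rglob("responses.json"))


def mirror_path(src, runs_root, out_root):
    """Place src under out_root at the same relative location it had under runs_root."""
    return Path(out_root) / Path(src).relative_to(runs_root)
```

The translation itself would happen between these two calls: read each file found, translate the text fields with NLLB, and write the result to `mirror_path(...)` after creating its parent folder.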
+---
+
+### `program/copymaster.py`
+**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).
+
+**Run:**
+```bash
+python3 program/copymaster.py
+```
+
+**What to edit first:**
+- `OUTPUTS_DIR` (source runs folder)
+- `DEST_DIR` (destination base folder)
+
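The previous README explained the sorting rule: every run folder whose name contains the chosen model keyword is copied, and each JSON gets a running number (`1.json`, `2.json`, ...). The sketch below computes such a copy plan without touching the file system. The alphabetical ordering and the `plan_copies` helper are assumptions; the script may number files in a different order.

```python
def plan_copies(run_names, model, outputs_dir="outputs", dest_dir="response"):
    """Return (source, destination) pairs for all runs that belong to `model`.

    Matching is case-insensitive on the folder name; numbering starts at 1.
    """
    matches = sorted(name for name in run_names if model.lower() in name.lower())
    return [
        (f"{outputs_dir}/{name}/responses.json", f"{dest_dir}/{model}/{i}.json")
        for i, name in enumerate(matches, start=1)
    ]
```

Feeding the pairs to `shutil.copy2` after creating the destination folder would complete the step.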
+---
+
+### `program/response_evaluate.py`
+**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
+- per-item evaluation JSON,
+- summary stats (under `OUTPUT_ROOT`, configured in the script).
+
+Inputs are expected in folders like:
+- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
+- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)
+
+**Run:**
+```bash
+python3 program/response_evaluate.py
+```
+
+**What to edit first:**
+- `MODEL_PATH` (local Llama Guard 3 model folder)
+- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)
+
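Llama Guard 3 answers with `safe`, or with `unsafe` followed by a line of category codes such as `S1,S10`. The sketch below parses that text and aggregates the statistics the earlier README listed (unsafe prompts, unsafe responses, pairs where both are unsafe). It is an illustration only; the summary keys and helper names are assumptions and need not match the files the script writes.

```python
def parse_verdict(raw):
    """Parse Llama Guard output into {'safe': bool or None, 'categories': [...]}."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        return {"safe": None, "categories": []}
    head = lines[0].lower()
    if head == "safe":
        return {"safe": True, "categories": []}
    if head == "unsafe":
        cats = lines[1].split(",") if len(lines) > 1 else []
        return {"safe": False, "categories": [c.strip() for c in cats if c.strip()]}
    return {"safe": None, "categories": []}  # unexpected output, left undecided


def summarize(pairs):
    """Count unsafe prompts/responses over (prompt_verdict, response_verdict) raw strings."""
    stats = {"total": 0, "unsafe_prompts": 0, "unsafe_responses": 0, "both_unsafe": 0}
    for prompt_raw, response_raw in pairs:
        prompt_unsafe = parse_verdict(prompt_raw)["safe"] is False
        response_unsafe = parse_verdict(response_raw)["safe"] is False
        stats["total"] += 1
        stats["unsafe_prompts"] += int(prompt_unsafe)
        stats["unsafe_responses"] += int(response_unsafe)
        stats["both_unsafe"] += int(prompt_unsafe and response_unsafe)
    return stats
```

Keeping an explicit undecided state (`None`) avoids silently counting malformed model output as safe.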
+---
+
+## Tips
+
+- If a script fails because it cannot find a file or path, check the **constants at the top** of that script first.
+- Keep outputs named consistently (`responses.json`); several scripts rely on that.
+- For multi-GPU training, make sure your CUDA devices are visible and that `torchrun` / `accelerate` can see them.
+