diff --git a/read_me.md b/read_me.md
index f94391b..38b9de5 100644
--- a/read_me.md
+++ b/read_me.md
@@ -1,98 +1,271 @@
-============================================================================================================================================================
-1. LLM_test.py – Generovanie odpovedí modelov
+# Diploma Thesis Repository — SFT + DPO Training, Translation, and Safety Evaluation (SK/EN)
- Tento skript je prvý krok celého procesu.
- Spúšťaš ho, keď chceš nechať model (Gemma, LLaMA alebo Qwen) odpovedať na rôzne “nepríjemné” datasety (harmful prompts). Podľa toho sa potom hodnotí jeho bezpečnosť.
+This repository contains the scripts I used in my thesis experiments to:
-1.1 Ako to funguje
+- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
+- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
+- train QLoRA adapters (SFT and DPO) for selected base models,
+- generate model responses on safety datasets,
+- evaluate responses with **Llama Guard 3**.
- Pri spustení si script od teba vypýta:
+Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
- aký model chceš použiť,
+---
- aké GPU (ak máš),
+## Repository structure
- aký dataset chceš otestovať,
+The scripts are grouped by purpose:
- koľko promptov chceš spracovať.
+```text
+preparation/
+  prepar_dat_pku_dpo.py
- Dataset si script načíta automaticky.
-
- Ak je gated, vypýta si HF token.
+program/
+  copymaster.py
+  Llama_test_trained.py
+  LLM_test.py
+  response_evaluate.py
- Model každému promptu vygeneruje odpoveď a script kontroluje, či to náhodou nebola odpoveď v štýle “nemôžem odpovedať, som AI”. Toto sa počíta ako refusal.
+Training/
+  convert_dpo_sft.py
+  training_dpo_sft_llama.py
+  training_dpo_sft_mistral_sk.py
- Výsledky idú do priečinka outputs/-model-dataset/.
+translate/
+  translate_do-not_answer.py
+  translate_PKF.py
+  Translate_sk_to_eng.py
+```
-Vo vnútri nájdeš:
+---
- responses.json – odpovede v strojovom formáte,
+## Requirements
- responses.txt – všetky prompty a odpovede pre ľudí,
+- Python **3.8+**
+- CUDA-capable GPU(s) for translation/training
+- Typical packages:
+  - `torch`, `transformers`, `datasets`
+  - `peft`, `trl`, `accelerate`, `bitsandbytes`
+  - `tqdm`
- summary.txt – súhrn odmietnutí podľa kategórií.
-============================================================================================================================================================
+Install (example):
-2. copymaster.py – Triedenie výstupov
+```bash
+pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
+```
- Keď už máš hromadu priečinkov v outputs/, potrebuješ to nejako zoradiť, aby každý model mal svoje miesto. O to sa stará copymaster.py.
+Notes:
+- `translate/translate_PKF.py` is written for a **2‑GPU** setup.
+- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
-2.1 Čo robí:
+---
- Opýta sa, ktorý model chceš spracovať.
+## Suggested workflow (high level)
- Prejde všetky priečinky v outputs.
+A typical run looks like this:
- Všetko, čo obsahuje v názve “gemma”, “llama” alebo “qwen”, podľa toho čo si vybral, skopíruje do:
- /response//
+1) **Translate PKU** to Slovak (optional, if you need SK training data)
+2) **Prepare** SFT + DPO datasets on disk
+3) **Train** adapters (SFT/DPO)
+4) **Generate** responses (base vs adapters)
+5) **Translate** responses SK → EN (for Llama Guard)
+6) **Evaluate** outputs with Llama Guard 3
- Každý JSON dostane svoje číslo: 1.json, 2.json, 3.json…
+You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.
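The six workflow steps can be sketched as an ordered command plan. This is an illustrative dry-run helper, not part of the repository: `Step`, `FULL_WORKFLOW`, `ENGLISH_START`, and `plan()` are hypothetical names, and the pairing of each step with a script is inferred from the script descriptions in this README (step 3 uses the Llama trainer as one example). Several of the scripts are interactive, so the sketch only builds and prints the commands and does not execute them.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One workflow stage and the script assumed to implement it."""
    label: str
    script: str


# Assumed mapping of workflow steps to scripts (inferred, not authoritative).
FULL_WORKFLOW = [
    Step("Translate PKU to Slovak", "translate/translate_PKF.py"),
    Step("Prepare SFT + DPO datasets", "preparation/prepar_dat_pku_dpo.py"),
    Step("Train adapters", "Training/training_dpo_sft_llama.py"),
    Step("Generate responses", "program/LLM_test.py"),
    Step("Translate responses SK -> EN", "translate/Translate_sk_to_eng.py"),
    Step("Evaluate with Llama Guard 3", "program/response_evaluate.py"),
]

# Assumption: convert_dpo_sft.py downloads and converts the English PKU data
# itself, so it stands in for the first two steps of the Slovak path.
ENGLISH_START = Step("Convert original PKU", "Training/convert_dpo_sft.py")


def plan(skip_slovak_translation: bool = False) -> List[List[str]]:
    """Return the ordered commands without running any of them."""
    steps = FULL_WORKFLOW
    if skip_slovak_translation:
        steps = [ENGLISH_START] + FULL_WORKFLOW[2:]
    return [["python3", step.script] for step in steps]


for command in plan():
    print(" ".join(command))
```

Whether the SK → EN response translation is still needed on the English-only path is not stated here, so the sketch leaves that step in place.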
-============================================================================================================================================================
+---
-3. response_evaluate.py – Hodnotenie bezpečnosti
+## Scripts (what they do + how to run)
- Toto je hlavný a najväčší skript.
- Robí reálne hodnotenie, či sú prompty a odpovede modelov bezpečné alebo nie.
-
- Používaš ho na porovnávanie modelov medzi sebou.
+Run commands below from the **repository root**.
-3.1 Čo robí:
+---
- Pýta si od teba, ktorú sadu z response/ chceš hodnotiť (llama, gemma, qwen).
+### `preparation/prepar_dat_pku_dpo.py`
+**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
+- an **SFT** dataset (plain text prompts/completions),
+- a **DPO** dataset (prompt/chosen/rejected).
- Nechá ťa vybrať GPU alebo CPU.
+**Run:**
+```bash
+python3 preparation/prepar_dat_pku_dpo.py
+```
- Načíta si Llama Guard 3–8B.
+**What to edit first:**
+- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
+- output paths (`SFT_OUT`, `DPO_OUT`)
- Každý prompt aj odpoveď vyhodnotí zvlášť:
+---
- prompt → je bezpečný / nebezpečný
+### `Training/convert_dpo_sft.py`
+**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
+- SFT: `./data/pku_sft.jsonl`
+- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)
- odpoveď → model odpovedal bezpečne / nebezpečne
+It shows a small interactive menu (SFT / DPO / BOTH).
- Okrem Guardu používa aj tvoje vlastné heuristiky:
+**Run:**
+```bash
+python3 Training/convert_dpo_sft.py
+```
- ak prompt obsahuje “sex”, “dirty joke” atď. → označí ho rovno ako unsafe,
+---
- ak odpoveď obsahuje odmietnutie → automaticky safe.
+### `translate/translate_PKF.py`
+**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2‑GPU** multiprocessing setup.
- Každý hodnotený záznam uloží do samostatného JSON.
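One common way to arrange a 2-GPU translation job is to split the dataset rows into contiguous shards and start one worker process per GPU. The sketch below shows only that bookkeeping. It illustrates the general technique under that assumption and is not taken from `translate_PKF.py`; `shard_bounds` and `worker_env` are hypothetical helper names, and the row count is made up for the demo.

```python
import os
from typing import Dict, List, Tuple


def shard_bounds(n_rows: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split range(n_rows) into n_workers contiguous [start, end) slices."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(n_rows, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def worker_env(gpu_index: int) -> Dict[str, str]:
    """Environment for a child process that should see a single GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


n_rows = 1000  # illustrative row count, not the real dataset size
for gpu, (start, end) in enumerate(shard_bounds(n_rows, 2)):
    print(f"GPU {gpu}: rows [{start}, {end})")
```

Contiguous shards keep the merge step simple, because the translated pieces can be concatenated in worker order to restore the original row order.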
+**Run:**
+```bash
+python3 translate/translate_PKF.py
+```
- Po spracovaní celého priečinka vytvorí:
+**Resume / merge only:**
+```bash
+python3 translate/translate_PKF.py --resume
+```
- summary.json pre každý vstupný súbor,
+**What to edit first:**
+- `NLLB_PATH` (local NLLB model directory)
+- output directory constants inside the script
- summary_all.json pre celý model.
+---
-3.2 Výsledkom je úplná štatistika:
+### `translate/translate_do-not_answer.py`
+**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.
- koľko promptov bolo unsafe,
+**Run (defaults are usable as-is):**
+```bash
+python3 translate/translate_do-not_answer.py
+```
- koľko odpovedí bolo unsafe,
+**Useful options:**
+```bash
+python3 translate/translate_do-not_answer.py --help
+python3 translate/translate_do-not_answer.py --base_dir /home/hyrenko/Diploma/datasets --out_name do_not_answer_sk --model /home/hyrenko/Diploma/models/nllb-200-1.3B --translate_fields question,risk_area
+```
- koľko párov bolo naraz unsafe,
+---
- porovnanie modelov podľa bezpečnosti.
-============================================================================================================================================================
+### `Training/training_dpo_sft_llama.py`
+**Purpose:** Unified training script for **Llama (e.g. llama3.1‑8b)**:
+- SFT (QLoRA + masked loss)
+- DPO (TRL DPOTrainer)
+
+It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.
+
+**Run:**
+```bash
+python3 Training/training_dpo_sft_llama.py
+```
+
+If `accelerate` is not configured on your machine yet:
+```bash
+accelerate config
+```
+
+---
+
+### `Training/training_dpo_sft_mistral_sk.py`
+**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
+- `sft`
+- `dpo`
+- `both`
+
+This script uses subcommands and supports CLI overrides (see `--help`).
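A subcommand-style CLI with shared overrides is typically built with `argparse` subparsers. The sketch below is a minimal illustration of that pattern, not the script's actual interface: only the subcommand names `sft`, `dpo`, and `both` come from this README, while `build_parser`, `--output-dir`, and `--epochs` are hypothetical. The real flags are whatever `--help` reports.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with sft/dpo/both subcommands and shared options."""
    parser = argparse.ArgumentParser(
        description="Sketch of a subcommand-style training CLI."
    )

    # Options shared by every subcommand; both flags are hypothetical.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--output-dir", default="./runs")
    shared.add_argument("--epochs", type=int, default=1)

    help_texts = {
        "sft": "run supervised fine-tuning only",
        "dpo": "run DPO only",
        "both": "run SFT, then DPO",
    }
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name, text in help_texts.items():
        subcommands.add_parser(name, parents=[shared], help=text)
    return parser


args = build_parser().parse_args(["dpo", "--epochs", "2"])
print(args.command, args.output_dir, args.epochs)
```

Putting the shared options on a parent parser keeps the three subcommands consistent, so an override behaves the same way regardless of which stage is selected.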
+
+**Help:**
+```bash
+python3 Training/training_dpo_sft_mistral_sk.py --help
+```
+
+**Multi-GPU examples:**
+```bash
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
+torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
+```
+
+---
+
+### `program/LLM_test.py`
+**Purpose:** Interactive generator for model responses. Lets you pick:
+- model source (base / adapters),
+- GPU,
+- dataset,
+- generation limits.
+
+Writes a `responses.json` under your configured outputs directory.
+
+**Run:**
+```bash
+python3 program/LLM_test.py
+```
+
+---
+
+### `program/Llama_test_trained.py`
+**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.
+
+**Run:**
+```bash
+python3 program/Llama_test_trained.py --help
+```
+
+**Example:**
+```bash
+python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
+```
+
+---
+
+### `translate/Translate_sk_to_eng.py`
+**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
+It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).
+
+**Run:**
+```bash
+python3 translate/Translate_sk_to_eng.py
+```
+
+---
+
+### `program/copymaster.py`
+**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).
+
+**Run:**
+```bash
+python3 program/copymaster.py
+```
+
+**What to edit first:**
+- `OUTPUTS_DIR` (source runs folder)
+- `DEST_DIR` (destination base folder)
+
+---
+
+### `program/response_evaluate.py`
+**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
+- per-item evaluation JSON,
+- summary stats (under `OUTPUT_ROOT`, configured in the script).
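The summary stats can be pictured as simple counts over the per-item verdicts. This is a minimal sketch, assuming each per-item record carries a `prompt_label` and a `response_label` that are either `"safe"` or `"unsafe"`. These field names and the `summarize` helper are hypothetical, and the JSON layout actually written by `response_evaluate.py` may differ. The three unsafe counts mirror the statistics listed in the earlier Slovak version of this README.

```python
from typing import Dict, Iterable, Mapping


def summarize(items: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count unsafe prompts, unsafe responses, and pairs where both are unsafe."""
    summary = {
        "total": 0,
        "unsafe_prompts": 0,
        "unsafe_responses": 0,
        "unsafe_pairs": 0,
    }
    for item in items:
        # Hypothetical field names; adjust to the real per-item JSON keys.
        prompt_unsafe = item["prompt_label"] == "unsafe"
        response_unsafe = item["response_label"] == "unsafe"
        summary["total"] += 1
        summary["unsafe_prompts"] += int(prompt_unsafe)
        summary["unsafe_responses"] += int(response_unsafe)
        summary["unsafe_pairs"] += int(prompt_unsafe and response_unsafe)
    return summary


example = [
    {"prompt_label": "unsafe", "response_label": "safe"},
    {"prompt_label": "unsafe", "response_label": "unsafe"},
    {"prompt_label": "safe", "response_label": "safe"},
]
print(summarize(example))
```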
+
+Inputs are expected in folders like:
+- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
+- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)
+
+**Run:**
+```bash
+python3 program/response_evaluate.py
+```
+
+**What to edit first:**
+- `MODEL_PATH` (local Llama Guard 3 model folder)
+- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)
+
+---
+
+## Tips
+
+- If something “can’t find file/path”, check the **constants at the top** of the script first.
+- Keep outputs named consistently (`responses.json`) — several scripts rely on that.
+- For multi-GPU training, make sure your CUDA devices are visible and `torchrun/accelerate` sees them.
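The first two tips can be turned into a quick pre-flight check before a long run. The sketch below is a hypothetical helper that is not included in the repository: `missing_paths` reports configured paths that do not exist, and `find_responses` lists every `responses.json` under a runs folder. The example paths are the ones this README mentions. Substitute the constants from your own scripts.

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the configured paths that do not exist on this machine."""
    return [p for p in paths if not Path(p).exists()]


def find_responses(root: str) -> List[Path]:
    """List every responses.json below root (empty if root is not a folder)."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(root_path.rglob("responses.json"))


configured = [
    "/home/hyrenko/Diploma/response/llama",
    "/home/hyrenko/Diploma/outputs_translated",
]
for path in missing_paths(configured):
    print(f"missing: {path}")
for found in find_responses("/home/hyrenko/Diploma/outputs_translated"):
    print(f"found: {found}")
```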