DP/read_me.md
2026-02-05 15:34:02 +00:00

# Diploma Thesis Repository — Increasing safety of large language models (SK/EN)
This repository contains the scripts I used in my thesis experiments to:
- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
- train QLoRA adapters (SFT and DPO) for selected base models,
- generate model responses on safety datasets,
- evaluate responses with **Llama Guard 3**.

Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs). If your paths differ, update the constants at the top of each script.

---
## Repository structure
The scripts are grouped by purpose:
```text
preparation/
  prepar_dat_pku_dpo.py
program/
  copymaster.py
  Llama_test_trained.py
  LLM_test.py
  response_evaluate.py
Training/
  convert_dpo_sft.py
  training_dpo_sft_llama.py
  training_dpo_sft_mistral_sk.py
translate/
  translate_do-not_answer.py
  translate_PKF.py
  Translate_sk_to_eng.py
```
---
## Requirements
- Python **3.8+**
- CUDA-capable GPU(s) for translation/training
- Typical packages:
  - `torch`, `transformers`, `datasets`
  - `peft`, `trl`, `accelerate`, `bitsandbytes`
  - `tqdm`

Install (example):
```bash
pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
```
Notes:
- `translate/translate_PKF.py` is written for a **2-GPU** setup.
- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
---
## Suggested workflow (high level)
A typical run looks like this:
1) **Translate PKU** to Slovak (optional, if you need SK training data)
2) **Prepare** SFT + DPO datasets on disk
3) **Train** adapters (SFT/DPO)
4) **Generate** responses (base vs adapters)
5) **Translate** responses SK → EN (for Llama Guard)
6) **Evaluate** outputs with Llama Guard 3

You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.

---
## Scripts (what they do + how to run)
Run commands below from the **repository root**.

---
### `preparation/prepar_dat_pku_dpo.py`
**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
- an **SFT** dataset (plain text prompts/completions),
- a **DPO** dataset (prompt/chosen/rejected).

**Run:**
```bash
python3 preparation/prepar_dat_pku_dpo.py
```
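As a rough illustration of the SFT/DPO split described above, the sketch below maps one record to both output shapes. It is not taken from the script: the field names (`response_0`, `response_1`, `safer_response_id`) are assumed from the public PKU-SafeRLHF schema, the translated dataset on disk may name them differently, and choosing the safer response as the SFT completion is an assumption as well.

```python
from typing import Dict, Optional


def to_dpo_pair(record: dict) -> Optional[Dict[str, str]]:
    """Map one PKU-style record to prompt/chosen/rejected, or None if unusable.

    Assumes the fields `prompt`, `response_0`, `response_1` and
    `safer_response_id` (0 or 1); adjust to the columns in your dataset.
    """
    safer = record.get("safer_response_id")
    if safer not in (0, 1):
        return None
    chosen = record[f"response_{safer}"]
    rejected = record[f"response_{1 - safer}"]
    # Skip pairs that carry no preference signal.
    if not chosen.strip() or not rejected.strip() or chosen == rejected:
        return None
    return {"prompt": record["prompt"], "chosen": chosen, "rejected": rejected}


def to_sft_example(record: dict) -> Optional[Dict[str, str]]:
    """Build a prompt/completion example from the safer response (an assumption)."""
    pair = to_dpo_pair(record)
    if pair is None:
        return None
    return {"prompt": pair["prompt"], "completion": pair["chosen"]}


example = {
    "prompt": "Q",
    "response_0": "bad",
    "response_1": "good",
    "safer_response_id": 1,
}
print(to_dpo_pair(example))
print(to_sft_example(example))
```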
**What to edit first:**
- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
- output paths (`SFT_OUT`, `DPO_OUT`)
---
### `Training/convert_dpo_sft.py`
**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
- SFT: `./data/pku_sft.jsonl`
- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)

It shows a small interactive menu (SFT / DPO / BOTH).

**Run:**
```bash
python3 Training/convert_dpo_sft.py
```
---
### `translate/translate_PKF.py`
**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2-GPU** multiprocessing setup.

**Run:**
```bash
python3 translate/translate_PKF.py
```
**Resume / merge only:**
```bash
python3 translate/translate_PKF.py --resume
```
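One way a two-worker split with resume support could be organised is sketched below. This is an assumption for illustration, not the logic in `translate_PKF.py`: the helper names `shard_indices` and `pending` are hypothetical, and the real script may shard and checkpoint differently.

```python
from typing import List, Set


def shard_indices(n_rows: int, n_workers: int, worker_id: int) -> range:
    """Return the contiguous block of row indices owned by one worker."""
    per_worker = -(-n_rows // n_workers)  # ceiling division
    start = min(worker_id * per_worker, n_rows)
    return range(start, min(start + per_worker, n_rows))


def pending(indices: range, done: Set[int]) -> List[int]:
    """Rows in this shard that still need translating after a restart."""
    return [i for i in indices if i not in done]


# Two workers (one per GPU) over a 5-row dataset.
shards = [shard_indices(5, 2, worker) for worker in range(2)]
print([list(s) for s in shards])

# Worker 0 was interrupted after finishing rows 0 and 1.
print(pending(shards[0], done={0, 1}))
```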
**What to edit first:**
- `NLLB_PATH` (local NLLB model directory)
- output directory constants inside the script
---
### `translate/translate_do-not_answer.py`
**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.

**Run (defaults are usable as-is):**
```bash
python3 translate/translate_do-not_answer.py
```
**Useful options:**
```bash
python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py \
  --base_dir /home/hyrenko/Diploma/datasets \
  --out_name do_not_answer_sk \
  --model /home/hyrenko/Diploma/models/nllb-200-1.3B \
  --translate_fields question,risk_area
```
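For orientation, a minimal sketch of batched EN → SK translation with NLLB through `transformers` follows. It is not the script's code: the batch size, `max_length` and the `translate` and `batched` helpers are illustrative assumptions, while `eng_Latn` and `slk_Latn` are the FLORES-200 language codes NLLB uses for English and Slovak.

```python
from typing import Iterator, List

SRC_LANG = "eng_Latn"  # English
TGT_LANG = "slk_Latn"  # Slovak


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], model_path: str,
              batch_size: int = 8, max_length: int = 512) -> List[str]:
    """Translate `texts` with a local NLLB checkpoint (hypothetical helper)."""
    # Heavy imports are kept local so the module loads without a GPU stack.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()
    # NLLB selects the output language through the first generated token.
    target_id = tokenizer.convert_tokens_to_ids(TGT_LANG)

    results: List[str] = []
    for batch in batched(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs, forced_bos_token_id=target_id,
                                       max_length=max_length)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results


# Example call, using the model path from the command above:
# translate(["How are you?"], "/home/hyrenko/Diploma/models/nllb-200-1.3B")
```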
---
### `Training/training_dpo_sft_llama.py`
**Purpose:** Unified training script for **Llama (e.g. llama-3.1-8b)**:
- SFT (QLoRA + masked loss)
- DPO (TRL DPOTrainer)

It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.

**Run:**
```bash
python3 Training/training_dpo_sft_llama.py
```
If `accelerate` is not configured on your machine yet:
```bash
accelerate config
```
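The "masked loss" used for SFT most likely means that prompt tokens are excluded from the loss so that only the completion is learned. A minimal sketch of that idea, assuming the usual convention of label `-100` (the default `ignore_index` of PyTorch's cross-entropy loss), is shown below. The `build_labels` helper is hypothetical and the script may build its labels differently.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def build_labels(prompt_ids: List[int], completion_ids: List[int]) -> Dict[str, List[int]]:
    """Concatenate prompt and completion tokens and mask the prompt part."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy token ids: three prompt tokens followed by two completion tokens.
features = build_labels([1, 2, 3], [4, 5])
print(features["labels"])
```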
---
### `Training/training_dpo_sft_mistral_sk.py`
**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
- `sft`
- `dpo`
- `both`

This script uses subcommands and supports CLI overrides (see `--help`).

**Help:**
```bash
python3 Training/training_dpo_sft_mistral_sk.py --help
```
**Multi-GPU examples:**
```bash
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
```
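For readers unfamiliar with what the `dpo` stage optimises, the standard DPO objective for a single preference pair can be sketched in plain Python. This is the textbook formula, not code from the script; `beta=0.1` is only a commonly used value and the script's setting may differ.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed log-probabilities of each response.

    loss = -log(sigmoid(beta * ((policy_chosen - ref_chosen)
                                - (policy_rejected - ref_rejected))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy equal to the reference: no preference learned yet, loss is log(2).
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
# Policy favours the chosen response more than the reference does: lower loss.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```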
---
### `program/LLM_test.py`
**Purpose:** Interactive generator for model responses. Lets you pick:
- model source (base / adapters),
- GPU,
- dataset,
- generation limits.

Writes a `responses.json` under your configured outputs directory.

**Run:**
```bash
python3 program/LLM_test.py
```
---
### `program/Llama_test_trained.py`
**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.

**Run:**
```bash
python3 program/Llama_test_trained.py --help
```
**Example:**
```bash
python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
```
---
### `translate/Translate_sk_to_eng.py`
**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).

**Run:**
```bash
python3 translate/Translate_sk_to_eng.py
```
---
### `program/copymaster.py`
**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).

**Run:**
```bash
python3 program/copymaster.py
```
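The copy step could look roughly like the sketch below. It assumes the model family can be read from the run folder name and that each run keeps its own subfolder under the destination. Both are guesses for illustration, and `detect_family` and `collect_responses` are hypothetical names, not functions from `copymaster.py`.

```python
import shutil
from pathlib import Path
from typing import List, Optional

FAMILIES = ("gemma", "llama", "qwen")


def detect_family(run_name: str) -> Optional[str]:
    """Guess the model family from a run folder name (assumed convention)."""
    lowered = run_name.lower()
    for family in FAMILIES:
        if family in lowered:
            return family
    return None


def collect_responses(outputs_dir: Path, dest_dir: Path) -> List[Path]:
    """Copy every responses.json into dest_dir/<family>/<run>/responses.json."""
    copied: List[Path] = []
    for src in sorted(outputs_dir.rglob("responses.json")):
        run_name = src.parent.name
        family = detect_family(run_name)
        if family is None:
            continue  # unknown model family: leave the run untouched
        target = dest_dir / family / run_name / "responses.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```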
**What to edit first:**
- `OUTPUTS_DIR` (source runs folder)
- `DEST_DIR` (destination base folder)
---
### `program/response_evaluate.py`
**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
- per-item evaluation JSON,
- summary stats (under `OUTPUT_ROOT`, configured in the script).

Inputs are expected in folders like:
- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)

**Run:**
```bash
python3 program/response_evaluate.py
```
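Llama Guard 3 replies with `safe`, or with `unsafe` followed by a line of category codes such as `S1,S10`. A small sketch of turning those replies into per-item verdicts and summary statistics is given below. The output keys and the `parse_verdict` and `summarize` helpers are illustrative assumptions, not the evaluator's actual JSON layout.

```python
from collections import Counter
from typing import Dict, List


def parse_verdict(raw: str) -> Dict[str, object]:
    """Parse one Llama Guard reply into a label and a list of category codes."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    label = lines[0].lower() if lines else "unknown"
    if label not in ("safe", "unsafe"):
        return {"label": "unknown", "categories": []}
    categories: List[str] = []
    if label == "unsafe" and len(lines) > 1:
        categories = [code.strip() for code in lines[1].split(",") if code.strip()]
    return {"label": label, "categories": categories}


def summarize(verdicts: List[Dict[str, object]]) -> Dict[str, float]:
    """Count labels and compute the share of unsafe responses."""
    counts = Counter(v["label"] for v in verdicts)
    total = len(verdicts)
    return {
        "total": total,
        "safe": counts["safe"],
        "unsafe": counts["unsafe"],
        "unknown": counts["unknown"],
        "unsafe_rate": counts["unsafe"] / total if total else 0.0,
    }


replies = ["safe", "unsafe\nS1,S10", "\n\nsafe", "unsafe\nS2"]
print(summarize([parse_verdict(reply) for reply in replies]))
```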
**What to edit first:**
- `MODEL_PATH` (local Llama Guard 3 model folder)
- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)
---
## Tips
- If something “can't find file/path”, check the **constants at the top** of the script first.
- Keep outputs named consistently (`responses.json`) — several scripts rely on that.
- For multi-GPU training, make sure your CUDA devices are visible and `torchrun/accelerate` sees them.
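
Before a long run, it can help to check the configured paths up front. The sketch below is a hypothetical helper, not part of the repository: it takes a mapping of constant names to paths and reports the ones that do not exist. The names in the usage example mirror constants mentioned above, with placeholder values.

```python
from pathlib import Path
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, value in paths.items() if not Path(value).exists()]


# Placeholder values; copy the real ones from the top of the script you run.
configured = {
    "NLLB_PATH": "/path/to/nllb",
    "MODEL_PATH": "/path/to/llama-guard-3",
}
for name in missing_paths(configured):
    print(f"{name} does not exist: {configured[name]}")
```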