# Diploma Thesis Repository: Increasing safety of large language models (SK/EN)

This repository contains the scripts I used in my thesis experiments to:
- prepare **PKU-SafeRLHF-30K** for **SFT** and **DPO**,
- translate datasets and model outputs with **NLLB (SK ↔ EN)**,
- train QLoRA adapters (SFT and DPO) for selected base models,
- generate model responses on safety datasets,
- evaluate responses with **Llama Guard 3**.

Most scripts assume a local folder layout under `/home/hyrenko/Diploma/...` (models, datasets, outputs).
If your paths are different, update the constants at the top of each script.

---

## Repository structure

The scripts are grouped by purpose:

```text
preparation/
  prepar_dat_pku_dpo.py
program/
  copymaster.py
  Llama_test_trained.py
  LLM_test.py
  response_evaluate.py
Training/
  convert_dpo_sft.py
  training_dpo_sft_llama.py
  training_dpo_sft_mistral_sk.py
translate/
  translate_do-not_answer.py
  translate_PKF.py
  Translate_sk_to_eng.py
```

---

## Requirements

- Python **3.8+**
- CUDA-capable GPU(s) for translation/training
- Typical packages:
  - `torch`, `transformers`, `datasets`
  - `peft`, `trl`, `accelerate`, `bitsandbytes`
  - `tqdm`

Install (example):

```bash
pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
```

Notes:
- `translate/translate_PKF.py` is written for a **2-GPU** setup.
- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.

---

## Suggested workflow (high level)

A typical run looks like this:

1) **Translate PKU** to Slovak (optional, if you need SK training data)
2) **Prepare** SFT + DPO datasets on disk
3) **Train** adapters (SFT/DPO)
4) **Generate** responses (base vs adapters)
5) **Translate** responses SK → EN (for Llama Guard)
6) **Evaluate** outputs with Llama Guard 3

You can also skip the Slovak translation and work directly with the original English PKU dataset via `Training/convert_dpo_sft.py`.
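The workflow can be sketched as an ordered list of commands. This is only an illustrative dry-run helper and not part of the repository: `WORKFLOW`, `build_plan`, and `print_plan` are hypothetical names, the mapping of steps to scripts is inferred from the script descriptions in this README, and the training step assumes the Llama script.

```python
import shlex

# Hypothetical mapping of the six workflow steps to repository scripts.
# The choice of training script is an assumption; swap in the one you use.
WORKFLOW = [
    ("Translate PKU to Slovak (optional)", "python3 translate/translate_PKF.py"),
    ("Prepare SFT + DPO datasets", "python3 preparation/prepar_dat_pku_dpo.py"),
    ("Train adapters", "python3 Training/training_dpo_sft_llama.py"),
    ("Generate responses", "python3 program/LLM_test.py"),
    ("Translate responses SK -> EN", "python3 translate/Translate_sk_to_eng.py"),
    ("Evaluate with Llama Guard 3", "python3 program/response_evaluate.py"),
]

# Used instead of step 2 when the Slovak translation is skipped.
ENGLISH_PREPARE = "python3 Training/convert_dpo_sft.py"


def build_plan(skip_translation=False):
    """Return (step_number, title, argv) tuples in execution order."""
    plan = []
    for number, (title, command) in enumerate(WORKFLOW, start=1):
        if skip_translation and number == 1:
            continue  # no Slovak training data needed
        if skip_translation and number == 2:
            command = ENGLISH_PREPARE  # convert the original English PKU data
        plan.append((number, title, shlex.split(command)))
    return plan


def print_plan(plan):
    """Print the plan without executing anything (dry run)."""
    for number, title, argv in plan:
        print(f"{number}) {title}: {' '.join(argv)}")


if __name__ == "__main__":
    print_plan(build_plan())
```

Printing the plan instead of executing it keeps the sketch safe to run; several of the real scripts are interactive, so they are better started by hand from the repository root.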

---

## Scripts (what they do + how to run)

Run commands below from the **repository root**.

---

### `preparation/prepar_dat_pku_dpo.py`

**Purpose:** Takes a translated PKU dataset saved on disk and produces two outputs:
- an **SFT** dataset (plain text prompts/completions),
- a **DPO** dataset (prompt/chosen/rejected).

**Run:**

```bash
python3 preparation/prepar_dat_pku_dpo.py
```

**What to edit first:**
- `SRC_DIR` (input dataset saved via `datasets.save_to_disk`)
- output paths (`SFT_OUT`, `DPO_OUT`)

---

### `Training/convert_dpo_sft.py`

**Purpose:** Downloads **PKU-Alignment/PKU-SafeRLHF-30K** from Hugging Face and converts it into:
- SFT: `./data/pku_sft.jsonl`
- DPO: `./data/pku_dpo/` (HF `save_to_disk` format)

It shows a small interactive menu (SFT / DPO / BOTH).

**Run:**

```bash
python3 Training/convert_dpo_sft.py
```

---

### `translate/translate_PKF.py`

**Purpose:** Translates PKU-SafeRLHF-30K to Slovak using a **local NLLB** model and a **2-GPU** multiprocessing setup.

**Run:**

```bash
python3 translate/translate_PKF.py
```

**Resume / merge only:**

```bash
python3 translate/translate_PKF.py --resume
```

**What to edit first:**
- `NLLB_PATH` (local NLLB model directory)
- output directory constants inside the script

---

### `translate/translate_do-not_answer.py`

**Purpose:** Translates **LibrAI/do-not-answer** (by default the `question` field) using NLLB and saves the translated dataset to disk.

**Run (defaults are usable as-is):**

```bash
python3 translate/translate_do-not_answer.py
```

**Useful options:**

```bash
python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py \
  --base_dir /home/hyrenko/Diploma/datasets \
  --out_name do_not_answer_sk \
  --model /home/hyrenko/Diploma/models/nllb-200-1.3B \
  --translate_fields question,risk_area
```

---

### `Training/training_dpo_sft_llama.py`

**Purpose:** Unified training script for **Llama (e.g.
llama3.1-8b)**:
- SFT (QLoRA + masked loss)
- DPO (TRL DPOTrainer)

It opens a menu (SFT / DPO / BOTH) and then relaunches itself via **accelerate** for multi-process training.

**Run:**

```bash
python3 Training/training_dpo_sft_llama.py
```

If `accelerate` is not configured on your machine yet:

```bash
accelerate config
```

---

### `Training/training_dpo_sft_mistral_sk.py`

**Purpose:** Combined QLoRA training for **mistral-sk-7b**:
- `sft`
- `dpo`
- `both`

This script uses subcommands and supports CLI overrides (see `--help`).

**Help:**

```bash
python3 Training/training_dpo_sft_mistral_sk.py --help
```

**Multi-GPU examples:**

```bash
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
```

---

### `program/LLM_test.py`

**Purpose:** Interactive generator for model responses. Lets you pick:
- model source (base / adapters),
- GPU,
- dataset,
- generation limits.

Writes a `responses.json` under your configured outputs directory.

**Run:**

```bash
python3 program/LLM_test.py
```

---

### `program/Llama_test_trained.py`

**Purpose:** Targeted evaluator for Llama runs. Useful for running **base vs SFT vs DPO** on a chosen dataset with explicit CLI flags.

**Run:**

```bash
python3 program/Llama_test_trained.py --help
```

**Example:**

```bash
python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
```

---

### `translate/Translate_sk_to_eng.py`

**Purpose:** Translates Slovak `responses.json` → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into `outputs_translated` (path is in the script).

**Run:**

```bash
python3 translate/Translate_sk_to_eng.py
```

---

### `program/copymaster.py`

**Purpose:** Copies `responses.json` files from `outputs/` into structured folders under `response/{gemma|llama|qwen}/` (used by the evaluator).
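
For orientation, here is a minimal sketch of what such a copy step could look like. It is not the code of `program/copymaster.py`: the function name `copy_responses`, the rule of detecting the model family from the run folder name, and the `<family>/<run_name>/responses.json` destination layout are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Model families named in the destination layout described above.
FAMILIES = ("gemma", "llama", "qwen")


def copy_responses(outputs_dir, dest_dir):
    """Copy each responses.json under outputs_dir to dest_dir/<family>/<run>/.

    Assumption: the run folder name contains the model family name.
    Returns the list of destination paths that were written.
    """
    copied = []
    for src in sorted(Path(outputs_dir).rglob("responses.json")):
        run_name = src.parent.name
        family = next((f for f in FAMILIES if f in run_name.lower()), None)
        if family is None:
            continue  # unknown model family: leave the file where it is
        target = Path(dest_dir) / family / run_name / "responses.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Keeping the run folder name in the destination path avoids overwriting files when several runs of the same model family are copied.
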

**Run:**

```bash
python3 program/copymaster.py
```

**What to edit first:**
- `OUTPUTS_DIR` (source runs folder)
- `DEST_DIR` (destination base folder)

---

### `program/response_evaluate.py`

**Purpose:** Runs **Llama Guard 3** on prompts + generated responses and saves:
- per-item evaluation JSON,
- summary stats (under `OUTPUT_ROOT`, configured in the script).

Inputs are expected in folders like:
- `/home/hyrenko/Diploma/response/{llama|gemma|qwen}`
- `/home/hyrenko/Diploma/outputs_translated` (translated mistral-sk runs)

**Run:**

```bash
python3 program/response_evaluate.py
```

**What to edit first:**
- `MODEL_PATH` (local Llama Guard 3 model folder)
- input/output directories (`*_INPUT_DIR`, `OUTPUT_ROOT`)

---

## Tips

- If a script reports that it cannot find a file or path, check the **constants at the top** of the script first.
- Keep outputs named consistently (`responses.json`); several scripts rely on that name.
- For multi-GPU training, make sure your CUDA devices are visible and that `torchrun` / `accelerate` sees them.
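
The first two tips can be turned into a small pre-flight check. This is a sketch using only the Python standard library; `missing_paths` and `find_responses` are hypothetical helpers, not functions from the repository, and the paths in the example are placeholders to replace with the constants from your own scripts.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the configured paths that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


def find_responses(root, filename="responses.json"):
    """Return all files named `filename` below root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob(filename))


if __name__ == "__main__":
    # Placeholder paths: substitute the constants from the script you run.
    configured = ["./outputs", "./data/pku_dpo"]
    for path in missing_paths(configured):
        print(f"missing: {path}")
    for found in find_responses("./outputs"):
        print(f"found: {found}")
```

Running it before a long translation or evaluation job is a cheap way to catch a wrong path early.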