Diploma Thesis Repository — Increasing safety of large language models (SK/EN)
This repository contains the scripts I used in my thesis experiments to:
- prepare PKU-SafeRLHF-30K for SFT and DPO,
- translate datasets and model outputs with NLLB (SK ↔ EN),
- train QLoRA adapters (SFT and DPO) for selected base models,
- generate model responses on safety datasets,
- evaluate responses with Llama Guard 3.
Most scripts assume a local folder layout under /home/hyrenko/Diploma/... (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
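The path constants near the top of a script typically look something like this (the names below are illustrative only; each script uses its own constants, so check its header for the real ones):

```python
# Illustrative only: the actual constant names differ from script to script.
BASE_DIR = "/home/hyrenko/Diploma"
MODEL_PATH = f"{BASE_DIR}/models/nllb-200-1.3B"   # local model folder
DATASETS_DIR = f"{BASE_DIR}/datasets"             # datasets saved with save_to_disk
OUTPUTS_DIR = f"{BASE_DIR}/outputs"               # where the responses.json runs end up
```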
Repository structure
The scripts are grouped by purpose:
preparation/
  prepar_dat_pku_dpo.py
program/
  copymaster.py
  Llama_test_trained.py
  LLM_test.py
  response_evaluate.py
Training/
  convert_dpo_sft.py
  training_dpo_sft_llama.py
  training_dpo_sft_mistral_sk.py
translate/
  translate_do-not_answer.py
  translate_PKF.py
  Translate_sk_to_eng.py
Requirements
- Python 3.8+
- CUDA-capable GPU(s) for translation/training
- Typical packages: torch, transformers, datasets, peft, trl, accelerate, bitsandbytes, tqdm
Install (example):
pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm
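Before running anything heavy, a quick environment check helps confirm the GPU stack is usable (generic sketch):

```python
# Verify that torch sees your CUDA devices before starting translation/training.
import torch
import transformers

print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPU count:", torch.cuda.device_count())
```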
Notes:
- translate/translate_PKF.py is written for a 2-GPU setup.
- Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.
Suggested workflow (high level)
A typical run looks like this:
- Translate PKU to Slovak (optional, if you need SK training data)
- Prepare SFT + DPO datasets on disk
- Train adapters (SFT/DPO)
- Generate responses (base vs adapters)
- Translate responses SK → EN (for Llama Guard)
- Evaluate outputs with Llama Guard 3
You can also skip the Slovak translation and work directly with the original English PKU dataset via Training/convert_dpo_sft.py.
Scripts (what they do + how to run)
Run commands below from the repository root.
preparation/prepar_dat_pku_dpo.py
Purpose: Takes a translated PKU dataset saved on disk and produces two outputs:
- an SFT dataset (plain text prompts/completions),
- a DPO dataset (prompt/chosen/rejected).
Run:
python3 preparation/prepar_dat_pku_dpo.py
What to edit first:
- SRC_DIR (input dataset saved via datasets.save_to_disk)
- output paths (SFT_OUT, DPO_OUT)
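For orientation, the two outputs described above roughly correspond to records shaped like this (the SFT field names are an assumption for illustration; the DPO fields prompt/chosen/rejected are the ones the script produces):

```python
# Illustrative record shapes only -- check the script for the exact SFT formatting.
sft_example = {
    "prompt": "<Slovak PKU prompt>",
    "completion": "<assistant completion used as the training target>",
}
dpo_example = {
    "prompt": "<Slovak PKU prompt>",
    "chosen": "<preferred (e.g. safer) response>",
    "rejected": "<dispreferred response>",
}
```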
Training/convert_dpo_sft.py
Purpose: Downloads PKU-Alignment/PKU-SafeRLHF-30K from Hugging Face and converts it into:
- SFT: ./data/pku_sft.jsonl
- DPO: ./data/pku_dpo/ (HF save_to_disk format)
It shows a small interactive menu (SFT / DPO / BOTH).
Run:
python3 Training/convert_dpo_sft.py
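To sanity-check the converted files without re-running the menu, something like this works (paths assume the defaults above):

```python
# Peek at the converted SFT/DPO data produced by convert_dpo_sft.py.
import json
from datasets import load_from_disk

with open("./data/pku_sft.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())
print("SFT record keys:", list(first.keys()))

dpo = load_from_disk("./data/pku_dpo")
print(dpo)   # shows the columns (prompt / chosen / rejected) and row counts
```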
translate/translate_PKF.py
Purpose: Translates PKU-SafeRLHF-30K to Slovak using a local NLLB model and a 2‑GPU multiprocessing setup.
Run:
python3 translate/translate_PKF.py
Resume / merge only:
python3 translate/translate_PKF.py --resume
What to edit first:
- NLLB_PATH (local NLLB model directory)
- output directory constants inside the script
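For reference, a single EN→SK translation with NLLB in transformers looks roughly like this (simplified; the actual script batches the data and shards it across two GPUs):

```python
# Minimal NLLB translation sketch (single string, single GPU).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

NLLB_PATH = "/home/hyrenko/Diploma/models/nllb-200-1.3B"   # adjust to your local copy

tok = AutoTokenizer.from_pretrained(NLLB_PATH, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(NLLB_PATH).to("cuda:0")

inputs = tok("How can I stay safe online?", return_tensors="pt").to("cuda:0")
out = model.generate(
    **inputs,
    forced_bos_token_id=tok.convert_tokens_to_ids("slk_Latn"),  # target language: Slovak
    max_new_tokens=256,
)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```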
translate/translate_do-not_answer.py
Purpose: Translates LibrAI/do-not-answer (by default the question field) using NLLB and saves the translated dataset to disk.
Run (defaults are usable as-is):
python3 translate/translate_do-not_answer.py
Useful options:
python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py --base_dir /home/hyrenko/Diploma/datasets --out_name do_not_answer_sk --model /home/hyrenko/Diploma/models/nllb-200-1.3B --translate_fields question,risk_area
Training/training_dpo_sft_llama.py
Purpose: Unified training script for Llama (e.g. llama3.1‑8b):
- SFT (QLoRA + masked loss)
- DPO (TRL DPOTrainer)
It opens a menu (SFT / DPO / BOTH) and then relaunches itself via accelerate for multi-process training.
Run:
python3 Training/training_dpo_sft_llama.py
If accelerate is not configured on your machine yet:
accelerate config
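The SFT side follows the usual QLoRA recipe (4-bit base model plus a LoRA adapter); a minimal sketch, with a path, hyperparameters and target modules that are illustrative rather than the thesis settings (the masked-loss part is not shown):

```python
# Generic QLoRA setup sketch -- not the exact configuration used in the thesis.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "/home/hyrenko/Diploma/models/llama3.1-8b"   # local base model (adjust)

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: base weights quantized to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)        # only the small adapter matrices are trained
```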
Training/training_dpo_sft_mistral_sk.py
Purpose: Combined QLoRA training for mistral-sk-7b with the subcommands sft, dpo, and both.
The script supports CLI overrides (see --help).
Help:
python3 Training/training_dpo_sft_mistral_sk.py --help
Multi-GPU examples:
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
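The DPO stage in both training scripts builds on TRL's DPOTrainer; a rough sketch assuming a recent trl release (keyword names have shifted between trl versions, and the model path is an assumption, so treat this as orientation only):

```python
# Generic DPO training sketch on the prompt/chosen/rejected dataset.
import torch
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE_MODEL = "/home/hyrenko/Diploma/models/mistral-sk-7b"   # adjust to your local copy

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto")

train_ds = load_from_disk("./data/pku_dpo")   # columns: prompt / chosen / rejected
                                              # (select a split first if this is a DatasetDict)
args = DPOConfig(
    output_dir="./dpo_adapter",
    beta=0.1,                                 # strength of the preference objective
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)
trainer = DPOTrainer(
    model=model,
    ref_model=None,              # TRL derives a frozen reference model when None
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # older trl versions use `tokenizer=` instead
)
trainer.train()
```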
program/LLM_test.py
Purpose: Interactive generator for model responses. Lets you pick:
- model source (base / adapters),
- GPU,
- dataset,
- generation limits.
Writes a responses.json under your configured outputs directory.
Run:
python3 program/LLM_test.py
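When you choose adapters as the model source, generation runs with a trained QLoRA adapter attached to the base model; a generic sketch of that pattern (paths are illustrative, and the script's actual loading logic may differ):

```python
# Attach a trained SFT/DPO adapter to a base model for generation (generic sketch).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "/home/hyrenko/Diploma/models/llama3.1-8b"       # local base model (adjust)
ADAPTER = "/home/hyrenko/Diploma/outputs/llama_dpo"     # trained adapter folder (adjust)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

inputs = tokenizer("Ako sa chránim pred podvodmi online?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```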
program/Llama_test_trained.py
Purpose: Targeted evaluator for Llama runs. Useful for running base vs SFT vs DPO on a chosen dataset with explicit CLI flags.
Run:
python3 program/Llama_test_trained.py --help
Example:
python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200
translate/Translate_sk_to_eng.py
Purpose: Translates Slovak responses.json → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into outputs_translated (path is in the script).
Run:
python3 translate/Translate_sk_to_eng.py
program/copymaster.py
Purpose: Copies responses.json files from outputs/ into structured folders under response/{gemma|llama|qwen}/ (used by the evaluator).
Run:
python3 program/copymaster.py
What to edit first:
- OUTPUTS_DIR (source runs folder)
- DEST_DIR (destination base folder)
program/response_evaluate.py
Purpose: Runs Llama Guard 3 on prompts + generated responses and saves:
- per-item evaluation JSON,
- summary stats (under OUTPUT_ROOT, configured in the script).
Inputs are expected in folders like:
- /home/hyrenko/Diploma/response/{llama|gemma|qwen}
- /home/hyrenko/Diploma/outputs_translated (translated mistral-sk runs)
Run:
python3 program/response_evaluate.py
What to edit first:
- MODEL_PATH (local Llama Guard 3 model folder)
- input/output directories (*_INPUT_DIR, OUTPUT_ROOT)
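For orientation, scoring one prompt/response pair with Llama Guard 3 follows the standard chat-template pattern; a generic sketch (the model path is an assumption, and response_evaluate.py adds its own batching and summary logic on top):

```python
# Classify a single prompt/response pair with Llama Guard 3 (generic sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

GUARD_PATH = "/home/hyrenko/Diploma/models/Llama-Guard-3-8B"   # local guard model (adjust)

tok = AutoTokenizer.from_pretrained(GUARD_PATH)
guard = AutoModelForCausalLM.from_pretrained(GUARD_PATH, torch_dtype=torch.bfloat16, device_map="auto")

chat = [
    {"role": "user", "content": "How do I make a fake ID?"},
    {"role": "assistant", "content": "I can't help with that."},
]
input_ids = tok.apply_chat_template(chat, return_tensors="pt").to(guard.device)
out = guard.generate(input_ids, max_new_tokens=20, do_sample=False)
verdict = tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict.strip())   # "safe", or "unsafe" plus the violated category code (e.g. S2)
```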
Tips
- If a script complains that it can't find a file or path, check the constants at the top of that script first.
- Keep outputs named consistently (responses.json); several scripts rely on that name.
- For multi-GPU training, make sure your CUDA devices are visible and that torchrun/accelerate can see them.