
Diploma Thesis Repository — Increasing safety of large language models (SK/EN)

This repository contains the scripts I used in my thesis experiments to:

  • prepare PKU-SafeRLHF-30K for SFT and DPO,
  • translate datasets and model outputs with NLLB (SK ↔ EN),
  • train QLoRA adapters (SFT and DPO) for selected base models,
  • generate model responses on safety datasets,
  • evaluate responses with Llama Guard 3.

Most scripts assume a local folder layout under /home/hyrenko/Diploma/... (models, datasets, outputs). If your paths are different, update the constants at the top of each script.
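
As a minimal sketch of what "update the constants" means in practice, here is the kind of path block you will find near the top of each script. The names below are illustrative; the real constants vary per script (e.g. SRC_DIR, NLLB_PATH, OUTPUT_ROOT):

```python
from pathlib import Path

# Hypothetical path constants in the style used by the scripts.
BASE_DIR = Path("/home/hyrenko/Diploma")   # change this to your own root
MODELS_DIR = BASE_DIR / "models"
DATASETS_DIR = BASE_DIR / "datasets"
OUTPUTS_DIR = BASE_DIR / "outputs"
```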


Repository structure

The scripts are grouped by purpose:

preparation/
  prepar_dat_pku_dpo.py

program/
  copymaster.py
  Llama_test_trained.py
  LLM_test.py
  response_evaluate.py

Training/
  convert_dpo_sft.py
  training_dpo_sft_llama.py
  training_dpo_sft_mistral_sk.py

translate/
  translate_do-not_answer.py
  translate_PKF.py
  Translate_sk_to_eng.py

Requirements

  • Python 3.8+
  • CUDA-capable GPU(s) for translation/training
  • Typical packages:
    • torch, transformers, datasets
    • peft, trl, accelerate, bitsandbytes
    • tqdm

Install (example):

pip install -U torch transformers datasets peft trl accelerate bitsandbytes tqdm

Notes:

  • translate/translate_PKF.py is written for a two-GPU setup.
  • Some scripts expect local model folders (e.g. NLLB, Llama Guard). Update paths in the script headers.

Suggested workflow (high level)

A typical run looks like this:

  1. Translate PKU to Slovak (optional, if you need SK training data)
  2. Prepare SFT + DPO datasets on disk
  3. Train adapters (SFT/DPO)
  4. Generate responses (base vs adapters)
  5. Translate responses SK → EN (for Llama Guard)
  6. Evaluate outputs with Llama Guard 3

You can also skip the Slovak translation and work directly with the original English PKU dataset via Training/convert_dpo_sft.py.


Scripts (what they do + how to run)

Run commands below from the repository root.


preparation/prepar_dat_pku_dpo.py

Purpose: Takes a translated PKU dataset saved on disk and produces two outputs:

  • an SFT dataset (plain text prompts/completions),
  • a DPO dataset (prompt/chosen/rejected).

Run:

python3 preparation/prepar_dat_pku_dpo.py

What to edit first:

  • SRC_DIR (input dataset saved via datasets.save_to_disk)
  • output paths (SFT_OUT, DPO_OUT)
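
A DPO dataset in this setup pairs each prompt with a preferred and a dispreferred response, following the usual TRL prompt/chosen/rejected convention. A sketch of one record (the field names follow the TRL convention; the exact keys produced by the script may differ):

```python
import json

# One hypothetical DPO record: the safer answer goes in "chosen",
# the unsafe one in "rejected", as expected by TRL's DPOTrainer.
record = {
    "prompt": "Ako si vyrobím nebezpečnú látku doma?",
    "chosen": "Prepáčte, s tým vám nemôžem pomôcť.",
    "rejected": "Postupujte takto: ...",
}
print(json.dumps(record, ensure_ascii=False))
```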

Training/convert_dpo_sft.py

Purpose: Downloads PKU-Alignment/PKU-SafeRLHF-30K from Hugging Face and converts it into:

  • SFT: ./data/pku_sft.jsonl
  • DPO: ./data/pku_dpo/ (HF save_to_disk format)

It shows a small interactive menu (SFT / DPO / BOTH).

Run:

python3 Training/convert_dpo_sft.py
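
The SFT output is plain JSONL: one JSON object per line. A minimal sketch of reading it back (the prompt/completion field names are an assumption, not confirmed from the script):

```python
import io
import json

# Hypothetical line in the style of ./data/pku_sft.jsonl
jsonl = io.StringIO(
    '{"prompt": "How do I stay safe online?", "completion": "Use strong passwords..."}\n'
)
examples = [json.loads(line) for line in jsonl]
print(examples[0]["prompt"])
```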

translate/translate_PKF.py

Purpose: Translates PKU-SafeRLHF-30K to Slovak using a local NLLB model and a two-GPU multiprocessing setup.

Run:

python3 translate/translate_PKF.py

Resume / merge only:

python3 translate/translate_PKF.py --resume

What to edit first:

  • NLLB_PATH (local NLLB model directory)
  • output directory constants inside the script
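
NLLB addresses languages by FLORES-200 codes rather than plain ISO tags. The codes for Slovak and English below are real; the helper itself is only an illustration of how the scripts select source/target languages:

```python
# FLORES-200 codes used by NLLB: Slovak is "slk_Latn", English is "eng_Latn".
NLLB_CODES = {"sk": "slk_Latn", "en": "eng_Latn"}

def nllb_code(lang: str) -> str:
    """Map a short language tag to its NLLB/FLORES-200 code."""
    return NLLB_CODES[lang]
```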

translate/translate_do-not_answer.py

Purpose: Translates LibrAI/do-not-answer (by default the question field) using NLLB and saves the translated dataset to disk.

Run (defaults are usable as-is):

python3 translate/translate_do-not_answer.py

Useful options:

python3 translate/translate_do-not_answer.py --help
python3 translate/translate_do-not_answer.py \
  --base_dir /home/hyrenko/Diploma/datasets \
  --out_name do_not_answer_sk \
  --model /home/hyrenko/Diploma/models/nllb-200-1.3B \
  --translate_fields question,risk_area

Training/training_dpo_sft_llama.py

Purpose: Unified training script for Llama (e.g. Llama 3.1 8B):

  • SFT (QLoRA + masked loss)
  • DPO (TRL DPOTrainer)

It opens a menu (SFT / DPO / BOTH) and then relaunches itself via accelerate for multi-process training.
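
The relaunch pattern usually rebuilds the command line and hands it to `accelerate launch`, with a guard so the relaunched processes don't recurse. A sketch of that idea (the guard variable name and the exact flags are assumptions):

```python
import os
import sys

GUARD = "_INSIDE_ACCELERATE"  # hypothetical guard environment variable

def maybe_relaunch(extra_args):
    """If not already running under accelerate, build the relaunch command.

    Returns None when we are already inside the accelerate-launched run,
    otherwise the command list to execute (e.g. via subprocess.run with
    GUARD set in the child environment).
    """
    if os.environ.get(GUARD):
        return None  # already relaunched; proceed with training
    return ["accelerate", "launch", sys.argv[0], *extra_args]
```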

Run:

python3 Training/training_dpo_sft_llama.py

If accelerate is not configured on your machine yet:

accelerate config

Training/training_dpo_sft_mistral_sk.py

Purpose: Combined QLoRA training for mistral-sk-7b:

  • sft
  • dpo
  • both

This script uses subcommands and supports CLI overrides (see --help).

Help:

python3 Training/training_dpo_sft_mistral_sk.py --help

Multi-GPU examples:

torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py sft
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py dpo
torchrun --nproc_per_node=2 Training/training_dpo_sft_mistral_sk.py both
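
The subcommand-plus-overrides shape is plain argparse. A hypothetical sketch of such a CLI (the `--lr` override is an example, not a confirmed flag of the script):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a sft/dpo/both subcommand CLI with per-stage overrides."""
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="stage", required=True)
    for stage in ("sft", "dpo", "both"):
        p = sub.add_parser(stage)
        p.add_argument("--lr", type=float, default=2e-4)  # hypothetical override
    return parser
```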

program/LLM_test.py

Purpose: Interactive generator for model responses. Lets you pick:

  • model source (base / adapters),
  • GPU,
  • dataset,
  • generation limits.

Writes a responses.json under your configured outputs directory.
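
Several downstream scripts key off this consistently named responses.json. A guess at its minimal shape (the exact fields are defined by the scripts themselves and may include more metadata):

```python
import json

# Hypothetical responses.json content: one prompt/response pair per item.
responses = [
    {"prompt": "How can I pick a lock?", "response": "I can't help with that."},
]
text = json.dumps(responses, indent=2, ensure_ascii=False)
parsed = json.loads(text)
```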

Run:

python3 program/LLM_test.py

program/Llama_test_trained.py

Purpose: Targeted evaluator for Llama runs. Useful for running base vs SFT vs DPO on a chosen dataset with explicit CLI flags.

Run:

python3 program/Llama_test_trained.py --help

Example:

python3 program/Llama_test_trained.py --dataset do-not-answer --mode dpo --limit 200

translate/Translate_sk_to_eng.py

Purpose: Translates Slovak responses.json → English (so Llama Guard can score English outputs).
It scans your runs folder and writes translated outputs into outputs_translated (path is in the script).

Run:

python3 translate/Translate_sk_to_eng.py

program/copymaster.py

Purpose: Copies responses.json files from outputs/ into structured folders under response/{gemma|llama|qwen}/ (used by the evaluator).

Run:

python3 program/copymaster.py

What to edit first:

  • OUTPUTS_DIR (source runs folder)
  • DEST_DIR (destination base folder)
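
A minimal sketch of the copy step, with the directory layout assumed from the description above (the helper and its arguments are illustrative, not the script's actual API):

```python
import shutil
from pathlib import Path

OUTPUTS_DIR = Path("outputs")   # source runs folder
DEST_DIR = Path("response")     # destination base folder

def copy_responses(model_family: str, run_name: str) -> Path:
    """Copy one run's responses.json into response/<family>/<run>/."""
    src = OUTPUTS_DIR / run_name / "responses.json"
    dst = DEST_DIR / model_family / run_name / "responses.json"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```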

program/response_evaluate.py

Purpose: Runs Llama Guard 3 on prompts + generated responses and saves:

  • per-item evaluation JSON,
  • summary stats (under OUTPUT_ROOT, configured in the script).

Inputs are expected in folders like:

  • /home/hyrenko/Diploma/response/{llama|gemma|qwen}
  • /home/hyrenko/Diploma/outputs_translated (translated mistral-sk runs)

Run:

python3 program/response_evaluate.py

What to edit first:

  • MODEL_PATH (local Llama Guard 3 model folder)
  • input/output directories (*_INPUT_DIR, OUTPUT_ROOT)
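
Llama Guard 3 replies with "safe", or "unsafe" followed by a line of violated category codes (e.g. "S1,S9"). A sketch of parsing that verdict; the exact whitespace and formatting of the model output can vary, so treat this as an assumption:

```python
def parse_guard_verdict(output: str):
    """Parse a Llama Guard 3 reply into (is_safe, categories).

    Expects "safe", or "unsafe" followed by a comma-separated line of
    category codes such as "S1,S9".
    """
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]
```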

Tips

  • If a script complains it can't find a file or path, check the constants at the top of that script first.
  • Keep outputs named consistently (responses.json) — several scripts rely on that.
  • For multi-GPU training, make sure your CUDA devices are visible and torchrun/accelerate sees them.