forked from KEMT/zpwiki
---
title: Youssef Ressaissi
published: true
taxonomy:
---
IAESTE Intern Summer 2025, 1.7. - 31.8.2025
Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.
Tasks:
- Get familiar with basic tools and prepare the working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
  - Read several recent papers about summarization using LLMs and write a report.
  - Get familiar with how to perform and evaluate document summarization using language models in Slovak.
- Make a comparison experiment.
  - Pick summarization datasets and models. Evaluate several models using ROUGE and BLEU metrics.
  - https://github.com/slovak-nlp/resources
  - Describe the experiments, summarize the results in a table, and discuss them.
- Improve the performance of a language model.
  - Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
  - Run new experiments and write down the results.
- Report and disseminate.
  - Prepare a final report with analysis, experiments, and conclusions.
  - Publish the fine-tuned models on the HF Hub. Publish a paper from the project.
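For the comparison experiment, it helps to be clear about what ROUGE actually measures. Below is a minimal from-scratch sketch of ROUGE-1 (unigram overlap) for a single candidate/reference pair; for the real experiments, the Hugging Face `evaluate` package should be used instead. The example sentences are made up for illustration.

```python
# Minimal ROUGE-1 sketch: clipped unigram overlap between a candidate
# summary and a reference. For experiments, use the HF `evaluate` package;
# this only illustrates the arithmetic behind the metric.
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """Return ROUGE-1 precision, recall and F1 for one sentence pair."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("súd zamietol žalobu", "súd žalobu zamietol v plnom rozsahu")
```

Note that ROUGE only counts word overlap, which is why it penalizes abstractive summaries that paraphrase instead of copying.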
 
Meeting 19.8.
State:
- Fine-tuned Slovak Mistral 7B.
- Tried Llama3 7B; the results look OK, but Mistral is better.
- Tried gpt-oss, but it does not work because of dependency issues.
- Working on a preliminary final report.
- The ROUGE score is not a good fit for abstractive summarization.
- The best way to evaluate so far is manual inspection of the outputs.
 
Tasks:
- Try to fine-tune other models: google/gemma-3-4b-it, HPLT/hplt2c_slk_checkpoints, Qwen/Qwen3-4B. Results will be in different branches of the repository.
- Try to automatically evaluate the results using a large LLM. Read some papers about it. Prepare a script using ollama and gpt-oss-20B.
- Work on the final report.
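The planned LLM-as-a-judge script could start from something like the sketch below. It assumes the `ollama` Python client and a locally pulled gpt-oss:20b model; the prompt wording, the model tag, and the 1-5 scale are assumptions, not project decisions.

```python
# Sketch of LLM-as-a-judge evaluation: ask a large model to grade a
# generated summary against the source document on a 1-5 scale.
# Assumes `pip install ollama` and a running ollama server with a pulled
# gpt-oss:20b model; prompt wording and scale are illustrative assumptions.

def build_judge_prompt(document: str, summary: str) -> str:
    """Compose the grading instruction shown to the judge model."""
    return (
        "You are grading a Slovak summary.\n"
        f"Document:\n{document}\n\n"
        f"Summary:\n{summary}\n\n"
        "Rate faithfulness and fluency from 1 (worst) to 5 (best). "
        "Answer with a single digit."
    )

if __name__ == "__main__":
    import ollama  # requires the ollama server to be running

    doc = "Súd zamietol žalobu v plnom rozsahu."
    summ = "Žaloba bola zamietnutá."
    reply = ollama.chat(
        model="gpt-oss:20b",
        messages=[{"role": "user", "content": build_judge_prompt(doc, summ)}],
    )
    print(reply["message"]["content"])
```

Constraining the judge to a single digit keeps the score easy to parse when running over a whole test set.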
 
Meeting 4.8.
State:
- Tested LMs with the ROUGE metric; most models scored ROUGE 4-5, while facebook/mbart-large-50 reached 17 (it was trained for translation).
- In my opinion, mbart-large-50 is not a good candidate for fine-tuning, because it is already fine-tuned for translation.
- No fine-tuning done yet.
 
Tasks:
- Try to evaluate google/flan-t5-large, kiviki/mbart-slovaksum-large-sum, and similar models. These should already work.
- Continue working on fine-tuning T5 or mBART models, but ask when you are stuck. Use the HF example script for summarization.
 
Future tasks:
- Use LLMs (open or closed) and evaluate (ROUGE) summarization without fine-tuning on the Slovak legal dataset.
- Install lm-eval-harness, learn it, and prepare and run a task for Slovak summarization.
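A custom harness task is defined in a YAML file; a possible starting point is sketched below. The field names follow lm-evaluation-harness's YAML task format, but the dataset id, column names, prompt wording, and metric choice are placeholders; metric availability varies by harness version, and ROUGE may require registering a custom metric function.

```yaml
# Sketch of a custom lm-evaluation-harness task for Slovak summarization.
# Dataset id, column names and prompt are assumptions, not project decisions.
task: slovaksum_zeroshot
dataset_path: kiviki/SlovakSum   # assumed HF dataset id
output_type: generate_until
test_split: test
doc_to_text: "Zhrň nasledujúci text.\n\nText: {{text}}\n\nZhrnutie:"
doc_to_target: "{{summary}}"
generation_kwargs:
  until:
    - "\n\n"
  max_gen_toks: 256
metric_list:
  - metric: bleu
    aggregation: mean
    higher_is_better: true
```

The task can then be run with the harness CLI by pointing it at the directory containing this file.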
 
Meeting 13.8.2025
State:
- Managed to fine-tune Slovak Mistral-7B on legal documents. One epoch is enough.
- Ditched the T5 pipeline.
 
Tasks:
- Try to fine-tune some different LLMs and compare the results.
- Try to fine-tune with the news dataset.
- Prepare tables with the results.
- Compare with the zero-shot scenario (with various models).
- Don't forget to put the scripts on Git.
 
Future tasks:
- Find the optimal hyperparameters.
- Write a technical report summarizing the methods, experiments, and results.
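Fine-tuning a different LLM on the same data could follow the TRL recipe sketched below. The prompt template, dataset id, and model checkpoint are assumptions (any of the candidates above could be substituted), and the SFTTrainer API differs slightly across trl versions, so this is a starting point rather than the project's actual script.

```python
# Sketch of instruction-style fine-tuning for summarization with HF TRL.
# Prompt template, dataset id and model checkpoint are assumptions; the
# SFTTrainer API varies across trl versions.

def format_example(example: dict) -> str:
    """Turn one dataset row into a single training string."""
    return (
        "Zhrň nasledujúci text.\n\n"
        f"Text: {example['text']}\n\n"
        f"Zhrnutie: {example['summary']}"
    )

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("kiviki/SlovakSum", split="train")  # assumed id
    trainer = SFTTrainer(
        model="slovak-nlp/mistral-sk-7b",  # assumed checkpoint name
        train_dataset=dataset,
        formatting_func=format_example,
        # one epoch, per the observation above that one epoch is enough
        args=SFTConfig(output_dir="mistral-sk-7b-sum", num_train_epochs=1),
    )
    trainer.train()
```

Keeping the same prompt template across all fine-tuned models makes the later comparison table fair.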
 
Meeting 24.7.
State:
- Working custom environment with Jupyter Notebook.
- Fine-tuning mBART; the results are not great.
 
Tasks:
- Try T5-based models: Slovak t5-base, umT5, mT5, Flan-T5.
- Zero-shot-evaluate LLMs on news and legal data.
- Find a way to fine-tune an LLM for summarization.
- Fine-tune an LLM for summarization.
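Zero-shot evaluation via prompting could look like the sketch below. Legal documents (e.g. from eur-lex-sum) are often longer than a model's context window, so the helper first trims the input to a word budget; the model name, the word budget, and the prompt wording are assumptions.

```python
# Sketch of zero-shot summarization by prompting an instruction LLM with
# transformers. Model name, prompt and word budget are assumptions; the
# truncation helper keeps long legal documents within the context window.

def truncate_words(text: str, max_words: int = 800) -> str:
    """Keep only the first `max_words` whitespace-separated tokens."""
    words = text.split()
    return " ".join(words[:max_words])

if __name__ == "__main__":
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="mistralai/Mistral-7B-Instruct-v0.3",  # assumed model choice
    )
    with open("legal_doc.txt", encoding="utf-8") as f:
        document = truncate_words(f.read())
    prompt = f"Zhrň nasledujúci text do niekoľkých viet.\n\n{document}\n\nZhrnutie:"
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    print(out[0]["generated_text"])
```

Word-level truncation is a crude stand-in for proper tokenizer-aware chunking, but it is enough for a first zero-shot pass.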
 
Meeting 17.7.2025:
State:
- Studied the task and the metrics (ROUGE, BLEU).
- Loaded a model, preprocessed a dataset, and evaluated the model.
- Loaded more models, used SlovakSum, generated summaries with four models, and compared them with ROUGE and BLEU (TUKE-KEMT/slovak-t5-base, google/mt5-small, google/mt5-base, facebook/mbart-large-50).
- The comparison is without fine-tuning (zero-shot); so far, the best is mBART-large.
- Working on the legal dataset "dennlinger/eur-lex-sum".
- The notebooks are on the KEMT Git.
 
Tasks:
- Prepare the "mango.kemt.fei.tuke.sk" workflow.
- Fine-tune existing models and evaluate them. Use the news and legal datasets.
- Try mbart-large, flan-t5-large, slovak-t5-base, google/t5-v1_1-large.
- Describe the experimental setup and prepare tables with results.
 
Future tasks:
- Try prompting LLMs and evaluating the results. We need to pick an LLM with Slovak support.
- Fine-tune an LLM to summarize.
- Use medical data (after it is ready).
- Prepare a detailed report (to be converted into a paper).