---
title: Youssef Ressaissi
published: true
---
IAESTE Intern Summer 2025, 1.7. - 31.8.2025
Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.
Tasks:
- Get familiar with basic tools and prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF TRL
- Read several recent papers about summarization using LLM and write a report.
- Get familiar with how to perform and evaluate document summarization using language models in Slovak.
- Make a comparison experiment
- Pick summarization datasets and models. Evaluate several models using ROUGE and BLEU metrics.
- https://github.com/slovak-nlp/resources
- Describe the experiments. Summarize results in a table. Describe the results.
- Improve the performance of a language model.
- Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
- Run new experiments and write down the results.
- Report and disseminate
- Prepare a final report with analysis, experiments and conclusions.
- Publish the fine-tuned models in HF HUB. Publish the paper from the project.
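The comparison experiment above scores candidate models with ROUGE. In the experiments this would come from a full ROUGE library, but ROUGE-1 F1 is simple enough to sketch directly, which helps sanity-check library scores. A minimal sketch with whitespace tokenization and no stemming (the full implementation handles both):

```python
from collections import Counter

def rouge1_f1(hypothesis: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a generated summary and a reference.

    Minimal sketch: whitespace tokenization, lowercasing, no stemming or
    stopword handling, unlike a full ROUGE implementation.
    """
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # multiset intersection of unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat sat down"))  # 6/7 ≈ 0.857
```

BLEU is the analogous precision-oriented n-gram metric; in practice both come from the same evaluation library rather than hand-rolled code.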
Meeting 4.8.
State:
- Tested LMs with ROUGE metrics; most models scored 4-5 ROUGE, while facebook/mbart-large-50 scored 17 (it was trained for translation).
- In my opinion, mbart-large-50 is not a good candidate for fine-tuning, because it is already fine-tuned for translation.
- No fine-tuning done yet.
Tasks:
- Try evaluating google/flan-t5-large, kiviki/mbart-slovaksum-large-sum, and similar models. These should already work.
- Continue working on fine-tuning T5 or mBART models, but ask when you are stuck. Use the HF example script for summarization.
Future tasks:
- Use LLMs (open or closed) and evaluate summarization (ROUGE) without fine-tuning on the Slovak legal dataset
- Install lm-evaluation-harness, learn it, and prepare and run a task for Slovak summarization
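The model evaluations above all follow the same loop: load a dataset, generate summaries with each candidate model, and score them with ROUGE. A minimal sketch, assuming the HF `transformers`, `datasets`, and `evaluate` packages; the column names `text` and `summary` are assumptions and must be adapted to the actual dataset:

```python
def compare_models(model_names, dataset_name, split="test", n_samples=50):
    """Generate summaries with each model and score them with ROUGE.

    Sketch only: the "text"/"summary" column names are assumptions and may
    differ per dataset; generation settings are left at pipeline defaults.
    """
    # Imports kept inside the function so the sketch can be read (and the
    # helper imported) without the heavy dependencies installed.
    import evaluate
    from datasets import load_dataset
    from transformers import pipeline

    data = load_dataset(dataset_name, split=split).select(range(n_samples))
    rouge = evaluate.load("rouge")
    scores = {}
    for name in model_names:
        summarizer = pipeline("summarization", model=name)
        predictions = [out["summary_text"]
                       for out in summarizer(data["text"], truncation=True)]
        scores[name] = rouge.compute(predictions=predictions,
                                     references=data["summary"])
    return scores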
Meeting 24.7.
State:
- Working custom environment with Jupyter Notebook
- Fine-tuning mBART; results are not great so far
Tasks:
- Try T5-based models: Slovak t5-base, umT5, mT5, flan-T5
- Zero-shot evaluate LLMs on news and legal data.
- Find a way to fine-tune an LLM for summarization.
- Fine-tune an LLM for summarization.
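The fine-tuning task above follows the pattern of the HF summarization example script: tokenize inputs and targets, then train with `Seq2SeqTrainer`. A minimal sketch, assuming a dataset with `train`/`validation` splits and `text`/`summary` columns; the hyperparameters are placeholders, and the `summarize: ` prefix applies to T5-style models only (mBART does not use one):

```python
def add_t5_prefix(text: str) -> str:
    """T5-style models expect a task prefix; drop this for mBART."""
    return "summarize: " + text

def finetune(model_name, dataset, output_dir="finetuned-summarizer"):
    """Fine-tune a seq2seq model for summarization.

    Sketch only: column names, splits, and hyperparameters are assumptions.
    """
    # Heavy imports kept inside the function so the module loads without them.
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def preprocess(batch):
        # Prefix applied unconditionally here for simplicity.
        inputs = tokenizer([add_t5_prefix(t) for t in batch["text"]],
                           max_length=1024, truncation=True)
        labels = tokenizer(text_target=batch["summary"],
                           max_length=128, truncation=True)
        inputs["labels"] = labels["input_ids"]
        return inputs

    tokenized = dataset.map(preprocess, batched=True)
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        learning_rate=3e-5,            # placeholder hyperparameters
        per_device_train_batch_size=4,
        num_train_epochs=3,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

The saved model can then be evaluated with the same ROUGE loop as the zero-shot baselines, so the before/after comparison uses identical scoring.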
Meeting 17.7.2025:
State:
- Studied the task and metrics (ROUGE, BLEU)
- Loaded a model, preprocessed a dataset, and evaluated the model
- Loaded more models; used SlovakSum; generated summaries with four models and compared them with ROUGE and BLEU (TUKE-KEMT/slovak-t5-base, google/mt5-small, google/mt5-base, facebook/mbart-large-50)
- The comparison is without fine-tuning (zero-shot); so far, the best is mBART-large
- Working on the legal dataset "dennlinger/eur-lex-sum"
- Notebooks are on the KEMT Git
Tasks:
- Prepare "mango.kemt.fei.tuke.sk" workflow
- Fine-tune existing models and evaluate them. Use the news and legal datasets
- Try mbart-large, flan-t5-large, slovak-t5-base, google/t5-v1_1-large
- Describe the experimental setup, prepare tables with results.
Future tasks:
- Try prompting LLMs and evaluating the results. We need to pick an LLM with Slovak support
- Fine-tune an LLM to summarize
- Use medical data (after they are ready).
- Prepare a detailed report (to be converted into a paper).
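Prompting an LLM for zero-shot summarization (the future task above) mostly comes down to building an instruction prompt and scoring the completion with the same ROUGE loop as the other models. A minimal sketch: the instruction wording here is an English placeholder and would need to be written in Slovak for the actual experiments (with an LLM picked for its Slovak support, as noted above); the chat-style pipeline call assumes a recent `transformers` version:

```python
def build_prompt(document: str, max_words: int = 80) -> str:
    """Build a summarization instruction.

    Placeholder English wording: for the experiments this instruction
    should be written in Slovak.
    """
    return (f"Summarize the following document in at most {max_words} words:"
            f"\n\n{document}\n\nSummary:")

def zero_shot_summarize(model_name: str, document: str) -> str:
    """Sketch: generate a summary with an instruction-tuned LLM."""
    # Import kept inside the function so the prompt helper works without it.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    messages = [{"role": "user", "content": build_prompt(document)}]
    out = generator(messages, max_new_tokens=200)
    # Chat pipelines return the full message list; the last entry is the reply.
    return out[0]["generated_text"][-1]["content"]
```

Keeping the prompt in a single helper makes it easy to compare several instruction wordings (Slovak vs. English, with or without a length limit) under otherwise identical settings.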