---
title: Youssef Ressaissi
published: true
---
IAESTE Intern Summer 2025, 1.7. - 31.8.2025
Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.
Tasks:
- Get familiar with basic tools and prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF TRL
- Read several recent papers about summarization using LLM and write a report.
- Get familiar with how to perform and evaluate document summarization using language models in Slovak.
- Make a comparison experiment
- Pick summarization datasets and models. Evaluate several models using ROUGE and BLEU metrics.
- https://github.com/slovak-nlp/resources
- Describe the experiments. Summarize results in a table. Describe the results.
- Improve the performance of a language model.
- Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
- Run new experiments and write down the results.
- Report and disseminate
- Prepare a final report with analysis, experiments and conclusions.
- Publish the fine-tuned models in HF HUB. Publish the paper from the project.
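The comparison experiment above scores candidate models with ROUGE. In the experiments this would come from a full ROUGE library, but ROUGE-1 F1 is simple enough to sketch directly, which helps sanity-check library scores. A minimal sketch with whitespace tokenization and no stemming (the full implementation handles both):

```python
from collections import Counter

def rouge1_f1(hypothesis: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a generated summary and a reference.

    Minimal sketch: whitespace tokenization, lowercasing, no stemming or
    stopword handling, unlike a full ROUGE implementation.
    """
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # multiset intersection of unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat sat down"))  # 6/7 ≈ 0.857
```

BLEU is the analogous precision-oriented n-gram metric; in practice both come from the same evaluation library rather than hand-rolled code.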
Meeting 4.8.
State:
- Tested LMs with ROUGE metrics; most models scored 4-5 ROUGE, while facebook/mbart-large-50 scored 17 (it was trained for translation).
- In my opinion, mbart-large-50 is not a good candidate for fine-tuning, because it is already fine-tuned for translation.
- No fine-tuning done yet.
Tasks:
- Try evaluating google/flan-t5-large, kiviki/mbart-slovaksum-large-sum, and similar models. These should already work.
- Continue working on fine-tuning T5 or mBART models, but ask when you are stuck. Use the HF example script for summarization.
Future tasks:
- Use LLMs (open or closed) and evaluate summarization (ROUGE) without fine-tuning on the Slovak legal dataset
- Install lm-evaluation-harness, learn it, and prepare and run a task for Slovak summarization
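The model evaluations above all follow the same loop: load a dataset, generate summaries with each candidate model, and score them with ROUGE. A minimal sketch, assuming the HF `transformers`, `datasets`, and `evaluate` packages; the column names `text` and `summary` are assumptions and must be adapted to the actual dataset:

```python
def compare_models(model_names, dataset_name, split="test", n_samples=50):
    """Generate summaries with each model and score them with ROUGE.

    Sketch only: the "text"/"summary" column names are assumptions and may
    differ per dataset; generation settings are left at pipeline defaults.
    """
    # Imports kept inside the function so the sketch can be read (and the
    # helper imported) without the heavy dependencies installed.
    import evaluate
    from datasets import load_dataset
    from transformers import pipeline

    data = load_dataset(dataset_name, split=split).select(range(n_samples))
    rouge = evaluate.load("rouge")
    scores = {}
    for name in model_names:
        summarizer = pipeline("summarization", model=name)
        predictions = [out["summary_text"]
                       for out in summarizer(data["text"], truncation=True)]
        scores[name] = rouge.compute(predictions=predictions,
                                     references=data["summary"])
    return scores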
Meeting 24.7.
State:
- Working custom environment with Jupyter Notebook
- Fine-tuning mBART; results are not great so far
Tasks:
- Try T5-based models: Slovak t5-base, umT5, mT5, flan-T5
- Zero-shot evaluate LLMs on news and legal data.
- Find a way to fine-tune an LLM for summarization.
- Fine-tune an LLM for summarization.
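The fine-tuning task above follows the pattern of the HF summarization example script: tokenize inputs and targets, then train with `Seq2SeqTrainer`. A minimal sketch, assuming a dataset with `train`/`validation` splits and `text`/`summary` columns; the hyperparameters are placeholders, and the `summarize: ` prefix applies to T5-style models only (mBART does not use one):

```python
def add_t5_prefix(text: str) -> str:
    """T5-style models expect a task prefix; drop this for mBART."""
    return "summarize: " + text

def finetune(model_name, dataset, output_dir="finetuned-summarizer"):
    """Fine-tune a seq2seq model for summarization.

    Sketch only: column names, splits, and hyperparameters are assumptions.
    """
    # Heavy imports kept inside the function so the module loads without them.
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def preprocess(batch):
        # Prefix applied unconditionally here for simplicity.
        inputs = tokenizer([add_t5_prefix(t) for t in batch["text"]],
                           max_length=1024, truncation=True)
        labels = tokenizer(text_target=batch["summary"],
                           max_length=128, truncation=True)
        inputs["labels"] = labels["input_ids"]
        return inputs

    tokenized = dataset.map(preprocess, batched=True)
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        learning_rate=3e-5,            # placeholder hyperparameters
        per_device_train_batch_size=4,
        num_train_epochs=3,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

The saved model can then be evaluated with the same ROUGE loop as the zero-shot baselines, so the before/after comparison uses identical scoring.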
Meeting 17.7.2025:
State:
- Studied the task and metrics (ROUGE, BLEU)
- Loaded a model, preprocessed a dataset, and evaluated the model
- Loaded more models; used SlovakSum; generated summaries with four models and compared them with ROUGE and BLEU (TUKE-KEMT/slovak-t5-base, google/mt5-small, google/mt5-base, facebook/mbart-large-50)
- The comparison is without fine-tuning (zero-shot); so far, the best is mBART-large
- Working on the legal dataset "dennlinger/eur-lex-sum"
- Notebooks are on the KEMT Git
Tasks:
- Prepare "mango.kemt.fei.tuke.sk" workflow
- Fine-tune existing models and evaluate them. Use the news and legal datasets
- Try mbart-large, flan-t5-large, slovak-t5-base, google/t5-v1_1-large
- Describe the experimental setup, prepare tables with results.
Future tasks:
- Try prompting LLMs and evaluating the results. We need to pick an LLM with Slovak support
- Fine-tune an LLM to summarize
- Use medical data (after they are ready).
- Prepare a detailed report (to be converted into a paper).
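Prompting an LLM for zero-shot summarization (the future task above) mostly comes down to building an instruction prompt and scoring the completion with the same ROUGE loop as the other models. A minimal sketch: the instruction wording here is an English placeholder and would need to be written in Slovak for the actual experiments (with an LLM picked for its Slovak support, as noted above); the chat-style pipeline call assumes a recent `transformers` version:

```python
def build_prompt(document: str, max_words: int = 80) -> str:
    """Build a summarization instruction.

    Placeholder English wording: for the experiments this instruction
    should be written in Slovak.
    """
    return (f"Summarize the following document in at most {max_words} words:"
            f"\n\n{document}\n\nSummary:")

def zero_shot_summarize(model_name: str, document: str) -> str:
    """Sketch: generate a summary with an instruction-tuned LLM."""
    # Import kept inside the function so the prompt helper works without it.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    messages = [{"role": "user", "content": build_prompt(document)}]
    out = generator(messages, max_new_tokens=200)
    # Chat pipelines return the full message list; the last entry is the reply.
    return out[0]["generated_text"][-1]["content"]
```

Keeping the prompt in a single helper makes it easy to compare several instruction wordings (Slovak vs. English, with or without a length limit) under otherwise identical settings.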