---
title: Youssef Ressaissi
published: true
taxonomy:
    category: [iaeste]
    tag: [summarization,nlp]
author: Daniel Hladek
---

IAESTE Intern Summer 2025, 1.7. - 31.8.2025
Goal: Evaluate and improve language models for summarization in Slovak medical or legal domain.
Tasks:
1. Get familiar with basic tools
    - Prepare the working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
    - Read several recent papers about summarization with LLMs and write a report.
    - Learn how to perform and evaluate document summarization using language models in Slovak.
2. Make a comparison experiment
    - Pick summarization datasets and models. Evaluate several models using ROUGE and BLEU metrics.
    - Resources: https://github.com/slovak-nlp/resources
    - Describe the experiments, summarize the results in a table, and discuss them.
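The ROUGE and BLEU metrics mentioned above are n-gram overlap scores. As a rough illustration of what ROUGE-1 F1 measures (real experiments should use the official `rouge_score` / `sacrebleu` implementations, which also handle stemming and smoothing), a minimal self-contained sketch:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a candidate summary and a reference.

    Illustrative only -- the official rouge_score package should be used
    for reported results.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # count of matched unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat is on the mat"))  # high overlap
```

BLEU is the precision-oriented counterpart (with a brevity penalty); in practice both come from the `evaluate` library rather than hand-rolled code.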
3. Improve the performance of a language model.
    - Use more data: prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
    - Run new experiments and write down the results.
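Preparing a domain-oriented dataset for fine-tuning mostly means mapping raw records into input/target pairs. A minimal sketch, where the field names (`text`, `summary`) and the Slovak task prefix are assumptions to be adapted to the actual dataset schema:

```python
def to_pairs(records, prefix="sumarizuj: ", max_chars=4000):
    """Map raw records to input/target pairs for seq2seq fine-tuning.

    The field names ("text", "summary") and the T5-style Slovak prefix
    are hypothetical -- adjust them to the real dataset.
    """
    pairs = []
    for rec in records:
        text, summary = rec.get("text"), rec.get("summary")
        if not text or not summary:
            continue  # drop incomplete records
        pairs.append({
            "input": prefix + text[:max_chars],  # truncate very long documents
            "target": summary,
        })
    return pairs

sample = [
    {"text": "Dlhý právny dokument ...", "summary": "Krátke zhrnutie."},
    {"text": "", "summary": "x"},  # incomplete record, will be dropped
]
print(to_pairs(sample))
```

Pairs in this shape can then be tokenized and fed to a seq2seq trainer from HF transformers or trl.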
4. Report and disseminate
    - Prepare a final report with analysis, experiments, and conclusions.
    - Publish the fine-tuned models on the HF Hub. Publish a paper based on the project.

Meeting 17.7.2025:
State:

- Studied the task and the metrics (ROUGE, BLEU).
- Loaded a model, preprocessed a dataset, and evaluated it.
- Loaded more models, used SlovakSum, generated summaries with four models, and compared them with ROUGE and BLEU (TUKE-KEMT/slovak-t5-base, google/mt5-small, google/mt5-base, facebook/mbart-large-50).
- The comparison is without fine-tuning (zero-shot); so far, the best model is mBART-large.
- Working on the legal dataset "dennlinger/eur-lex-sum".
- Notebooks are on the KEMT git.

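The model comparison can be summarized as a Markdown table as the tasks require. A small helper, with placeholder numbers that are NOT the real results of the experiments above:

```python
def results_table(scores):
    """Render {model: {metric: value}} as a Markdown table."""
    metrics = sorted({m for vals in scores.values() for m in vals})
    lines = ["| Model | " + " | ".join(metrics) + " |",
             "|---" * (len(metrics) + 1) + "|"]
    for model, vals in scores.items():
        row = " | ".join(f"{vals.get(m, float('nan')):.3f}" for m in metrics)
        lines.append(f"| {model} | {row} |")
    return "\n".join(lines)

# Placeholder scores for illustration only.
demo = {"facebook/mbart-large-50": {"ROUGE-1": 0.0, "BLEU": 0.0},
        "TUKE-KEMT/slovak-t5-base": {"ROUGE-1": 0.0, "BLEU": 0.0}}
print(results_table(demo))
```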
Tasks:
- Prepare "mango.kemt.fei.tuke.sk" workflow
- Fine-tune existing models and evaluate them. Use the news and legal datasets.
- Try mbart-large, flan-t5-large, slovak-t5-base, google/t5-v1_1-large.
- Describe the experimental setup and prepare tables with results.

Future tasks:
- Try prompting an LLM and evaluate the results. We need to pick an LLM with Slovak support.
- Fine-tune an LLM for summarization.
- Use medical data (once it is ready).
- Prepare a detailed report (to be converted into a paper).
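For the prompting experiments, the prompt itself can be prepared independently of the model choice. A sketch of a chat-style Slovak summarization prompt; the wording and message format are illustrative, and the actual model (and how well it handles Slovak) still has to be chosen:

```python
def build_prompt(document: str, max_sentences: int = 3) -> list:
    """Build chat-style messages asking an LLM to summarize a Slovak document.

    The instruction wording is a hypothetical template, not a tested prompt.
    """
    system = "Si asistent, ktorý stručne sumarizuje slovenské dokumenty."
    user = (f"Zhrň nasledujúci dokument do najviac {max_sentences} viet:\n\n"
            f"{document}")
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

msgs = build_prompt("Text dokumentu ...")
print(msgs[1]["content"][:40])
```

The generated summaries would then be scored with the same ROUGE/BLEU setup as the fine-tuned models, so the two approaches are directly comparable.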