dano a1b4a3d2f8 Update pages/interns/yussef_ressaissi/README.md		2025-06-30 12:14:04 +00:00

title: Youssef Ressaissi
published: true
taxonomy:
    category: iaeste
    tag: summarization, nlp
    author: Daniel Hladek

IAESTE Intern Summer 2025, 1.7. - 31.8.2025

Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.

Tasks:

Get familiar with basic tools

Prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
Read several recent papers about summarization using LLMs and write a report.
Learn how to perform and evaluate document summarization with language models in Slovak.
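
As a quick sanity check of the working environment, the sketch below reports which of the listed tools cannot be imported. The import names (`transformers`, `datasets`, `lm_eval`, `trl`) are assumptions about how the packages are installed, and `missing_packages` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.util import find_spec

# Assumed top-level import names for the tools listed above.
REQUIRED = {
    "HF transformers": "transformers",
    "HF datasets": "datasets",
    "lm-evaluation-harness": "lm_eval",
    "HF trl": "trl",
}


def missing_packages(required=REQUIRED):
    """Return the labels of packages whose module cannot be found."""
    return [label for label, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed tools are importable.")
```

The check only confirms that the modules are discoverable; it does not verify versions or GPU support.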

Make a comparison experiment

Pick summarization datasets and models. Evaluate several models using the ROUGE and BLEU metrics.
https://github.com/slovak-nlp/resources
Describe the experiments. Summarize results in a table. Describe the results.
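
To make the ROUGE scores concrete, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L F1 for a single candidate and reference summary. It assumes lowercased whitespace tokenization with no stemming, which is a simplification (Slovak inflection will lower the overlap), so reported numbers should come from an established implementation rather than from this illustration. All function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference sizes."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 using clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    for name, score in [("ROUGE-1", rouge_n(cand, ref, 1)),
                        ("ROUGE-2", rouge_n(cand, ref, 2)),
                        ("ROUGE-L", rouge_l(cand, ref))]:
        print(f"{name}: {score:.3f}")
```

Averaging these per-example scores over a test set gives one row of the results table per model.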

Improve the performance of a language model

Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
Run new experiments and write down the results.
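
One possible way to prepare a domain-oriented dataset is to turn document/summary pairs into prompt/completion records and hold out a validation split. The sketch below assumes that layout; the prompt wording, the `prompt` and `completion` field names, and the helper functions are illustrative choices, not something fixed by the project.

```python
import json
import random

# Placeholder prompt; a Slovak instruction would likely suit Slovak data better.
PROMPT_TEMPLATE = "Summarize the following text:\n\n{document}\n\nSummary:"


def to_record(document, summary):
    """Build one training record from a document and its reference summary."""
    return {
        "prompt": PROMPT_TEMPLATE.format(document=document.strip()),
        "completion": " " + summary.strip(),
    }


def split_records(records, valid_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Toy pairs standing in for real medical or legal documents.
    pairs = [(f"Document {i} text.", f"Summary {i}.") for i in range(10)]
    train, valid = split_records([to_record(d, s) for d, s in pairs])
    print(len(train), "training records,", len(valid), "validation records")
    print(to_jsonl(valid))
```

Keeping the split seeded makes the later experiments comparable across runs.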

Report and disseminate

Prepare a final report with analysis, experiments, and conclusions.
Publish the fine-tuned models on the HF Hub. Publish a paper based on the project.