zpwiki

Daniel Hladek 020dc8ca83 zz		2025-06-26 09:34:22 +02:00
title: Youssef Ressaissi
published: true
taxonomy:
    category: iaeste
    tag: summarization, nlp
    author: Daniel Hladek

IAESTE Intern Summer 2025, 1.7. - 31.8.2025

Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.

Tasks:

Get familiar with the basic tools and prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
Read several recent papers about summarization using LLMs and write a report.
Learn how to perform and evaluate document summarization with language models in Slovak.
Pick summarization datasets and models. Evaluate several models using the ROUGE and BLEU metrics.
Describe the experiments. Summarize the results in a table and interpret them.
Improve the performance of a language model. Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
Run new experiments and write down the results.
Publish the fine-tuned models on the HF Hub. Publish a paper based on the project.
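
To make the ROUGE-based evaluation step in the tasks more concrete, here is a minimal sketch of ROUGE-1 and ROUGE-L F1 scores in plain Python. It is a simplified illustration, not the official ROUGE implementation: it assumes lowercased whitespace tokenization and does no stemming, and all function names and example sentences are hypothetical. For the actual experiments an established package (for example the ROUGE metric in HF evaluate) would probably be the better choice.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no stemming, no punctuation handling)."""
    return text.lower().split()


def f1(overlap, n_candidate, n_reference):
    """Combine precision and recall into an F1 score; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = tokenize(candidate), tokenize(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f1(overlap, len(cand), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical example pair, not taken from any dataset.
    reference = "the court dismissed the appeal"
    candidate = "the court rejected the appeal"
    print("ROUGE-1 F1:", round(rouge_1(candidate, reference), 3))
    print("ROUGE-L F1:", round(rouge_l(candidate, reference), 3))
```

In the example pair, four of the five tokens match in the same order, so both scores come out at 0.8. For Slovak text, the whitespace tokenization used here treats every inflected form as a distinct token, which is one reason the scores from such a sketch should only be taken as a rough sanity check.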