forked from KEMT/zpwiki
34 lines
1.3 KiB
Markdown
34 lines
1.3 KiB
Markdown
---
|
|
title: Youssef Ressaissi
|
|
published: true
|
|
taxonomy:
|
|
category: [iaeste]
|
|
tag: [summarization,nlp]
|
|
author: Daniel Hladek
|
|
---
|
|
|
|
|
|
IAESTE Intern Summer 2025, 1.7. - 31.8.2025
|
|
|
|
Goal: Evaluate and improve language models for summarization in Slovak medical or legal domain.
|
|
|
|
|
|
Tasks:
|
|
|
|
1. Get familiar with basic tools
|
|
- and prepare working environment: HF transformers, datasets, lm-evaluation-harness, HF trl
|
|
- Read several recent papers about summarization using LLM and write a report.
|
|
- Get familiar how to perform and evaluate document summarization using language models in Slovak.
|
|
2. Make a comparison experiment
|
|
- Pick summarization datasets and models. Evaluate several models for evaluation using ROUGE and BLEU metrics.
|
|
- https://github.com/slovak-nlp/resources
|
|
- Describe the experiments. Summarize results in a table. Describe the results.
|
|
3. Improve performance of a languge model.
|
|
- Use more data. Prepare a domain-oriented dataset and finetune a model. Maybe generate artificial data to imporve summarization.
|
|
- Run new expriments and write down the results.
|
|
4. Report and disseminate
|
|
- Prepare a final report with analysis, experiments and conclusions.
|
|
- Publish the fine-tuned models in HF HUB. Publish the paper from the project.
|
|
|
|
|