---
title: Youssef Ressaissi
published: true
taxonomy:
    category: [iaeste]
    tag: [summarization,nlp]
    author: Daniel Hladek
---

IAESTE Intern Summer 2025, 1.7. - 31.8.2025

Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.

Tasks:

1. Get familiar with the basic tools.
  - Prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
  - Read several recent papers about summarization using LLMs and write a report.
  - Learn how to perform and evaluate document summarization in Slovak using language models.
2. Make a comparison experiment.
  - Pick summarization datasets and models. Evaluate several models using the ROUGE and BLEU metrics.
  - https://github.com/slovak-nlp/resources
  - Describe the experiments. Summarize the results in a table. Describe the results.
3. Improve the performance of a language model.
  - Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
  - Run new experiments and write down the results.
4. Report and disseminate.
  - Prepare a final report with analysis, experiments, and conclusions.
  - Publish the fine-tuned models on the HF Hub. Publish a paper from the project.

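The ROUGE metric used for the comparison in task 2 is essentially n-gram overlap between a candidate and a reference summary. As a rough illustration of what ROUGE-1 measures (the actual experiments would use a library such as HF `evaluate`; this toy function is not that implementation), a minimal sketch:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between reference and candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # each unigram counted at most min(ref, cand) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # → 0.833
```

The library versions additionally apply stemming and report ROUGE-2 and ROUGE-L, which is why raw scores on abstractive Slovak summaries tend to be low.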
Meeting 25.8.2025

State:

- Fine-tuned several models, including Slovak Mistral.
- HPLT and Gemma 3 look the best.

Tasks:

- Finish the report.
- Define the task that we solved.
- List the models; describe the datasets and the experimental setup: type of fine-tuning, type of evaluation.
- Put the results in a table for each model: zero-shot performance, manual evaluation, fine-tuned performance.
- Describe the created scripts.
- Write down a conclusion. What did we learn? What new things did we discover?
- Put the scripts and the report in the git repository.

Meeting 19.8.

State:

- Fine-tuned Slovak Mistral 7B.
- Tried Llama3 7B; the results look OK, but Mistral is better.
- Tried gpt-oss, but it does not work because of dependencies.
- Working on a preliminary final report.
- The ROUGE score is not a good fit for abstractive summarization.
- The best way to evaluate so far is to inspect the summaries manually.

Tasks:

- Try to fine-tune other models: google/gemma-3-4b-it, HPLT/hplt2c_slk_checkpoints, Qwen/Qwen3-4B. Results will be in different branches of the repository.
- Try to automatically evaluate the results using a large LLM. Read some papers about it. Prepare a script using ollama and gpt-oss-20B.
- Work on the final report.

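The LLM-as-judge task above could start from something like the following sketch. The judge prompt, the 1-5 scale, and the model tag `gpt-oss:20b` are assumptions; it also assumes the `ollama` Python package is installed, `ollama serve` is running, and the model has been pulled.

```python
# Sketch of LLM-as-judge scoring for Slovak summaries via a local ollama server.
# The prompt wording, the 1-5 scale, and the model name are assumptions.

def build_judge_prompt(document: str, summary: str) -> str:
    """Ask the judge model for a single 1-5 quality score."""
    return (
        "You are evaluating a Slovak summary.\n\n"
        f"Document:\n{document}\n\n"
        f"Summary:\n{summary}\n\n"
        "Rate the summary from 1 (poor) to 5 (excellent) for faithfulness "
        "and fluency. Answer with a single digit."
    )

def judge(document: str, summary: str, model: str = "gpt-oss:20b") -> str:
    import ollama  # local import so the prompt builder stays testable offline
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": build_judge_prompt(document, summary)}],
    )
    return response["message"]["content"]
```

In practice the returned text would still need to be parsed for the digit, and scores from a single judge model should be sanity-checked against the manual inspection mentioned above.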
Meeting 4.8.

State:

- Tested LMs with the ROUGE metric; most models scored 4-5 ROUGE, while facebook/mbart-large-50 scored 17 (it was trained for translation).
- In my opinion, mbart-large-50 is not a good candidate for fine-tuning, because it is already fine-tuned for translation.
- No fine-tuning done yet.

Tasks:

- Try to evaluate google/flan-t5-large, kiviki/mbart-slovaksum-large-sum, and similar models. These should already work.
- Continue working on fine-tuning T5 or mBART models, but ask when you are stuck. Use the HF example script for summarization.

Future tasks:

- Use LLMs (open or closed) and evaluate (ROUGE) summarization without fine-tuning on the Slovak legal dataset.
- Install lm-eval-harness, learn it, and prepare and run a task for Slovak summarization.

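The fine-tuning task above can follow the stock HF summarization example (`examples/pytorch/summarization/run_summarization.py` in the transformers repository). The core of its preprocessing looks roughly like this sketch; the column names `text` and `summary` are assumptions, since SlovakSum and eur-lex-sum may name their fields differently:

```python
# Sketch of the preprocessing step behind the HF summarization example script.
# Column names "text"/"summary" are assumptions about the dataset schema.

PREFIX = "summarize: "  # T5-style models expect a task prefix; mBART does not

def to_seq2seq_pairs(batch: dict) -> dict:
    """Map a batch of raw examples to (input, target) text pairs."""
    return {
        "input_text": [PREFIX + doc for doc in batch["text"]],
        "target_text": list(batch["summary"]),
    }

# The pairs are then tokenized, e.g. tokenizer(inputs, text_target=targets,
# truncation=True), and passed to Seq2SeqTrainer with predict_with_generate=True
# so ROUGE can be computed on generated summaries during evaluation.
```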
Meeting 13.8.2025

State:

- Managed to fine-tune Slovak Mistral-7B on legal documents. One epoch is enough.
- Ditched the T5 pipeline.

Tasks:

- Try to fine-tune some different LLMs and compare the results.
- Try to fine-tune with the news dataset.
- Prepare tables with the results.
- Compare with the zero-shot scenario (with various models).
- Don't forget to put the scripts on git.

Future tasks:

- Find the optimal hyperparameters.
- Write the technical report, where you summarize the methods, experiments, and results.

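Fine-tuning a decoder-only LLM like Slovak Mistral-7B for summarization with HF trl typically means casting each (document, summary) pair as a short conversation. A sketch of that data preparation; the Slovak instruction text and the record layout are assumptions, not the exact setup used in the project:

```python
# Sketch: turning (document, summary) pairs into chat-format records for
# supervised fine-tuning with HF trl. The instruction wording is an assumption.

INSTRUCTION = "Zhrň nasledujúci dokument:"  # "Summarize the following document:"

def to_chat_example(document: str, summary: str) -> dict:
    """One training record in the `messages` format accepted by SFTTrainer."""
    return {
        "messages": [
            {"role": "user", "content": f"{INSTRUCTION}\n\n{document}"},
            {"role": "assistant", "content": summary},
        ]
    }

# Training would then look roughly like (not runnable without a GPU and model):
#   from trl import SFTTrainer, SFTConfig
#   trainer = SFTTrainer(model=model, train_dataset=chat_dataset,
#                        args=SFTConfig(num_train_epochs=1))  # one epoch sufficed
#   trainer.train()
```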
Meeting 24.7.

State:

- A working custom environment with Jupyter Notebook.
- Fine-tuning mBART; the results are not great.

Tasks:

- Try T5-based models: Slovak t5-base, umt5, mt5, flan-t5.
- Zero-shot evaluate LLMs on news and legal data.
- Find a way to fine-tune an LLM for summarization.
- Fine-tune an LLM for summarization.

Meeting 17.7.2025:

State:

- Studied the task and the metrics (ROUGE, BLEU).
- Loaded a model, preprocessed a dataset, evaluated a model.
- Loaded more models, used SlovakSum, generated summaries with four models and compared them with ROUGE and BLEU (TUKE-KEMT/slovak-t5-base, google/mt5-small, google/mt5-base, facebook/mbart-large-50).
- The comparison is without fine-tuning (zero-shot); so far, the best is mBART-large.
- Working on the legal dataset "dennlinger/eur-lex-sum".
- The notebooks are on the KEMT git.

Tasks:

- Prepare the "mango.kemt.fei.tuke.sk" workflow.
- Fine-tune existing models and evaluate them. Use the news and legal datasets.
- Try mbart-large, flan-t5-large, slovak-t5-base, google/t5-v1_1-large.
- Describe the experimental setup; prepare tables with the results.

Future tasks:

- Try prompting an LLM and evaluating the results. We need to pick an LLM with Slovak support.
- Fine-tune an LLM to summarize.
- Use the medical data (once they are ready).
- Prepare a detailed report (to be converted into a paper).
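One practical issue with the eur-lex-sum legal documents mentioned above is length: they far exceed the 512-1024 token limits of the encoder models being compared. The notes do not say how truncation was handled, so the following is only an illustrative sketch of a simple word-level chunking workaround (summarize each overlapping chunk, then concatenate or re-summarize):

```python
def chunk_words(text: str, max_words: int = 800, overlap: int = 100) -> list[str]:
    """Split a long document into overlapping word-level chunks."""
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap  # consecutive chunks share `overlap` words
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words) - overlap, step)
    ]
```

Counting words is only a proxy for subword tokens, so in practice `max_words` should be set conservatively below the model's token limit.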