---
title: Oliver Pejic
published: true
taxonomy:
    category: [iaeste]
    tag: [hatespeech,nlp]
    author: Daniel Hladek
---

Oliver Pejic

IAESTE Intern Summer 2024, 12 weeks in August, September and October.

Goals:

- Help with the [Hate Speech Project](/topics/hatespeech).
- Help with the evaluation of sentence transformer models using the [MTEB](https://github.com/embeddings-benchmark/mteb) toolkit.

Final Tasks:

- Prepare an MTEB evaluation task for [Slovak hate speech](https://huggingface.co/datasets/TUKE-KEMT/hate_speech_slovak).
- Prepare an MTEB evaluation task for [Slovak question answering](https://huggingface.co/datasets/TUKE-KEMT/retrieval-skquad).
- [Machine translate](https://huggingface.co/google/madlad400-3b-mt) an SBERT evaluation set into multiple Slavic languages.
- Write a short scientific paper presenting the results.
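The machine translation task could start from a sketch like the one below. It assumes the Hugging Face `transformers` and `sentencepiece` packages are installed and follows the usage shown on the MADLAD-400 model card, where the target language is selected by a `<2xx>` token at the start of the input. The helper names (`madlad_prompt`, `batched`, `translate`), the list of target languages, and the batch and length settings are illustrative assumptions, not part of the project.

```python
from typing import Iterable, Iterator, List

# Hypothetical selection of Slavic target languages; adjust to the project's needs.
SLAVIC_TARGETS = ["sk", "cs", "pl"]


def madlad_prompt(text: str, target_lang: str) -> str:
    """Prefix the source text with the MADLAD-400 target-language token."""
    return f"<2{target_lang}> {text.strip()}"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(
    texts: Iterable[str],
    target_lang: str,
    model_name: str = "google/madlad400-3b-mt",
    batch_size: int = 8,          # assumption: tune to the available memory
    max_new_tokens: int = 256,    # assumption: enough for sentence-level inputs
) -> List[str]:
    """Translate `texts` into `target_lang` (downloads the model on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    translations: List[str] = []
    for batch in batched(list(texts), batch_size):
        prompts = [madlad_prompt(t, target_lang) for t in batch]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

Calling `translate(sentences, "sk")` once per entry in `SLAVIC_TARGETS` would produce one translated copy of the evaluation set per language. The 3B model is large, so a GPU or the school server is likely needed in practice.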

Meeting 3.9:

State: Studied the BEIR framework and transformers.

Tasks:

- Prepare and try BEIR evaluation tasks for the database. For evaluation, you can try the me5-base model.
- Make a fork of BEIR and make the necessary modifications, including the documentation references for the task.
- Prepare two GitHub pull requests for the databases; a preliminary BEIR script is provided.
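As a starting point for the BEIR tasks above, the sketch below writes question and passage pairs in the file layout BEIR expects: `corpus.jsonl`, `queries.jsonl`, and `qrels/<split>.tsv`. The function name `write_beir_dataset`, the ID scheme, and the assumption that each question has exactly one relevant passage are illustrative; the real datasets may need a different mapping.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, Tuple


def write_beir_dataset(
    pairs: Iterable[Tuple[str, str]],
    out_dir: str,
    split: str = "test",
) -> Tuple[int, int]:
    """Write (question, passage) pairs in BEIR layout; return (n_docs, n_queries)."""
    out = Path(out_dir)
    (out / "qrels").mkdir(parents=True, exist_ok=True)

    doc_ids = {}   # passage text -> document ID, so repeated passages are stored once
    queries = []
    qrels = []
    for i, (question, passage) in enumerate(pairs):
        if passage not in doc_ids:
            doc_ids[passage] = f"d{len(doc_ids)}"
        query_id = f"q{i}"
        queries.append({"_id": query_id, "text": question})
        # Assumption: binary relevance, one relevant passage per question.
        qrels.append((query_id, doc_ids[passage], 1))

    with open(out / "corpus.jsonl", "w", encoding="utf-8") as f:
        for passage, doc_id in doc_ids.items():
            record = {"_id": doc_id, "title": "", "text": passage}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    with open(out / "queries.jsonl", "w", encoding="utf-8") as f:
        for query in queries:
            f.write(json.dumps(query, ensure_ascii=False) + "\n")

    with open(out / "qrels" / f"{split}.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        writer.writerows(qrels)

    return len(doc_ids), len(queries)
```

A folder written this way should be loadable with BEIR's `GenericDataLoader(data_folder=...).load(split="test")`, after which the preliminary BEIR script can be pointed at it.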

Future tasks:

- Prepare a machine translation system to create another Slovak/multilingual evaluation task from an English task.

Preparation (7.8.2024):

- Get familiar with the [SentenceTransformer](https://sbert.net/) framework, study the fundamental papers, and write down notes.
- Get familiar with the [MTEB](https://github.com/embeddings-benchmark/mteb) evaluation framework.
- Prepare a working environment on Google Colab, on the school server, or with Anaconda.
- Get familiar with the [existing fine-tuning scripts](https://git.kemt.fei.tuke.sk/dano/slovakretrieval).