---
title: Oliver Pejic
published: true
taxonomy:
    category: [iaeste]
    tag: [hatespeech,nlp]
    author: Daniel Hladek
---

Oliver Pejic

IAESTE Intern Summer 2024, six weeks in August and September

Goal:

- Help with the [Hate Speech Project](/topics/hatespeech).
- Help with the evaluation of sentence transformer models using the [MTEB](https://github.com/embeddings-benchmark/mteb) toolkit.

Final Tasks:

- Prepare an MTEB evaluation task for [Slovak hate speech](https://huggingface.co/datasets/TUKE-KEMT/hate_speech_slovak).
- Prepare an MTEB evaluation task for [Slovak question answering](https://huggingface.co/datasets/TUKE-KEMT/retrieval-skquad).
- [Machine translate](https://huggingface.co/google/madlad400-3b-mt) an SBERT evaluation set into multiple Slavic languages.
- Write a short scientific paper with the results.

Preparation:

- Get familiar with the [SentenceTransformer](https://sbert.net/) framework, study the fundamental papers and write down notes.
- Get familiar with the [MTEB](https://github.com/embeddings-benchmark/mteb) evaluation framework.
- Prepare a working environment on Google Colab, on the school server, or with Anaconda.
- Get familiar with the [existing finetuning scripts](https://git.kemt.fei.tuke.sk/dano/slovakretrieval).
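
For the machine translation task listed under Final Tasks, a minimal sketch of how sentences could be translated with the linked MADLAD-400 model is given here. It assumes the `transformers` library (with `sentencepiece` and PyTorch) is installed and that the model is driven by a `<2xx>` target-language prefix, as its model card describes. The helper names `add_target_tag` and `translate`, the example sentence, and the generation settings are illustrative assumptions, not part of the project scripts.

```python
from typing import List


def add_target_tag(text: str, target_lang: str) -> str:
    """Prefix a sentence with the MADLAD-400 target-language token, e.g. '<2sk> '."""
    return f"<2{target_lang}> {text}"


def translate(
    sentences: List[str],
    target_lang: str,
    model_name: str = "google/madlad400-3b-mt",
    max_new_tokens: int = 128,  # assumed limit, tune for longer inputs
) -> List[str]:
    """Translate a batch of sentences into target_lang (hypothetical helper)."""
    # Imported lazily so the lightweight helper above works without the heavy dependencies.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # Every input sentence gets the target-language prefix before tokenization.
    tagged = [add_target_tag(s, target_lang) for s in sentences]
    inputs = tokenizer(tagged, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the 3B-parameter model on first use; a GPU is advisable for real data.
    for lang in ["sk", "cs", "pl"]:
        print(lang, translate(["A man is playing a guitar."], lang))
```

Loading the model once and reusing it across all target languages would be faster in practice; the sketch reloads it per call only to keep the function self-contained.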