diff --git a/pages/interns/cesar_gutierrez/README.md b/pages/interns/cesar_gutierrez/README.md
index 1d012337..7ab5fc37 100644
--- a/pages/interns/cesar_gutierrez/README.md
+++ b/pages/interns/cesar_gutierrez/README.md
@@ -1,5 +1,16 @@
+---
+title: Cesar Abascal Gutierrez
+published: true
+taxonomy:
+    category: [iaeste]
+    tag: [ner,nlp]
+    author: Daniel Hladek
+---
+
 ## Named entity annotations
 
+Intern, probably summer 2019
+
 Cesar Abascal Gutierrez
 
 ## Goals
diff --git a/pages/interns/oliver_pejic/README.md b/pages/interns/oliver_pejic/README.md
new file mode 100644
index 00000000..dc73ca1e
--- /dev/null
+++ b/pages/interns/oliver_pejic/README.md
@@ -0,0 +1,34 @@
+---
+title: Oliver Pejic
+published: true
+taxonomy:
+    category: [iaeste]
+    tag: [hatespeech,nlp]
+    author: Daniel Hladek
+---
+
+Oliver Pejic
+
+IAESTE Intern Summer 2024, six weeks in August and September
+
+Goal:
+
+- Help with the [Hate Speech Project](/topics/hatespeech)
+- Help with evaluation of sentence transformer models using toolkit [MTEB](https://github.com/embeddings-benchmark/mteb)
+
+Final Tasks:
+
+- Prepare an MTEB evaluation task for [Slovak HATE speech](https://huggingface.co/datasets/TUKE-KEMT/hate_speech_slovak).
+- Prepare an MTEB evaluation task for [Slovak question answering](https://huggingface.co/datasets/TUKE-KEMT/retrieval-skquad).
+- [Machine translate](https://huggingface.co/google/madlad400-3b-mt) an SBERT evaluation set for multiple slavic languages.
+- Write a short scientific paper with results.
+
+Preparation:
+
+- Get familiar with [SentenceTransformer](https://sbert.net/) framework, study fundamental papers and write down notes.
+- Get familiar with [MTEB](https://github.com/embeddings-benchmark/mteb) evaluation framework.
+- Prepare a working environment on Google Colab or on school server or Anaconda.
+- Get familiar with [existing finetuning scripts](https://git.kemt.fei.tuke.sk/dano/slovakretrieval).
+
+
+
diff --git a/pages/interns/sevval_bulburu/README.md b/pages/interns/sevval_bulburu/README.md
index f7720dec..b0e7a202 100644
--- a/pages/interns/sevval_bulburu/README.md
+++ b/pages/interns/sevval_bulburu/README.md
@@ -2,7 +2,7 @@
 title: Sevval Bulburu
 published: true
 taxonomy:
-    category: [iaeste2023]
+    category: [iaeste]
     tag: [hatespeech,nlp]
     author: Daniel Hladek
 ---