title: Oliver Pejic
published: true
taxonomy:
    category: iaeste
    tag: hatespeech, nlp
    author: Daniel Hladek

Oliver Pejic

IAESTE Intern Summer 2024, 12 weeks in August, September and October.

Goal:

Final Tasks:

Meeting 3.9:

State: Studied the MTEB framework and transformers.

Tasks:

  • Prepare and try MTEB evaluation tasks for the database. For evaluation, you can try the me5-base model.
  • Fork MTEB and make the necessary modifications, including the documentation references for the task.
  • Prepare two GitHub pull requests for the databases; a preliminary BEIR script is provided.
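
The preliminary BEIR script itself is not reproduced on this page. As a rough sketch of what such a conversion usually produces, the following writes a retrieval dataset in the common BEIR file layout (`corpus.jsonl`, `queries.jsonl`, `qrels/test.tsv`). The function name `write_beir_dataset` and the toy Slovak data are assumptions made for illustration only; they are not taken from the actual script or database.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_beir_dataset(out_dir, corpus, queries, qrels, split="test"):
    """Write a retrieval dataset in the BEIR file layout (hypothetical helper).

    corpus:  {doc_id: {"title": str, "text": str}}
    queries: {query_id: str}
    qrels:   {query_id: {doc_id: int relevance score}}
    """
    out = Path(out_dir)
    (out / "qrels").mkdir(parents=True, exist_ok=True)

    # One JSON object per line; ensure_ascii=False keeps Slovak diacritics readable.
    with open(out / "corpus.jsonl", "w", encoding="utf-8") as f:
        for doc_id, doc in corpus.items():
            row = {"_id": doc_id, "title": doc.get("title", ""), "text": doc["text"]}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    with open(out / "queries.jsonl", "w", encoding="utf-8") as f:
        for query_id, text in queries.items():
            row = {"_id": query_id, "text": text}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # Relevance judgements: tab-separated, with a header row.
    with open(out / "qrels" / f"{split}.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, docs in qrels.items():
            for doc_id, score in docs.items():
                writer.writerow([query_id, doc_id, score])
    return out


# Toy data, invented for this example.
toy_corpus = {"d1": {"title": "", "text": "Bratislava je hlavné mesto Slovenska."}}
toy_queries = {"q1": "hlavné mesto Slovenska"}
toy_qrels = {"q1": {"d1": 1}}

out = write_beir_dataset(tempfile.mkdtemp(), toy_corpus, toy_queries, toy_qrels)
```

Keeping the three files separate means the same corpus and queries can be reused with different relevance splits.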

Future tasks:

  • Prepare a machine translation system to create another Slovak/multilingual evaluation task from an English task.
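
One possible shape for this step, sketched under assumptions: translate the corpus and query texts while leaving every identifier untouched, so that the English relevance judgements (qrels) remain valid for the translated task. The function `translate_task` and the stand-in `toy_translate` are hypothetical; a real machine translation model would be plugged in where `toy_translate` is used.

```python
from typing import Callable, Dict, Tuple

Corpus = Dict[str, Dict[str, str]]
Queries = Dict[str, str]


def translate_task(
    corpus: Corpus, queries: Queries, translate: Callable[[str], str]
) -> Tuple[Corpus, Queries]:
    """Translate texts only; ids stay the same so existing qrels still apply."""
    new_corpus = {
        doc_id: {
            # Empty titles are kept empty instead of being sent to the translator.
            "title": translate(doc["title"]) if doc.get("title") else "",
            "text": translate(doc["text"]),
        }
        for doc_id, doc in corpus.items()
    }
    new_queries = {query_id: translate(text) for query_id, text in queries.items()}
    return new_corpus, new_queries


def toy_translate(text: str) -> str:
    """Placeholder for a real MT system: it only tags the text as 'translated'."""
    return "[sk] " + text


english_corpus = {"d1": {"title": "", "text": "Bratislava is the capital of Slovakia."}}
english_queries = {"q1": "capital of Slovakia"}

slovak_corpus, slovak_queries = translate_task(english_corpus, english_queries, toy_translate)
```

Passing the translator in as a function keeps the dataset handling independent of whichever translation model is eventually chosen.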

Preparation (7.8.2024):

  • Get familiar with the SentenceTransformer framework, study the fundamental papers, and write down notes.
  • Get familiar with the MTEB evaluation framework.
  • Prepare a working environment on Google Colab, on the school server, or with Anaconda.
  • Get familiar with the existing finetuning scripts.
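
As a starting point for the SentenceTransformer preparation, here is a minimal sketch of embedding-based retrieval: texts are embedded, then passages are ranked by cosine similarity to the query. It assumes that "me5-base" refers to the `intfloat/multilingual-e5-base` checkpoint, whose inputs are expected to carry a `query: ` or `passage: ` prefix. The helper names are hypothetical, and `embed` is only defined here, not called, because it needs the `sentence-transformers` package and a model download.

```python
import math


def add_e5_prefix(texts, kind):
    """Prepend the 'query: ' or 'passage: ' marker that E5-family models expect."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def embed(texts, kind, model_name="intfloat/multilingual-e5-base"):
    """Embed texts with a SentenceTransformer model (assumed checkpoint name)."""
    # Imported lazily so the helpers above work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    vectors = model.encode(add_e5_prefix(texts, kind), normalize_embeddings=True)
    return vectors.tolist()


# Hand-made 2-D vectors stand in for real embeddings in this toy ranking.
toy_ranking = rank_passages([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

With real embeddings, `rank_passages(embed([query], "query")[0], embed(passages, "passage"))` would give the retrieval order that an evaluation framework then scores.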