
title: Oliver Pejic
published: true
taxonomy:
  category: iaeste
  tag: hatespeech, nlp
  author: Daniel Hladek

Oliver Pejic

IAESTE Intern Summer 2024, 12 weeks in August, September and October.

Goal:

Final Tasks:

Meeting 3.10.:

State:

  • Prepared a pull request for Retrieval SK Quad.
  • Prepared a pull request for Hate Speech Slovak.

Tasks:

  • Make the pull requests comply with the MTEB contribution guidelines. Discuss them once that is done.
  • Submit the pull requests to the MTEB project.
  • Machine-translate a database (HotpotQA, DBpedia, FEVER). Pick a short one, because translation might be slow.
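
The translation task can be sketched as follows, assuming the database is already in a BEIR-style form (a mapping from document id to a `title`/`text` record). The `translate_batch` callable and the `fake_translate` stand-in are hypothetical placeholders for whatever machine translation system is chosen; they are not part of MTEB or any other library. Ids are kept unchanged so that the original relevance judgements still apply to the translated corpus, and working in batches makes it easy to time a small sample before committing to a full run.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]  # assumed record shape: {"title": ..., "text": ...}


def translate_corpus(
    corpus: Dict[str, Doc],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> Dict[str, Doc]:
    """Translate the title and text of every document, keeping ids unchanged."""
    doc_ids = list(corpus)
    translated: Dict[str, Doc] = {}
    for start in range(0, len(doc_ids), batch_size):
        batch_ids = doc_ids[start:start + batch_size]
        # Send titles and texts in one call so the translator sees full batches.
        titles = [corpus[i].get("title", "") for i in batch_ids]
        texts = [corpus[i]["text"] for i in batch_ids]
        outputs = translate_batch(titles + texts)
        if len(outputs) != 2 * len(batch_ids):
            raise ValueError("translator must return one output per input")
        for pos, doc_id in enumerate(batch_ids):
            translated[doc_id] = {
                "title": outputs[pos],
                "text": outputs[len(batch_ids) + pos],
            }
    return translated


def fake_translate(batch: List[str]) -> List[str]:
    """Stand-in translator for dry runs; replace with a real MT model."""
    return [f"[sk] {s}" if s else s for s in batch]


# Toy corpus (hypothetical, for illustration only).
toy_corpus = {
    "d1": {"title": "Bratislava", "text": "Bratislava is the capital of Slovakia."},
    "d2": {"title": "", "text": "The Danube flows through Bratislava."},
}
translated_corpus = translate_corpus(toy_corpus, fake_translate, batch_size=1)
```

Swapping `fake_translate` for a real model only requires a function that takes a list of strings and returns a list of the same length.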

Non priority tasks:

  • Prepare the database and submit it to the Hugging Face Hub.
  • Prepare an MTEB PR for the database.
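
Before a database can be uploaded, it has to be serialized to files. The following is a minimal sketch, assuming the BEIR on-disk layout (`corpus.jsonl`, `queries.jsonl`, and a tab-separated `qrels/test.tsv`). The function name `write_beir_files`, the folder name, and the toy records are hypothetical, and the layout that the final MTEB task expects should be checked against the contribution guidelines.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_beir_files(
    out_dir: Path,
    corpus: Dict[str, Dict[str, str]],
    queries: Dict[str, str],
    qrels: Dict[str, Dict[str, int]],
) -> None:
    """Write corpus, queries and relevance judgements as BEIR-style files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Slovak diacritics readable in the output.
    with open(out_dir / "corpus.jsonl", "w", encoding="utf-8") as f:
        for doc_id, doc in corpus.items():
            row = {"_id": doc_id, "title": doc.get("title", ""), "text": doc["text"]}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    with open(out_dir / "queries.jsonl", "w", encoding="utf-8") as f:
        for query_id, text in queries.items():
            f.write(json.dumps({"_id": query_id, "text": text}, ensure_ascii=False) + "\n")
    qrels_dir = out_dir / "qrels"
    qrels_dir.mkdir(exist_ok=True)
    with open(qrels_dir / "test.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, judged in qrels.items():
            for doc_id, score in judged.items():
                writer.writerow([query_id, doc_id, score])


# Toy data (hypothetical) written to a temporary directory.
out_dir = Path(tempfile.mkdtemp()) / "toy-retrieval-sk"
write_beir_files(
    out_dir,
    corpus={"d1": {"title": "Košice", "text": "Košice ležia na východe Slovenska."}},
    queries={"q1": "Kde ležia Košice?"},
    qrels={"q1": {"d1": 1}},
)
corpus_lines = (out_dir / "corpus.jsonl").read_text(encoding="utf-8").splitlines()
qrels_lines = (out_dir / "qrels" / "test.tsv").read_text(encoding="utf-8").splitlines()
```

Keeping the relevance judgements in a separate file per split means further splits can be added later without touching the corpus or the queries.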

Meeting 3.9.:

State: Studied MTEB framework and transformers.

Tasks:

  • Prepare and try MTEB evaluation tasks for the database. For evaluation, you can try the me5-base model.
  • Fork MTEB and make the necessary modifications, including the documentation references for the task.
  • Prepare two GitHub pull requests for the databases; a preliminary BEIR script is provided.
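
As an illustration of what such a conversion involves, the sketch below turns SQuAD-style records into the three structures a retrieval task needs: a corpus, a set of queries, and relevance judgements. It is not the preliminary BEIR script mentioned above. The field names (`id`, `title`, `context`, `question`) follow the common SQuAD layout and are an assumption about the source data, as is the choice to derive document ids from a hash of the paragraph text.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Corpus = Dict[str, Dict[str, str]]
Queries = Dict[str, str]
Qrels = Dict[str, Dict[str, int]]


def squad_to_retrieval(records: Iterable[dict]) -> Tuple[Corpus, Queries, Qrels]:
    """Build corpus, queries and qrels from SQuAD-style question records."""
    corpus: Corpus = {}
    queries: Queries = {}
    qrels: Qrels = {}
    for rec in records:
        # Many questions share one paragraph, so derive the document id
        # from the paragraph text to store each paragraph only once.
        digest = hashlib.sha256(rec["context"].encode("utf-8")).hexdigest()
        doc_id = digest[:16]
        corpus.setdefault(doc_id, {"title": rec.get("title", ""), "text": rec["context"]})
        queries[rec["id"]] = rec["question"]
        # The paragraph a question was written for is its only relevant document.
        qrels[rec["id"]] = {doc_id: 1}
    return corpus, queries, qrels


# Toy records (hypothetical, for illustration only).
records = [
    {"id": "q1", "title": "Dunaj", "context": "Dunaj preteká Bratislavou.",
     "question": "Ktorá rieka preteká Bratislavou?"},
    {"id": "q2", "title": "Dunaj", "context": "Dunaj preteká Bratislavou.",
     "question": "Ktorým mestom preteká Dunaj?"},
]
corpus, queries, qrels = squad_to_retrieval(records)
```

Deduplicating paragraphs matters for evaluation: if the same paragraph appeared under several ids, a model could retrieve a correct copy and still be scored as wrong.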

Future tasks:

  • Prepare a machine translation system to create another Slovak/multilingual evaluation task from an English task.

Preparation (7.8.2024):

  • Get familiar with the SentenceTransformer framework, study the fundamental papers, and write down notes.
  • Get familiar with the MTEB evaluation framework.
  • Prepare a working environment on Google Colab, on the school server, or with Anaconda.
  • Get familiar with the existing fine-tuning scripts.