zpwiki/pages/interns/oliver_pejic
2024-10-03 07:48:58 +00:00

title: Oliver Pejic
published: true
taxonomy:
  category: iaeste
  tag: hatespeech, nlp
  author: Daniel Hladek

Oliver Pejic

IAESTE Intern Summer 2024, 12 weeks in August, September and October.

Goal:

Final Tasks:

Meeting 3.10.2024:

State:

  • Prepared a pull request for Retrieval SK Quad.
  • Prepared a pull request for Hate Speech Slovak.

Tasks:

  • Make the pull requests comply with the MTEB contribution guidelines. Discuss them once this is done.
  • Submit the pull requests to the MTEB project.
  • Machine-translate a dataset (HotpotQA, DBpedia, FEVER). Pick a small dataset, because translation might be slow.
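
A minimal sketch of the translation loop, assuming the dataset is available as `{id: text}` dictionaries. The helper names (`chunks`, `translate_records`) and the `fake_translate` placeholder are hypothetical; a real machine translation model would replace the placeholder. The point it illustrates is that the ids are kept unchanged, so existing relevance judgments still match the translated texts.

```python
from typing import Callable, Dict, Iterable, List


def chunks(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_records(
    records: Dict[str, str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> Dict[str, str]:
    """Translate {id: text} batch by batch, keeping the original ids."""
    translated: Dict[str, str] = {}
    for id_batch in chunks(list(records), batch_size):
        outputs = translate_batch([records[i] for i in id_batch])
        if len(outputs) != len(id_batch):
            raise ValueError("translator returned a different number of texts")
        translated.update(zip(id_batch, outputs))
    return translated


def fake_translate(texts: List[str]) -> List[str]:
    # Hypothetical placeholder standing in for a real English-to-Slovak model.
    return ["[sk] " + text for text in texts]


queries = {"q1": "Who wrote Hamlet?", "q2": "Where is Kosice?"}
print(translate_records(queries, fake_translate, batch_size=1))
```

Batching keeps a slow model busy with several texts per call, and the size of the chosen dataset directly determines how many calls are needed.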

Non priority tasks:

  • Prepare the dataset and submit it to the Hugging Face Hub.
  • Prepare an MTEB PR for the dataset.
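
One possible way to prepare a retrieval dataset before uploading is to write it in the BEIR file layout (`corpus.jsonl`, `queries.jsonl`, `qrels/<split>.tsv`). The sketch below is an illustration under that assumption; the function name `write_beir_dataset` and the toy records are hypothetical, and the upload step itself is left out.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_beir_dataset(out_dir, corpus, queries, qrels, split="test"):
    """Write corpus, queries and relevance judgments in BEIR-style files."""
    out = Path(out_dir)
    (out / "qrels").mkdir(parents=True, exist_ok=True)

    # One JSON object per line: document id, title and text.
    with open(out / "corpus.jsonl", "w", encoding="utf-8") as f:
        for doc_id, doc in corpus.items():
            row = {"_id": doc_id, "title": doc.get("title", ""), "text": doc["text"]}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # One JSON object per line: query id and text.
    with open(out / "queries.jsonl", "w", encoding="utf-8") as f:
        for query_id, text in queries.items():
            f.write(json.dumps({"_id": query_id, "text": text}, ensure_ascii=False) + "\n")

    # Tab-separated relevance judgments with a header row.
    with open(out / "qrels" / f"{split}.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, docs in qrels.items():
            for doc_id, score in docs.items():
                writer.writerow([query_id, doc_id, score])
    return out


# Toy records, for illustration only.
corpus = {"d1": {"title": "Košice", "text": "Košice je mesto na východe Slovenska."}}
queries = {"q1": "Kde sú Košice?"}
qrels = {"q1": {"d1": 1}}

with tempfile.TemporaryDirectory() as tmp:
    path = write_beir_dataset(tmp, corpus, queries, qrels)
    print(sorted(p.relative_to(path).as_posix() for p in path.rglob("*") if p.is_file()))
```

`ensure_ascii=False` keeps Slovak diacritics readable in the JSON lines files instead of escaping them.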

Meeting 3.9.2024:

State: Studied MTEB framework and transformers.

Tasks:

  • Prepare and try MTEB evaluation tasks for the datasets. For evaluation you can try the me5-base model.
  • Fork MTEB and make the necessary modifications, including the documentation references for the task.
  • Prepare two GitHub pull requests for the datasets; a preliminary BEIR script is provided.
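
When trying the retrieval task, it can help to know what the reported score measures. MTEB retrieval tasks use nDCG@10 as their main score. The sketch below computes it by hand for one query, assuming linear gain and a log2 rank discount; the function names and the toy ranking are hypothetical, and MTEB computes the metric with its own evaluation code rather than with this snippet.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: position p (1-based) is divided by log2(p + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query, given the model's ranking and the relevance labels."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Toy example: the only relevant document is returned at rank 2.
ranking = ["d7", "d1", "d3"]
labels = {"d1": 1}
print(round(ndcg_at_k(ranking, labels), 4))
```

A relevant document at rank 1 gives a score of 1.0; the further down it appears, the smaller the score, and anything below rank 10 counts as a miss.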

Future tasks:

  • Prepare a machine translation system to create another Slovak/multilingual evaluation task from an English task.

Preparation (7.8.2024):

  • Get familiar with the SentenceTransformers framework, study the fundamental papers and write down notes.
  • Get familiar with the MTEB evaluation framework.
  • Prepare a working environment on Google Colab, on the school server, or with Anaconda.
  • Get familiar with the existing fine-tuning scripts.
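
Whichever environment is chosen, a quick check that the needed packages can be imported saves time later. This is a small sketch; the package list in `REQUIRED` is an assumption about what the work needs, not a list taken from this page.

```python
import importlib.util
from typing import Dict, List


def check_packages(names: List[str]) -> Dict[str, bool]:
    """Return, for each top-level package name, whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed package list; adjust it to the scripts actually used.
REQUIRED = ["torch", "sentence_transformers", "mteb", "datasets"]

for name, available in check_packages(REQUIRED).items():
    print(f"{name}: {'ok' if available else 'MISSING'}")
```

`find_spec` only looks a package up without importing it, so the check stays fast even for large libraries.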