data adding transformed file 2022-02-21 11:00:45 +01:00
notes spring cleaning 2022-02-20 21:32:55 +01:00
.env added initial script with squad loading, printing, saving and with example function for google translation, added markdown file with basic setup instructions 2021-11-04 21:07:23 +01:00
.gitignore small translation test working 2022-02-20 23:04:51 +01:00
README.md added readme file with instructions 2022-02-21 11:15:32 +01:00
requirements.txt small translation test working 2022-02-20 23:04:51 +01:00
squad_transform.py added tqdm for visualization, question translation, scripts for removing special chars and recalculating indexes, some variable renaming, etc. 2022-02-21 10:52:56 +01:00
squad_utils.py even more spring cleaning 2022-02-20 22:02:03 +01:00
translate_utils.py small translation test working 2022-02-20 23:04:51 +01:00

Creating a question-and-answer corpus in multiple languages using machine translation

  1. Survey the existing language versions of the SQuAD evaluation set and describe how they were created.
  2. Survey current systems for generating an answer to a natural-language question.
  3. Design and carry out a procedure for creating a question-and-answer corpus in another language using machine translation from English.
  4. Train an answer-generation system on several language versions of SQuAD and compare their accuracy.

Prerequisites

  1. Download and unpack google-cloud-sdk from here.
  2. Create an account, a project, a service account, and keys in Google Cloud, following this documentation.
  3. Create a file named google_api_key.json in the root directory and copy the contents of the downloaded service account key file into it.
  4. Create a new conda environment and install the required packages with pip install -r requirements.txt
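
Once the key file is in place, it can help to check it before running any translation. The sketch below is not taken from this repository: the function name use_service_account_key is hypothetical. It relies only on two facts about Google Cloud, namely that service account key files carry a "type" field set to "service_account" and that Google client libraries read the key path from the GOOGLE_APPLICATION_CREDENTIALS environment variable. How the scripts here actually load the key is not stated, so treat this as one possible approach.

```python
import json
import os
from pathlib import Path


def use_service_account_key(key_path="google_api_key.json"):
    """Check the key file and point Google client libraries at it.

    Hypothetical helper, not part of this repository.
    """
    path = Path(key_path).resolve()
    with path.open(encoding="utf-8") as fh:
        key = json.load(fh)  # raises if the file is missing or not valid JSON

    # Service account key files declare their type explicitly.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")

    # Google client libraries pick the key path up from this variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Calling use_service_account_key() from the repository root before creating a translation client would fail early with a clear error if the file from step 3 is missing or malformed.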

Functionality

squad_transform.py

  1. Takes squad-v2-dev-small.json, wraps every answer in its context with special marker characters, and recalculates the answer position indexes to account for the added characters.
  2. Translates every context and every question and stores the translations in new fields.
  3. Saves the intermediate file as squad-v2-dev-small-transformed.py
  4. Removes the special characters from contexts and recalculates the indexes of the answer positions.
  5. Saves the final file as squad-v2-dev-small-translated.py
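
The marker idea behind the steps above can be sketched as follows. This is a minimal illustration, not the code in squad_transform.py: the marker strings "[[" and "]]", the function names mark_answer and unmark_answer, and the stand-in translation in the demo are all assumptions, and the sketch handles a single answer per context copy for simplicity. The point it shows is that the answer position survives translation because it is recovered from the markers rather than from the original character offset.

```python
# Hypothetical markers; the real script may use different characters.
OPEN, CLOSE = "[[", "]]"


def mark_answer(context, answer_start, answer_text):
    """Wrap the answer span in markers and return the shifted start index."""
    end = answer_start + len(answer_text)
    if context[answer_start:end] != answer_text:
        raise ValueError("answer_start does not point at answer_text")
    marked = context[:answer_start] + OPEN + answer_text + CLOSE + context[end:]
    # The answer now begins after the opening marker.
    return marked, answer_start + len(OPEN)


def unmark_answer(marked_context):
    """Remove the markers and return (clean_context, answer_start, answer_text).

    Returns (marked_context, None, None) if the markers did not survive.
    """
    start = marked_context.find(OPEN)
    end = marked_context.find(CLOSE, start + len(OPEN))
    if start == -1 or end == -1:
        return marked_context, None, None
    answer = marked_context[start + len(OPEN):end]
    clean = marked_context[:start] + answer + marked_context[end + len(CLOSE):]
    return clean, start, answer


# Demo with a stand-in for machine translation (a fixed string replacement).
context = "The capital of Slovakia is Bratislava."
answer = "Bratislava"
marked, _ = mark_answer(context, context.index(answer), answer)
translated = marked.replace("The capital of Slovakia is", "Hlavné mesto Slovenska je")
clean, new_start, new_answer = unmark_answer(translated)
print(clean, new_start, new_answer)
```

A translation service may drop or alter the markers, which is why unmark_answer reports None instead of guessing a position; such records would need to be skipped or handled separately.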

squad_utils.py

Utility functions for working with SQuAD files.
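
For orientation, helpers of this kind might look like the sketch below. The function names load_squad, iter_questions and save_squad are hypothetical and are not taken from squad_utils.py; the only thing assumed about the data is the standard SQuAD JSON layout (data, paragraphs, context, qas).

```python
import json
from pathlib import Path


def load_squad(path):
    """Read a SQuAD-style JSON file into a Python dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def iter_questions(squad):
    """Yield (context, qa) pairs from the nested SQuAD structure."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def save_squad(squad, path):
    """Write the dict back to disk, keeping non-ASCII characters readable."""
    with Path(path).open("w", encoding="utf-8") as fh:
        json.dump(squad, fh, ensure_ascii=False, indent=2)
```

Passing ensure_ascii=False matters for translated corpora: without it, accented characters would be written as \u escapes, which is valid JSON but hard to inspect by eye.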

translate_utils.py

Utility functions for working with Google's Translate API.
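
As a rough illustration of what such a utility could look like, the sketch below wraps the google-cloud-translate client (translate_v2). The function name translate_texts and its client parameter are hypothetical and not taken from translate_utils.py, and the target language is left to the caller because the README does not name one. The client parameter exists so the function can be exercised without network access or credentials.

```python
def translate_texts(texts, target_language, source_language="en", client=None):
    """Translate a list of strings and return the translations in order.

    Hypothetical helper. If no client is passed, a google-cloud-translate
    v2 client is created, which requires valid credentials.
    """
    if client is None:
        # Imported lazily so the module loads without the package installed.
        from google.cloud import translate_v2 as translate
        client = translate.Client()

    # For a list input the v2 client returns one result dict per string.
    results = client.translate(
        list(texts),
        target_language=target_language,
        source_language=source_language,
        format_="text",  # plain text, so results are not HTML-escaped
    )
    return [result["translatedText"] for result in results]
```

With credentials configured, a call such as translate_texts(["Where is Bratislava?"], target_language="sk") would send one request for the whole list, which is usually cheaper than translating strings one at a time.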