Tomas Kucharik adb8f32fb4 added readme file with instructions 2 years ago
data adding transformed file 2 years ago
notes spring cleaning 2 years ago
.env added initial script with squad loading, printing, saving and with example function for google translation, added markdown file with basic setup instructions 3 years ago
.gitignore small translation test working 2 years ago
README.md added readme file with instructions 2 years ago
requirements.txt small translation test working 2 years ago
squad_transform.py added tqdm for visualization, question translation, scripts for removing special chars and recalculating indexes, some variable renaming, etc. 2 years ago
squad_utils.py even more spring cleaning 2 years ago
translate_utils.py small translation test working 2 years ago

README.md

Creating a corpus of questions and answers in multiple languages using machine translation

  1. Survey the existing language versions of the SQuAD evaluation set and describe how they were created.
  2. Survey current systems for generating answers to questions posed in natural language.
  3. Design and carry out a procedure for creating a corpus of questions and answers in another language using machine translation from English.
  4. Train an answer-generation system on several language versions of SQuAD and compare their accuracy.

Prerequisites

  1. Download and unpack google-cloud-sdk from here.
  2. Create an account, a project, a service account, and keys in Google Cloud following this documentation.
  3. Create a file named google_api_key.json in the root directory and copy the contents of the downloaded service account key file into it.
  4. Create a new conda environment and install the required packages with pip install -r requirements.txt
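
Before running any translation, it can help to confirm that google_api_key.json looks like a service account key. The following is a minimal sketch, not part of the repository: the function name check_key_file and the list of required fields are assumptions for illustration, based on the fields a Google Cloud service account key file normally contains.

```python
import json
from pathlib import Path

# Fields normally present in a Google Cloud service account key file.
# The selection checked here is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path="google_api_key.json"):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

Calling check_key_file() from the root directory returns an empty list when the file passes these basic checks. It does not verify that the key is still valid on the Google Cloud side.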

Functionality

squad_transform.py

  1. Takes squad-v2-dev-small.json, adds special characters around every answer in the context, and calculates the new indexes of the answer positions.
  2. Translates every context and every question and puts the translated sentences in new fields.
  3. Saves the intermediate file as squad-v2-dev-small-transformed.py
  4. Removes the special characters from contexts and recalculates the indexes of the answer positions.
  5. Saves the final file as squad-v2-dev-small-translated.py
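
The idea behind steps 1 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the code in squad_transform.py: the marker strings "[[" and "]]" and the function names mark_answer and unmark_answer are hypothetical, and the sketch handles a single answer per context, whereas the script marks every answer.

```python
# Hypothetical marker strings; the actual special characters used by
# squad_transform.py are not documented here.
OPEN_MARK = "[["
CLOSE_MARK = "]]"


def mark_answer(context, answer_start, answer_text):
    """Wrap the answer span in markers so it can survive translation."""
    end = answer_start + len(answer_text)
    return (
        context[:answer_start]
        + OPEN_MARK
        + context[answer_start:end]
        + CLOSE_MARK
        + context[end:]
    )


def unmark_answer(marked_context):
    """Remove the markers and return (context, answer_start, answer_text)."""
    start = marked_context.find(OPEN_MARK)
    end = marked_context.find(CLOSE_MARK, start + len(OPEN_MARK))
    if start == -1 or end == -1:
        raise ValueError("markers were lost or altered, e.g. during translation")
    answer_text = marked_context[start + len(OPEN_MARK):end]
    context = marked_context[:start] + answer_text + marked_context[end + len(CLOSE_MARK):]
    # After removing the opening marker, the answer begins where the marker was.
    return context, start, answer_text
```

For example, marking "capital" at index 13 in "Paris is the capital of France." gives "Paris is the [[capital]] of France.". Once the marked text has been translated, unmark_answer recovers the translated answer and its new start index from wherever the markers ended up.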

squad_utils.py

Utility functions for working with squad files
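
As a rough idea of what such utilities might look like, here is a minimal sketch for loading, iterating over, and saving a file in the SQuAD v2 JSON layout (data, paragraphs, context, qas). The function names load_squad, iter_examples, and save_squad are assumptions for illustration and may not match those in squad_utils.py.

```python
import json


def load_squad(path):
    """Load a SQuAD-style JSON file into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_examples(squad):
    """Yield (context, qa) pairs, where qa is one question/answers entry."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield context, qa


def save_squad(squad, path):
    """Write the dataset back to disk, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(squad, f, ensure_ascii=False, indent=2)
```

Setting ensure_ascii=False keeps translated text with diacritics human-readable in the output file instead of escaping it as \uXXXX sequences.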

translate_utils.py

Utility functions for working with Google's Translate API
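
A minimal sketch of a batched translation helper follows. It assumes the google-cloud-translate package and its translate_v2 client, which may differ from what translate_utils.py actually uses. The function names make_client and translate_texts, the target language "sk", and the batch size are hypothetical choices for illustration.

```python
def make_client(key_path="google_api_key.json"):
    """Create a Translate client from the service account key file.

    Assumes the google-cloud-translate package is installed; the import is
    kept local so the rest of this sketch works without it.
    """
    from google.cloud import translate_v2

    return translate_v2.Client.from_service_account_json(key_path)


def translate_texts(client, texts, target="sk", source="en", batch_size=50):
    """Translate a list of strings in batches and return the translations.

    `client` is any object with a translate() method shaped like
    translate_v2.Client.translate. The batch size is an arbitrary choice
    for this sketch, not a documented API limit.
    """
    translated = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        results = client.translate(
            batch,
            target_language=target,
            source_language=source,
            format_="text",  # request plain text rather than HTML-escaped output
        )
        translated.extend(result["translatedText"] for result in results)
    return translated
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object, so the batching logic can be checked without network access or credentials.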