From adb8f32fb4f3761ea082978e96184775ce3aca5e Mon Sep 17 00:00:00 2001
From: Tomas Kucharik
Date: Mon, 21 Feb 2022 11:15:32 +0100
Subject: [PATCH] added readme file with instructions

---
 README.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8938d09
--- /dev/null
+++ b/README.md
@@ -0,0 +1,27 @@
+# Creating a question-answering corpus in multiple languages using machine translation
+
+1. Survey the existing language versions of the SQuAD evaluation set and describe how they were created.
+2. Survey current systems for generating answers to natural-language questions.
+3. Design and carry out a procedure for creating a question-answering corpus in another language by machine translation from English.
+4. Train an answer-generation system on several language versions of SQuAD and compare their accuracy.
+
+## Prerequisites
+
+1. Download and unpack `google-cloud-sdk` from [here](https://cloud.google.com/sdk/docs/install).
+2. Create an account, project, service account, and keys in Google Cloud by following [this documentation](https://cloud.google.com/translate/docs/setup).
+3. Create a file named `google_api_key.json` in the root directory and copy the contents of the downloaded service account key file into it.
+4. Create a new conda environment and install the required packages with `pip install -r requirements.txt`.
+
+## Functionality
+### `squad_transform.py`
+1. Takes `squad-v2-dev-small.json`, adds special characters around every answer in its context, and calculates the new indexes of the answer positions.
+2. Translates every context and every question and stores the translated sentences in new fields.
+3. Saves the intermediate file as `squad-v2-dev-small-transformed.py`.
+4. Removes the special characters from the contexts and recalculates the indexes of the answer positions.
+5. Saves the final file as `squad-v2-dev-small-translated.py`.
+
+### `squad_utils.py`
+Utility functions for working with SQuAD files.
+
+### `translate_utils.py`
+Utility functions for working with Google's Translate API.
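
The marker-based steps listed for `squad_transform.py` can be illustrated with a minimal sketch. It is not the script's actual code: the marker strings `[[` and `]]`, the function names, and the `translate` callable are all assumptions made for illustration, since the README does not say which special characters are used or how the translation call is wrapped.

```python
from typing import Callable, Optional, Tuple

# Hypothetical markers; the README does not specify the real special characters.
START, END = "[[", "]]"


def mark_answer(context: str, answer_start: int, answer_text: str) -> str:
    """Wrap the answer span in markers so it can be found again after translation."""
    answer_end = answer_start + len(answer_text)
    if context[answer_start:answer_end] != answer_text:
        raise ValueError("answer_start does not point at answer_text")
    return context[:answer_start] + START + answer_text + END + context[answer_end:]


def unmark_answer(marked: str) -> Optional[Tuple[str, int, str]]:
    """Strip the markers and return (clean_context, answer_start, answer_text).

    Returns None if the markers did not survive translation.
    """
    start = marked.find(START)
    if start == -1:
        return None
    end = marked.find(END, start + len(START))
    if end == -1:
        return None
    answer = marked[start + len(START):end]
    clean = marked[:start] + answer + marked[end + len(END):]
    # After removing the opening marker, the answer begins where the marker was.
    return clean, start, answer


def translate_example(
    context: str,
    answer_start: int,
    answer_text: str,
    translate: Callable[[str], str],  # assumed wrapper around a translation service
) -> Optional[Tuple[str, int, str]]:
    """Mark the answer, translate the marked context, then recover the new span."""
    marked = mark_answer(context, answer_start, answer_text)
    return unmark_answer(translate(marked))
```

The new answer position is read from the translated text itself rather than derived from the English offset, because translation changes both word order and sentence length. Entries for which `unmark_answer` returns `None` would need separate handling, for example being skipped.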