From 01077d37310f3befcf8b4d921f863777559b687c Mon Sep 17 00:00:00 2001
From: Daniel Hladek <daniel.hladek@tuke.sk>
Date: Thu, 11 Jun 2020 14:27:02 +0200
Subject: [PATCH] Update question answering project description

---
 pages/topics/question/README.md | 89 +++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 5 deletions(-)

diff --git a/pages/topics/question/README.md b/pages/topics/question/README.md
index 9e21dde55..93a1122ae 100644
--- a/pages/topics/question/README.md
+++ b/pages/topics/question/README.md
@@ -1,10 +1,17 @@
 # Question Answering
 
+[Project repository](https://git.kemt.fei.tuke.sk/dano/annotation)
+
+## Project Description
+
 Task definition:
 
-- Create a clone of SQuaD 2.0 in Slovak language
-- Setup annotation infrastructure
-- Perform and evaluate annotations
+- Create a clone of [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) in the Slovak language
+- Set up annotation infrastructure with [Prodigy](https://prodi.gy/)
+- Perform and evaluate annotations of [Wikipedia data](https://dumps.wikimedia.org/backup-index.html)
+
+Auxiliary tasks:
+
 - Consider using machine translation 
 - Train and evaluate Question Answering model
 
@@ -19,6 +26,15 @@ Output: a set of paragraphs
 1. Obtaining and parsing of wikipedia dump
 1. Selecting feasible paragraphs
 
+Done:
+
+- Wiki parsing script
+- PageRank script
+
+To be done:
+
+- Random selection of paragraphs: select all feasible paragraphs and shuffle them
+
 Notes:
 
 - PageRank Causes bias to geography, random selection might be the best
@@ -32,12 +48,32 @@ Input: A set of paragraphs
 
 Output: A question for each paragraph
 
+Done: 
+
+- a data preparation script
+- annotation running script
+
+To be done:
+
+- final input paragraphs
+- deployment
+
 ### Answer Annotation
 
 Input: A set of paragraphs and questions
 
 Output: An answer for each paragraph and question
 
+Done: 
+
+- a data preparation script
+- annotation running script
+
+To be done:
+
+- input paragraphs with questions
+- deployment
+
 ### Annotation Summary
 
 Annotation work summary
@@ -46,29 +82,41 @@ Input: Database of annotations
 
 Output: Summary of work performed by each annotator
 
+To be done:
+
+- a web application for annotation analysis
+- SQL schema analysis to find out who annotated what
+
 ### Annotation Manual
 
 Output: Recommendations for annotators
 
+TBD
+
 ### Question Answering Model
 
+Training the model with annotated data
+
 Input: An annotated QA database
 
-Otput: An evaluated model for QA
+Output: An evaluated model for QA
 
-Traing the model with annotated data:
+To be done:
 
 - Selecting existing modelling approach
 - Evaluation set selection
 - Model evaluation
 - Supporting the annotation with the model (pre-selecting answers)
 
+
 ### Supporting activities
 
 Output: More annotations
 
 Organizing voluntary student challenges to support the annotation process
 
+TBD
+
 ## Existing implementations
 
 - https://github.com/facebookresearch/DrQA
@@ -87,3 +135,34 @@ Facebook Research
 - Squad TheStanfordQuestionAnsweringDataset(SQuAD)  (Rajpurkar  et  al.,  2016) 
 - WebQuestions
 - https://en.wikipedia.org/wiki/Freebase
+
+## Intern tasks
+
+Week 1: Intro
+
+- Get acquainted with the project and the SQuAD dataset
+- Download the dataset and study the bibliography
+
+Weeks 2 and 3: Web Application
+
+- Analyze the SQL schema of the Prodigy annotation database
+- Find out who annotated what
+- Make a web application that displays the results
+- Extend the application to analyze more Prodigy instances (for both question and answer annotations)
+- Improve the annotation process
+
+Output: a web application (in Node.js or Python) and a Dockerfile
+
+Weeks 4-7: The Model
+
+Select and train a working question answering system
+
+Output:
+
+- a commented deployment script for the selected question answering system
+- a working training recipe (English data can be used) as a commented script or a Jupyter Notebook
+- a trained model
+- evaluation of the model (if possible)
+
+
+
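The paragraph selection item still marked as to be done (keep every usable paragraph, then shuffle) could look roughly like the Python sketch below. The `is_feasible` filter and its word-count bounds are hypothetical placeholders, not the project's actual selection criteria. The fixed seed only makes the annotation order reproducible.

```python
import random


def is_feasible(paragraph: str, min_words: int = 40, max_words: int = 300) -> bool:
    """Hypothetical feasibility filter: keep paragraphs of moderate length.

    The word bounds are illustrative assumptions, not project settings.
    """
    n_words = len(paragraph.split())
    return min_words <= n_words <= max_words


def select_paragraphs(paragraphs, seed=0, **filter_kwargs):
    """Keep every feasible paragraph and return them in random order."""
    selected = [p for p in paragraphs if is_feasible(p, **filter_kwargs)]
    rng = random.Random(seed)  # seeded so the shuffled order can be reproduced
    rng.shuffle(selected)
    return selected
```

Shuffling the whole filtered set, rather than ranking it, avoids the geographic bias the notes attribute to PageRank.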
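For the "find out who annotated what" task, one option is to summarize exported annotations instead of querying the SQL tables directly. The sketch below rests on several assumptions to check against the real database:

- The input is JSONL, one JSON object per line, such as the output of Prodigy's `db-out` command.
- Each record has an `answer` field.
- When annotators use named sessions, each record also has a `_session_id` field.

```python
import json
from collections import Counter, defaultdict


def summarize_annotators(jsonl_lines):
    """Count answers per annotator session in exported annotation records.

    Assumes each non-empty line is a JSON object with an "answer" key
    (e.g. "accept", "reject", "ignore") and, for named sessions, a
    "_session_id" key. Records without a session id go under "unknown".
    """
    summary = defaultdict(Counter)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        session = record.get("_session_id", "unknown")
        summary[session][record.get("answer", "none")] += 1
    # Convert to plain dicts so the result is easy to serialize or render
    return {session: dict(counts) for session, counts in summary.items()}
```

Usage, with a hypothetical file name: `with open("annotations.jsonl", encoding="utf-8") as f: print(summarize_annotators(f))`.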
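SQuAD-style QA models are commonly scored with exact match (EM) and token-level F1. This sketch could support the model evaluation outputs. It follows the scoring rules of the official SQuAD 2.0 evaluation script with one deliberate change: the English article removal (a, an, the) is dropped because it does not apply to Slovak. Punctuation stripping covers ASCII punctuation only.

```python
import string
from collections import Counter

# ASCII punctuation only; Slovak quotation marks (e.g. „ “) would need adding.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace.

    Unlike the official SQuAD script, English articles are not removed.
    """
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # SQuAD 2.0 convention: an unanswerable question has an empty gold
    # answer, and only an empty prediction scores 1 on it.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, gold_answers) -> float:
    """Score against every reference answer and keep the best match.

    An empty list of references is treated as an unanswerable question.
    """
    references = gold_answers or [""]
    return max(metric(prediction, gold) for gold in references)
```

Corpus-level EM and F1 are then the averages of `best_score` over all questions in the evaluation set.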