# Question Answering
[Project repository](https://git.kemt.fei.tuke.sk/dano/annotation)
## Project Description
Task definition:
- Create a clone of [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) in the Slovak language
- Set up the annotation infrastructure with [Prodigy](https://prodi.gy/)
- Perform and evaluate annotations of [Wikipedia data](https://dumps.wikimedia.org/backup-index.html).
Auxiliary tasks:
- Consider using machine translation
- Train and evaluate a question answering model
Output: a set of paragraphs
1. Obtaining and parsing of the Wikipedia dump
1. Selecting feasible paragraphs
Done:
- Wiki parsing script
- PageRank script
To be done:
- random selection of paragraphs: select all good paragraphs and shuffle them (see the sketch below)
Notes:
- PageRank causes a bias toward geography; random selection might be the best approach
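
A minimal sketch of the random selection step, assuming the Wiki parsing script already yields paragraphs as plain-text strings; the filtering heuristics, thresholds and sample size below are illustrative assumptions, not project settings.

```python
import random


def is_good_paragraph(text, min_chars=200, max_chars=2000):
    """Heuristic filter for "good" paragraphs; the length limits are assumed values."""
    text = text.strip()
    if not min_chars <= len(text) <= max_chars:
        return False
    # Drop list- or table-like fragments that sometimes survive Wiki parsing.
    if text.startswith(("*", "|", "=")):
        return False
    return True


def select_paragraphs(paragraphs, sample_size=1000, seed=42):
    """Keep all good paragraphs, shuffle them, and return a random sample."""
    good = [p for p in paragraphs if is_good_paragraph(p)]
    random.Random(seed).shuffle(good)
    return good[:sample_size]
```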
### Question Annotation
Input: A set of paragraphs
Output: A question for each paragraph
Done:
- a data preparation script (see the sketch at the end of this section)
- an annotation running script
To be done:
- final input paragraphs
- deployment
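
A possible shape of the data preparation step (not necessarily the existing script): the selected paragraphs are written out as JSONL tasks, which is the input format Prodigy reads. The file name and the `meta` fields are assumptions.

```python
import json


def write_question_tasks(paragraphs, path="question_tasks.jsonl"):
    """Write one Prodigy task per paragraph; only the "text" key is required by Prodigy."""
    with open(path, "w", encoding="utf-8") as f:
        for i, paragraph in enumerate(paragraphs):
            task = {"text": paragraph, "meta": {"paragraph_id": i}}
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```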
### Answer Annotation
Input: A set of paragraphs and questions
Output: An answer for each paragraph and question
Done:
- a data preparation script (see the example record at the end of this section)
- an annotation running script
To be done:
- input paragraphs with questions
- deployment
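
For orientation, a sketch of how one annotated (paragraph, question, answer) triple maps onto the SQuAD 2.0 JSON structure the project aims to reproduce. The helper name is hypothetical, and locating the answer with `str.find` is a simplification; real annotations should carry exact character offsets.

```python
def to_squad_entry(title, context, question, answer_text, qa_id):
    """Build one SQuAD 2.0-style entry; unanswerable questions get is_impossible=True."""
    start = context.find(answer_text) if answer_text else -1
    qa = {
        "id": qa_id,
        "question": question,
        "is_impossible": start < 0,
        "answers": [] if start < 0 else [{"text": answer_text, "answer_start": start}],
    }
    return {"title": title, "paragraphs": [{"context": context, "qas": [qa]}]}
```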
### Annotation Summary
Annotation work summary
Input: Database of annotations
Output: Summary of work performed by each annotator
To be done:
- web application for annotation analysis
- analyze the SQL schema and find out who annotated what (see the sketch below)
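
One possible starting point for "who annotated what", assuming the default Prodigy SQLite database and named multi-user sessions (Prodigy then stores the session under the `_session_id` key of each example); the dataset name is a placeholder.

```python
from collections import Counter

from prodigy.components.db import connect


def annotation_counts(dataset_name):
    """Count saved annotations per annotator session in one Prodigy dataset."""
    db = connect()  # uses the database configured for Prodigy (SQLite by default)
    examples = db.get_dataset(dataset_name)
    return Counter(eg.get("_session_id", "unknown") for eg in examples)


if __name__ == "__main__":
    for session, count in annotation_counts("question_annotation").most_common():
        print(session, count)
```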
### Annotation Manual
Output: Recommendations for annotators
TBD
### Question Answering Model
Training the model with annotated data
Input: An annotated QA database
Output: An evaluated model for QA
To be done:
- Selecting an existing modelling approach (an illustrative sketch follows this list)
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)
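
Since the modelling approach is still to be selected, the following is only an illustration of how an off-the-shelf extractive QA model could be tried, e.g. for pre-selecting answers during annotation; the multilingual model name is an example, not a project decision.

```python
from transformers import pipeline

# Placeholder multilingual model fine-tuned on SQuAD 2.0; swap in the selected model later.
qa = pipeline("question-answering", model="deepset/xlm-roberta-large-squad2")

result = qa(
    question="Kde sa nachádzajú Košice?",
    context="Košice sú druhé najväčšie mesto na Slovensku a nachádzajú sa na východe krajiny.",
)
print(result["answer"], result["score"])
```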
### Supporting activities
Output: More annotations
Organizing voluntary student challenges to support the annotation process
TBD
## Existing implementations
- https://github.com/facebookresearch/DrQA
Facebook Research
- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase
## Intern tasks
Week 1: Intro
- Get acquainted with the project and the SQuAD database
- Download the database and study the bibliography
Weeks 2 and 3: Web Application
- Analyze the SQL schema of Prodigy annotations
- Find out who annotated what.
- Make a web application that displays the results.
- Extend the application to analyze more Prodigy instances (for both question and answer annotations)
- Improve the process of annotation.
Output: A web application (in Node.js or Python) and a Dockerfile (a minimal Python sketch follows below)
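
A minimal Python (Flask) sketch of the kind of web application meant here, reusing the Prodigy database helper from the annotation summary section; the route layout and the choice of Flask are assumptions, a Node.js implementation would serve equally well.

```python
from collections import Counter

from flask import Flask, jsonify
from prodigy.components.db import connect

app = Flask(__name__)


@app.route("/stats/<dataset_name>")
def stats(dataset_name):
    """Return per-session annotation counts for one Prodigy dataset as JSON."""
    db = connect()
    counts = Counter(
        eg.get("_session_id", "unknown") for eg in db.get_dataset(dataset_name)
    )
    return jsonify(dict(counts))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```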
Weeks 4-7: The Model
Select and train a working question answering system
Output:
- a deployment script with comments for a selected question answering system
- a working training recipe (can use English data), a script with comments or a Jupyter notebook
- a trained model
- evaluation of the model, if possible (see the metric sketch below)
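
For the evaluation deliverable, a self-contained sketch of the exact-match and token-level F1 measures used in SQuAD evaluation; note that the article-stripping normalization comes from the English evaluation script and has no counterpart in Slovak, so it would need adjusting.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # English-specific step
    return " ".join(text.split())


def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))


def f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```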