
2020-06-11 14:27:02 +02:00
commit 01077d3731
parent aa2e279ab0
1 changed file with 84 additions and 5 deletions
--- a/pages/topics/question/README.md
+++ b/pages/topics/question/README.md
@@ -1,10 +1,17 @@
 # Question Answering
 [Project repository](https://git.kemt.fei.tuke.sk/dano/annotation)
 ## Project Description
 Task definition:
- Create a clone of SQuaD 2.0 in Slovak language
+- Create a clone of [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) in the Slovak language
- Setup annotation infrastructure
+- Set up annotation infrastructure with [Prodigy](https://prodi.gy/)
- Perform and evaluate annotations
+- Perform and evaluate annotations of [Wikipedia data](https://dumps.wikimedia.org/backup-index.html).
 Auxiliary tasks:
 - Consider using machine translation 
 - Train and evaluate Question Answering model
@@ -19,6 +26,15 @@ Output: a set of paragraphs
 1. Obtaining and parsing the Wikipedia dump
 1. Selecting feasible paragraphs
 Done:
 - Wiki parsing script
 - PageRank script
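The PageRank script is not part of this diff. For orientation only, here is a minimal power-iteration PageRank over an article link graph. The damping factor 0.85, the fixed iteration count and the dictionary-of-lists graph representation are assumptions, not the project's actual choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages.

    damping and iterations are illustrative defaults, not project settings.
    """
    # Collect every page that appears as a link source or target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page first receives the uniform teleportation share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank
```

Articles with many incoming links, such as country or city pages, accumulate rank. This is consistent with the geography bias recorded in the notes.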
 To be done:
 - random selection of paragraphs: select all feasible paragraphs and shuffle them
 Notes:
 - PageRank causes a bias toward geography-related articles; random selection might be the best option
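The planned random selection (keep every feasible paragraph, then shuffle) could look roughly like the sketch below. The README does not yet define what makes a paragraph feasible, so the word-count bounds are placeholder assumptions. The fixed seed only makes the shuffled order reproducible.

```python
import random


def select_paragraphs(paragraphs, min_words=30, max_words=300, seed=42):
    """Keep paragraphs within a hypothetical length range and shuffle them."""
    # Placeholder feasibility test: the real selection criteria are still to be decided.
    feasible = [p for p in paragraphs if min_words <= len(p.split()) <= max_words]
    # A dedicated Random instance gives a reproducible order without touching global state.
    rng = random.Random(seed)
    rng.shuffle(feasible)
    return feasible
```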
@@ -32,12 +48,32 @@ Input: A set of paragraphs
 Output: A question for each paragraph
 Done: 
 - a data preparation script
 - annotation running script
 To be done:
 - final input paragraphs
 - deployment
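The data preparation script itself is not shown. Prodigy can read input tasks from a JSONL file in which each line is a JSON object with a `text` field, and extra information can be attached under `meta`. A minimal preparation step might therefore look like this. The function name and the `meta` keys `source` and `paragraph_id` are hypothetical.

```python
import json


def write_prodigy_tasks(paragraphs, path):
    """Write (paragraph_id, text) pairs as Prodigy-style JSONL tasks (one per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for paragraph_id, text in paragraphs:
            # "text" is what the annotator sees; the "meta" keys are illustrative only.
            task = {"text": text, "meta": {"source": "skwiki", "paragraph_id": paragraph_id}}
            # ensure_ascii=False keeps Slovak diacritics readable in the output file.
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```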
 ### Answer Annotation
 Input: A set of paragraphs and questions
 Output: An answer for each paragraph and question
 Done: 
 - a data preparation script
 - annotation running script
 To be done:
 - input paragraphs with questions
 - deployment
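Because the goal is a SQuAD 2.0 clone, the collected answers will eventually need to be exported in the SQuAD 2.0 JSON layout. In that layout, `data` holds articles, each article holds `paragraphs` with a `context` and a list of `qas`, and `is_impossible` marks unanswerable questions. The sketch below assumes a hypothetical flat record per annotation with `context`, `question`, `id` and an optional `answer` string. The real Prodigy output will need its own mapping.

```python
def to_squad_v2(records, title="skwiki"):
    """Group flat (context, question, answer) records into the SQuAD 2.0 layout.

    The record keys and the default title are assumptions for illustration.
    """
    paragraphs = {}
    for rec in records:
        para = paragraphs.setdefault(rec["context"], {"context": rec["context"], "qas": []})
        answer = rec.get("answer")
        qa = {"id": rec["id"], "question": rec["question"], "answers": [],
              "is_impossible": answer is None}
        if answer is not None:
            # SQuAD stores the character offset of the answer inside the context.
            # find() uses the first occurrence; a stored annotated offset would be safer.
            start = rec["context"].find(answer)
            if start < 0:
                raise ValueError(f"answer not found in context for {rec['id']}")
            qa["answers"].append({"text": answer, "answer_start": start})
        para["qas"].append(qa)
    return {"version": "v2.0", "data": [{"title": title, "paragraphs": list(paragraphs.values())}]}
```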
 ### Annotation Summary
 Annotation work summary
@@ -46,29 +82,41 @@ Input: Database of annotations
 Output: Summary of work performed by each annotator
 To be done:
 - web application for annotation analysis
 - analyze the SQL schema and find out who annotated what
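Before analyzing the SQL schema directly, a quick way to see who annotated what is to export a dataset as JSONL (for example with `prodigy db-out <dataset>`) and count records per annotator. This sketch assumes each record carries a `_session_id` field, which Prodigy adds when annotators use named sessions, and an `answer` field such as `accept` or `reject`. Both assumptions should be checked against the actual database.

```python
import json
from collections import Counter, defaultdict


def summarize_annotators(jsonl_lines):
    """Count answers per annotator session in Prodigy-style JSONL records."""
    per_annotator = defaultdict(Counter)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed field: records without a session id are grouped under "unknown".
        annotator = record.get("_session_id") or "unknown"
        # Assumed field: "answer" holds the decision, e.g. accept / reject / ignore.
        per_annotator[annotator][record.get("answer", "none")] += 1
    return {name: dict(counts) for name, counts in per_annotator.items()}
```

Usage, with a hypothetical export file: `summarize_annotators(open("questions.jsonl", encoding="utf-8"))`.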
 ### Annotation Manual
 Output: Recommendations for annotators
 TBD
 ### Question Answering Model
 Training the model with annotated data
 Input: An annotated QA database
-Otput: An evaluated model for QA
+Output: An evaluated model for QA
-Traing the model with annotated data:
+To be done:
 - Selecting existing modelling approach
 - Evaluation set selection
 - Model evaluation
 - Supporting the annotation with the model (pre-selecting answers)
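For model evaluation, the usual SQuAD metrics are exact match (EM) and token-level F1 between predicted and gold answer strings. The sketch below follows the approach of the official SQuAD evaluation script but simplifies normalization. It lowercases text, strips ASCII punctuation and collapses whitespace, but does not remove English articles as the official script does. For unanswerable SQuAD 2.0 questions the gold answer is the empty string, so only an empty prediction scores.

```python
import string
from collections import Counter

# Only ASCII punctuation; Slovak quotation marks such as „ “ would need adding.
PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse whitespace (no article removal)."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))


def f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # No-answer case: score 1 only if both prediction and gold are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction, gold_answers):
    """Take the maximum over all gold answers; an empty list means "no answer"."""
    return max(metric(prediction, gold) for gold in (gold_answers or [""]))
```

Averaging `best_score` over an evaluation set gives the dataset-level EM and F1.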
 ### Supporting activities
 Output: More annotations
 Organizing voluntary student challenges to support the annotation process
 TBD
 ## Existing implementations
 - https://github.com/facebookresearch/DrQA
@@ -87,3 +135,34 @@ Facebook Research
 - SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
 - WebQuestions
 - https://en.wikipedia.org/wiki/Freebase
 ## Intern tasks
 Week 1: Intro
 - Get acquainted with the project and the SQuAD database
 - Download the database and study the bibliography
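To get a first feel for the SQuAD data after downloading it, a short script can walk the JSON structure and count answerable and unanswerable questions. The sketch assumes a local SQuAD 2.0 file. The file name in the usage line is an assumption about the download.

```python
import json


def load_squad(path):
    """Load a SQuAD-format JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def squad_stats(squad):
    """Count articles, paragraphs and (un)answerable questions in a SQuAD dict."""
    stats = {"articles": 0, "paragraphs": 0, "answerable": 0, "unanswerable": 0}
    for article in squad["data"]:
        stats["articles"] += 1
        for paragraph in article["paragraphs"]:
            stats["paragraphs"] += 1
            for qa in paragraph["qas"]:
                # SQuAD 1.1 files have no is_impossible key, so default to answerable.
                if qa.get("is_impossible", False):
                    stats["unanswerable"] += 1
                else:
                    stats["answerable"] += 1
    return stats
```

Usage: `squad_stats(load_squad("train-v2.0.json"))`.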
 Weeks 2 and 3: Web Application
 - Analyze the SQL schema of the Prodigy annotations
 - Find out who annotated what.
 - Make a web application that displays results.
 - Extend the application to analyze more Prodigy instances (for both question and answer annotations)
 - Improve the process of annotation.
 Output: a web application (in Node.js or Python) and a Dockerfile
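This sketch covers the display part of the web application only and uses no dependencies. It assumes a per-annotator summary dictionary such as `{"annotator": {"accept": 3}}` has already been computed from the Prodigy data, and it uses Python's standard `http.server` to serve that summary as an HTML table. A real deployment would more likely use a web framework and ship with the Dockerfile listed as an output.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_summary(summary):
    """Render {annotator: {answer: count}} as a simple HTML table."""
    answers = sorted({a for counts in summary.values() for a in counts})
    header = "".join(f"<th>{html.escape(a)}</th>" for a in answers)
    rows = []
    for name in sorted(summary):
        # Missing answer types are shown as 0 for that annotator.
        cells = "".join(f"<td>{summary[name].get(a, 0)}</td>" for a in answers)
        rows.append(f"<tr><td>{html.escape(name)}</td>{cells}</tr>")
    return f"<table><tr><th>annotator</th>{header}</tr>{''.join(rows)}</table>"


def serve(summary, port=8000):
    """Serve the table on localhost; a stand-in for the planned application."""
    body = render_summary(summary).encode("utf-8")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```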
 Weeks 4-7: The Model
 Select and train a working question answering system
 Output:
 - a deployment script with comments for a selected question answering system 
 - a working training recipe (may use English data), as a commented script or a Jupyter Notebook
 - a trained model
 - evaluation of the model (if possible)