
Question Answering

Task definition:

  • Create a clone of SQuAD 2.0 in the Slovak language
  • Set up the annotation infrastructure
  • Perform and evaluate annotations
  • Consider using machine translation
  • Train and evaluate Question Answering model

Tasks

Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

  1. Obtaining and parsing a Wikipedia dump
  2. Selecting feasible paragraphs
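The paragraph-selection step can be sketched as a simple length filter, assuming the dump has already been extracted to plain-text paragraphs (e.g. with a tool such as wikiextractor); the word-count thresholds below are illustrative assumptions, not project requirements.

```python
# Sketch: keep only paragraphs feasible for QA annotation.
# Thresholds are assumptions, not fixed project parameters.

def select_paragraphs(paragraphs, min_words=50, max_words=300):
    """Keep paragraphs long enough to support a question, short enough to read."""
    selected = []
    for text in paragraphs:
        n_words = len(text.split())
        if min_words <= n_words <= max_words:
            selected.append(text)
    return selected

if __name__ == "__main__":
    sample = [
        "Too short.",                    # 2 words, filtered out
        " ".join(["slovo"] * 100),       # 100 words, kept
    ]
    print(len(select_paragraphs(sample)))  # 1
```

In practice the filter would likely also drop tables, lists, and infobox fragments, but the length criterion is the core of "feasible paragraphs".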

Notes:

Question Annotation

Input: A set of paragraphs

Output: A question for each paragraph

Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question
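Since the goal is a SQuAD 2.0 clone, the annotated questions and answers can be stored in the SQuAD 2.0 JSON layout so existing tooling works on the data. The field names below follow the public SQuAD schema; the Slovak example text is invented for illustration.

```python
import json

# One paragraph with one answerable question, in SQuAD 2.0 format.
# SQuAD 2.0 also allows unanswerable questions: is_impossible=True, answers=[].
context = "Bratislava je hlavné mesto Slovenska."
record = {
    "title": "Bratislava",
    "paragraphs": [
        {
            "context": context,
            "qas": [
                {
                    "id": "bratislava-q1",
                    "question": "Čo je Bratislava?",
                    "is_impossible": False,
                    "answers": [
                        {
                            "text": "hlavné mesto Slovenska",
                            # character offset of the answer span in the context
                            "answer_start": context.index("hlavné"),
                        }
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(record, ensure_ascii=False, indent=2))
```

Storing the answer as a character offset into the context (rather than free text) is what lets the span be checked and highlighted automatically during annotation.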

Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator
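The per-annotator summary amounts to a count over the annotation records; the (annotator, item_id) row shape below is an assumption about the database schema, not the actual one.

```python
from collections import Counter

# Sketch: summarize work per annotator from annotation database rows.
# Row shape (annotator, item_id) is an assumed schema.
def summarize(annotations):
    """Return a Counter mapping each annotator to their annotation count."""
    return Counter(annotator for annotator, _item in annotations)

if __name__ == "__main__":
    rows = [("anna", 1), ("anna", 2), ("boris", 3)]
    for annotator, count in summarize(rows).most_common():
        print(annotator, count)
```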

Annotation Manual

Output: Recommendations for annotators

Question Answering Model

Input: An annotated QA database

Output: An evaluated model for QA

Training the model with annotated data:

  • Selecting existing modelling approach
  • Evaluation set selection
  • Model evaluation
  • Supporting the annotation with the model (pre-selecting answers)
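Model evaluation on SQuAD-style data is conventionally reported as exact match and token-level F1 against the gold answers. A minimal sketch of the F1 part (omitting the punctuation and article normalization that the official SQuAD script also applies):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # precision 2/3, recall 2/2 -> F1 ≈ 0.8
    print(token_f1("hlavné mesto Slovenska", "hlavné mesto"))
```

The same routine can rank the model's candidate spans when pre-selecting answers for annotators.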

Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

Existing implementations

Bibliography

  • Reading Wikipedia to Answer Open-Domain Questions. Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes. Facebook AI Research.
  • SQuAD: 100,000+ Questions for Machine Comprehension of Text. https://arxiv.org/abs/1606.05250

Existing Datasets