Question Answering

Task definition:

Create a clone of SQuAD 2.0 in the Slovak language
Set up annotation infrastructure
Perform and evaluate annotations
Consider using machine translation
Train and evaluate a Question Answering model

Tasks

Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

Obtaining and parsing a Wikipedia dump
Selecting feasible paragraphs

Notes:

Selecting articles by PageRank biases the sample toward geography; random selection might be the best option
75 best articles
167 good articles
Wiki Facts
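
The paragraph selection step could be sketched as follows. This is a minimal illustration, assuming the dump has already been parsed into plain text with paragraphs separated by blank lines; the function name `select_paragraphs` and the length thresholds are hypothetical and not part of the project.

```python
import random


def select_paragraphs(articles, n, min_chars=200, max_chars=1500, seed=0):
    """Randomly sample up to n paragraphs whose length is within the bounds.

    articles: dict mapping article title -> plain article text, with
    paragraphs separated by blank lines (an assumed input format).
    """
    candidates = []
    for title, text in articles.items():
        for para in text.split("\n\n"):
            para = para.strip()
            # Skip headings, stubs, and very long blocks (assumed thresholds).
            if min_chars <= len(para) <= max_chars:
                candidates.append((title, para))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(candidates, min(n, len(candidates)))
```

Uniform random sampling follows the note above about avoiding the PageRank bias; a fixed seed lets the same paragraph set be regenerated later.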

Question Annotation

Input: A set of paragraphs

Output: A question for each paragraph

Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question
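
Since the goal is a SQuAD 2.0 clone, the annotation output could be stored in the SQuAD 2.0 JSON layout. The sketch below builds one such record; the helper `make_qa`, the ids, and the Slovak example sentence are made up for illustration only.

```python
import json


def make_qa(qa_id, question, context, answer_text=None):
    """Build one SQuAD 2.0 style question entry for a context paragraph.

    answer_text=None marks the question as unanswerable (is_impossible).
    """
    if answer_text is None:
        return {"id": qa_id, "question": question,
                "answers": [], "is_impossible": True}
    # answer_start is a character offset; find() takes the first occurrence.
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer must be a literal span of the context")
    return {
        "id": qa_id,
        "question": question,
        "answers": [{"text": answer_text, "answer_start": start}],
        "is_impossible": False,
    }


# Illustrative example paragraph (not taken from the project data).
context = "Bratislava je hlavné mesto Slovenska."
dataset = {
    "version": "v2.0",
    "data": [{
        "title": "Bratislava",
        "paragraphs": [{
            "context": context,
            "qas": [
                make_qa("q1", "Aké je hlavné mesto Slovenska?", context, "Bratislava"),
                make_qa("q2", "Koľko obyvateľov má Bratislava?", context),
            ],
        }],
    }],
}
print(json.dumps(dataset, ensure_ascii=False, indent=2))
```

Keeping the original layout means existing SQuAD tooling can read the Slovak data without conversion. If the answer text occurs more than once in the paragraph, the annotation tool would need to record the exact offset instead of relying on the first match.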

Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator
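
A per-annotator summary could be computed along these lines. The record layout (keys `annotator` and `kind`) is an assumption, since the schema of the annotation database is not described here.

```python
from collections import Counter


def summarize(annotations):
    """Count questions and answers contributed by each annotator.

    annotations: iterable of dicts with the hypothetical keys
    'annotator' and 'kind' ('question' or 'answer').
    """
    counts = {}
    for row in annotations:
        counts.setdefault(row["annotator"], Counter())[row["kind"]] += 1
    return counts


# Made-up rows standing in for a database query result.
rows = [
    {"annotator": "anna", "kind": "question"},
    {"annotator": "anna", "kind": "question"},
    {"annotator": "boris", "kind": "answer"},
]
for name, per_kind in sorted(summarize(rows).items()):
    print(name, per_kind["question"], per_kind["answer"])
```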

Annotation Manual

Output: Recommendations for annotators

Question Answering Model

Input: An annotated QA database

Output: An evaluated model for QA

Training the model with annotated data:

Selecting existing modelling approach
Evaluation set selection
Model evaluation
Supporting the annotation with the model (pre-selecting answers)
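
For the model evaluation step, SQuAD-style exact match and token-level F1 are the usual metrics. The sketch below is a simplified version written for this page, not the official evaluation script: the official English normalization also removes the articles "a", "an", and "the", which is skipped here on the assumption that it does not apply to Slovak.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable questions: an empty prediction matches an empty gold.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several gold answers, the score for that question would typically be the maximum over them, and the dataset score the mean over all questions.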

Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

Existing implementations

https://github.com/facebookresearch/DrQA
https://github.com/brmson/yodaqa
https://github.com/5hirish/adam_qas
https://github.com/WDAqua/Qanary - methodology and implementation of QA

Bibliography

Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes, Facebook Research
SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250

Existing Datasets

SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
WebQuestions
https://en.wikipedia.org/wiki/Freebase
