# Question Answering

Task definition:

- Create a clone of SQuAD 2.0 in the Slovak language
- Set up annotation infrastructure
- Perform and evaluate annotations
- Consider using machine translation
- Train and evaluate a Question Answering model

## Tasks

### Raw Data Preparation

Input: Wikipedia

Output: A set of paragraphs

1. Obtaining and parsing the Wikipedia dump
1. Selecting feasible paragraphs

Notes:

- PageRank causes a bias towards geography; random selection might be the best
- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)

### Question Annotation

Input: A set of paragraphs

Output: A question for each paragraph

### Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

### Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator

### Annotation Manual

Output: Recommendations for annotators

### Question Answering Model

Input: An annotated QA database

Output: An evaluated model for QA

Training the model with annotated data:

- Selecting an existing modelling approach
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)

### Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

## Existing implementations

- https://github.com/facebookresearch/DrQA
- https://github.com/brmson/yodaqa
- https://github.com/5hirish/adam_qas
- https://github.com/WDAqua/Qanary - methodology and implementation of QA

## Bibliography

- Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes, Facebook Research
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
  https://arxiv.org/abs/1606.05250

## Existing Datasets

- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase
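
Since the goal is a Slovak clone of SQuAD 2.0, it may help to see the record layout that the annotation output would need to follow. The block below is a minimal sketch of the SQuAD 2.0 JSON structure in Python. The Slovak sentences, the `sk-0001` style ids, and the helper names `make_example` and `find_misaligned_answers` are made up for illustration and are not part of this project; the field names (`context`, `qas`, `is_impossible`, `answers`, `answer_start`) follow the published SQuAD 2.0 format.

```python
import json


def make_example():
    """Build a tiny SQuAD 2.0 style dataset with invented Slovak content."""
    context = "Bratislava je hlavné mesto Slovenska."
    return {
        "version": "v2.0",
        "data": [
            {
                "title": "Bratislava",
                "paragraphs": [
                    {
                        "context": context,
                        "qas": [
                            {
                                # Answerable question: the answer is a span of the context.
                                "id": "sk-0001",
                                "question": "Aké je hlavné mesto Slovenska?",
                                "is_impossible": False,
                                "answers": [
                                    {"text": "Bratislava", "answer_start": 0}
                                ],
                            },
                            {
                                # Unanswerable question: the paragraph does not contain the answer.
                                "id": "sk-0002",
                                "question": "Koľko obyvateľov má Bratislava?",
                                "is_impossible": True,
                                "answers": [],
                            },
                        ],
                    }
                ],
            }
        ],
    }


def find_misaligned_answers(dataset):
    """Return ids of questions whose answer text does not match the context at answer_start."""
    bad_ids = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad_ids.append(qa["id"])
    return bad_ids


if __name__ == "__main__":
    dataset = make_example()
    # ensure_ascii=False keeps Slovak diacritics readable in the output file.
    print(json.dumps(dataset, ensure_ascii=False, indent=2))
    print("misaligned:", find_misaligned_answers(dataset))
```

A check like `find_misaligned_answers` could be run over the annotation database before training, because span offsets are easy to break when text is re-encoded or machine translated.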