From aa2e279ab0baa7e8bd53f0430a3bae9a1867c483 Mon Sep 17 00:00:00 2001
From: Daniel Hladek
Date: Thu, 11 Jun 2020 12:07:46 +0200
Subject: [PATCH] zz

---
 pages/topics/question/README.md | 103 ++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 31 deletions(-)

diff --git a/pages/topics/question/README.md b/pages/topics/question/README.md
index 0b5b0a1d..9e21dde5 100644
--- a/pages/topics/question/README.md
+++ b/pages/topics/question/README.md
@@ -1,48 +1,89 @@
 # Question Answering
 
-## Implementácie
+Task definition:
+
+- Create a clone of SQuAD 2.0 in the Slovak language
+- Set up annotation infrastructure
+- Perform and evaluate annotations
+- Consider using machine translation
+- Train and evaluate a Question Answering model
+
+## Tasks
+
+### Raw Data Preparation
+
+Input: Wikipedia
+
+Output: A set of paragraphs
+
+1. Obtaining and parsing the Wikipedia dump
+1. Selecting feasible paragraphs
+
+Notes:
+
+- PageRank causes a bias toward geography; random selection might be best
+- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
+- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
+- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)
+
+### Question Annotation
+
+Input: A set of paragraphs
+
+Output: A question for each paragraph
+
+### Answer Annotation
+
+Input: A set of paragraphs and questions
+
+Output: An answer for each paragraph and question
+
+### Annotation Summary
+
+Annotation work summary
+
+Input: Database of annotations
+
+Output: Summary of work performed by each annotator
+
+### Annotation Manual
+
+Output: Recommendations for annotators
+
+### Question Answering Model
+
+Input: An annotated QA database
+
+Output: An evaluated model for QA
+
+Training the model with annotated data:
+
+- Selecting an existing modelling approach
+- Evaluation set selection
+- Model evaluation
+- Supporting the annotation with the model (pre-selecting answers)
+
+### Supporting Activities
+
+Output: More annotations
+
+Organizing voluntary student challenges to support the annotation process
+
+## Existing Implementations
 
 - https://github.com/facebookresearch/DrQA
 - https://github.com/brmson/yodaqa
 - https://github.com/5hirish/adam_qas
 - https://github.com/WDAqua/Qanary - metodológia a implementácia QA
 
-## Bibliografia
+## Bibliography
 
 - Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
 Facebook Research
 - SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250
 
-
-## Dáta
+## Existing Datasets
 
 - Squad TheStanfordQuestionAnsweringDataset(SQuAD) (Rajpurkar et al., 2016)
 - WebQuestions
 - https://en.wikipedia.org/wiki/Freebase
-
-
-## Príprava dátovej množiny
-
-1. Získanie a parsovanie Wiki Dump
-2. Výber vhodných paragrafov (1. paragraf?)
-
-Zoznam 75 najlepších článkov https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov
-Zoznam 167 dobrých článkov
-https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov
-Wikipedia: vedeli ste že? (facts) https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti
-
-## Príprava crowdsourcing systému
-
-? Bootstrapping slovenského Spacy Modelu
-Deployment web aplikácie
-Setup anotačnej úlohy 1, 2, 3
-Databáza anotátorov pre evidenciu pracovných výstupov
-Príprava manuálu pre anotátorov
-
-Aplikácia pre vyhodnotenie výsledkov anotácie - kto anotoval koľko, koľko je anotované spolu
-
-### Anotácia
-
-Vytvorenie otázky k paragrafu
-Vyznačenie odpovede na otázku v paragrafe
-Vyznačenie pomenovaných entít?