Daniel Hládek 2020-06-11 12:07:46 +02:00
parent 594ced8cdb
commit aa2e279ab0

@ -1,48 +1,89 @@
# Question Answering
Task definition:
- Create a clone of SQuAD 2.0 in the Slovak language
- Set up annotation infrastructure
- Perform and evaluate annotations
- Consider using machine translation
- Train and evaluate Question Answering model
## Tasks
### Raw Data Preparation
Input: Wikipedia
Output: a set of paragraphs
1. Obtaining and parsing the Wikipedia dump
1. Selecting suitable paragraphs
Notes:
- Ranking articles by PageRank biases the selection toward geography; random selection might be the best option
- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)
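
The random selection mentioned in the notes could look roughly like the sketch below. It assumes the dump has already been parsed into a dictionary mapping article titles to lists of paragraph strings; that shape, the function name `select_paragraphs`, and the `min_chars` length threshold are illustrative assumptions, not part of the project.

```python
import random


def select_paragraphs(articles, k, min_chars=200, seed=0):
    """Pick up to k paragraphs uniformly at random from all long-enough paragraphs.

    articles: dict mapping article title -> list of paragraph strings
              (an assumed shape for the parsed dump).
    min_chars: hypothetical length threshold standing in for "feasible".
    """
    pool = [
        (title, paragraph)
        for title, paragraphs in articles.items()
        for paragraph in paragraphs
        if len(paragraph) >= min_chars
    ]
    rng = random.Random(seed)  # a fixed seed keeps the selection reproducible
    return rng.sample(pool, min(k, len(pool)))
```

Sampling over paragraphs rather than over articles means that long articles contribute more candidates; sampling articles first and then one paragraph per article would be an equally reasonable alternative.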
### Question Annotation
Input: A set of paragraphs
Output: A question for each paragraph
### Answer Annotation
Input: A set of paragraphs and questions
Output: An answer for each paragraph and question
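
Since the goal is a SQuAD 2.0 clone, each annotated question could be stored in the SQuAD 2.0 question layout (`id`, `question`, `is_impossible`, `answers` with `text` and `answer_start`). The helper below is a minimal sketch under that assumption; the name `make_qa_record` is hypothetical, and a real annotation tool would record the character offset the annotator actually selected instead of searching for the first occurrence.

```python
def make_qa_record(qa_id, context, question, answer_text=None):
    """Build one SQuAD 2.0 style question entry for a paragraph (context).

    answer_text=None marks the question as unanswerable from the paragraph.
    """
    if answer_text is None:
        return {"id": qa_id, "question": question, "is_impossible": True, "answers": []}
    # Simplification: take the first occurrence of the answer in the paragraph.
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer must be a literal span of the paragraph")
    return {
        "id": qa_id,
        "question": question,
        "is_impossible": False,
        "answers": [{"text": answer_text, "answer_start": start}],
    }


context = "Košice je druhé najväčšie mesto na Slovensku."
record = make_qa_record("q1", context, "Ktoré mesto je druhé najväčšie na Slovensku?", "Košice")
```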
### Annotation Summary
Annotation work summary
Input: Database of annotations
Output: Summary of work performed by each annotator
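
A per-annotator summary can be sketched as a simple count over exported annotation rows. The row shape (a dictionary with an `annotator` key) and the function name `summarize_annotations` are assumptions made for illustration; the actual annotation database schema is not specified here.

```python
from collections import Counter


def summarize_annotations(annotations):
    """Count annotations per annotator and in total.

    annotations: iterable of dicts with an "annotator" key (assumed row shape).
    """
    per_annotator = Counter(row["annotator"] for row in annotations)
    return {"per_annotator": dict(per_annotator), "total": sum(per_annotator.values())}


rows = [
    {"annotator": "anna", "paragraph_id": 1},
    {"annotator": "boris", "paragraph_id": 2},
    {"annotator": "anna", "paragraph_id": 3},
]
summary = summarize_annotations(rows)
```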
### Annotation Manual
Output: Recommendations for annotators
### Question Answering Model
Input: An annotated QA database
Output: An evaluated model for QA
Training the model with annotated data:
- Selecting an existing modelling approach
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)
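
For the model evaluation step, SQuAD-style evaluation typically reports exact match and token-level F1 between the predicted and the annotated answer. The sketch below is a simplified variant and not the official evaluation script: it lowercases, strips ASCII punctuation, and collapses whitespace, and it deliberately omits the English article removal of the original script because that step does not carry over to Slovak.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and truth."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as agreement (e.g. an unanswerable question).
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several annotated answers, the usual convention is to take the maximum score over them and then average across all questions in the evaluation set.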
### Supporting activities
Output: More annotations
Organizing voluntary student challenges to support the annotation process
## Existing implementations
- https://github.com/facebookresearch/DrQA
- https://github.com/brmson/yodaqa
- https://github.com/5hirish/adam_qas
- https://github.com/WDAqua/Qanary - QA methodology and implementation
## Bibliography
- Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
Facebook Research
- SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250
## Existing Datasets
- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase
## Dataset Preparation
1. Obtaining and parsing the Wiki dump
2. Selecting suitable paragraphs (the first paragraph?)
List of the 75 best articles https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov
List of 167 good articles
https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov
Wikipedia: did you know? (facts) https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti
## Crowdsourcing System Preparation
? Bootstrapping a Slovak spaCy model
Deployment of the web application
Setup of annotation tasks 1, 2, 3
Annotator database for tracking work output
Preparation of a manual for annotators
Application for evaluating annotation results: who annotated how much, and how much is annotated in total
### Annotation
Creating a question for a paragraph
Marking the answer to the question in the paragraph
Marking named entities?