forked from KEMT/zpwiki
zz
parent 594ced8cdb
commit aa2e279ab0
@ -1,48 +1,89 @@
# Question Answering

Task definition:

- Create a clone of SQuAD 2.0 in the Slovak language
- Set up the annotation infrastructure
- Perform and evaluate annotations
- Consider using machine translation
- Train and evaluate a Question Answering model

## Tasks

### Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

1. Obtain and parse the Wikipedia dump
1. Select feasible paragraphs

Notes:

- PageRank causes a bias towards geography; random selection might be the best option
- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)

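The paragraph selection step could be sketched roughly as follows. This is only an illustration: it assumes the dump has already been converted to plain text and loaded as a mapping from article title to text, and the function name `select_paragraphs` and the length thresholds are hypothetical choices, not part of the plan above.

```python
import random


def select_paragraphs(articles, min_chars=200, max_chars=1500,
                      sample_size=100, seed=0):
    """Pick a random sample of paragraphs with a feasible length.

    `articles` maps an article title to its plain text (assumed input format).
    Paragraphs are assumed to be separated by blank lines.
    """
    candidates = []
    for title, text in articles.items():
        for block in text.split("\n\n"):
            # Collapse internal whitespace so the length check is meaningful.
            paragraph = " ".join(block.split())
            if min_chars <= len(paragraph) <= max_chars:
                candidates.append({"title": title, "context": paragraph})
    if len(candidates) <= sample_size:
        return candidates
    # A seeded generator keeps the random selection reproducible.
    return random.Random(seed).sample(candidates, sample_size)


articles = {"A": "short\n\n" + "x" * 300, "B": "y " * 200}
selected = select_paragraphs(articles, sample_size=10)
```

Random sampling follows the note about PageRank bias; a curated list such as the best or good articles could be passed in as `articles` instead.
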
### Question Annotation

Input: A set of paragraphs

Output: A question for each paragraph

### Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

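Since the goal is a SQuAD 2.0 clone, the output of both annotation steps could be stored in the SQuAD 2.0 layout, where each paragraph carries a `context` and a list of `qas` entries with `answers` (`text`, `answer_start`) and an `is_impossible` flag. The sketch below is an assumption about how this project might build such entries; the helper `make_qa_entry`, the ids, and the sample sentences are made up for illustration.

```python
import json


def make_qa_entry(qa_id, context, question, answer_text=None):
    """Build one SQuAD 2.0 style question entry for a paragraph.

    Passing no answer marks the question as unanswerable.
    """
    if answer_text is None:
        return {"id": qa_id, "question": question,
                "answers": [], "is_impossible": True}
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("the answer must be a literal span of the paragraph")
    return {
        "id": qa_id,
        "question": question,
        "answers": [{"text": answer_text, "answer_start": start}],
        "is_impossible": False,
    }


# Made-up example paragraph with one answerable and one unanswerable question.
context = "Bratislava je hlavné mesto Slovenska."
paragraph = {
    "context": context,
    "qas": [
        make_qa_entry("q1", context, "Aké je hlavné mesto Slovenska?", "Bratislava"),
        make_qa_entry("q2", context, "Kto je prezidentom Slovenska?"),
    ],
}
print(json.dumps(paragraph, ensure_ascii=False, indent=2))
```
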
### Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator

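A summary of this kind could be computed along the following lines. The record layout (dictionaries with `annotator` and `task` keys) is a hypothetical stand-in for whatever the annotation database actually stores.

```python
from collections import Counter, defaultdict


def summarize_annotations(annotations):
    """Count annotations per annotator and per task type.

    Returns the per-annotator breakdown and the overall total.
    """
    per_annotator = defaultdict(Counter)
    for record in annotations:
        per_annotator[record["annotator"]][record["task"]] += 1
    summary = {name: dict(counts) for name, counts in per_annotator.items()}
    return summary, len(annotations)


# Made-up records standing in for rows from the annotation database.
annotations = [
    {"annotator": "anna", "task": "question"},
    {"annotator": "anna", "task": "question"},
    {"annotator": "anna", "task": "answer"},
    {"annotator": "boris", "task": "question"},
]
summary, total = summarize_annotations(annotations)
```
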
### Annotation Manual

Output: Recommendations for annotators

### Question Answering Model

Input: An annotated QA database

Output: An evaluated model for QA

Training the model with annotated data:

- Selecting an existing modelling approach
- Selecting the evaluation set
- Evaluating the model
- Supporting the annotation with the model (pre-selecting answers)

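For the model evaluation step, extractive QA on SQuAD is usually scored with exact match and token-level F1. The sketch below is a simplified version of those two metrics, not the official SQuAD script: it lowercases, strips ASCII punctuation, and normalizes whitespace, and it omits the removal of English articles, which would not apply to Slovak. The function names are illustrative.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match (unanswerable question predicted as such).
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

Averaging both scores over the evaluation set gives the two figures typically reported for a SQuAD-style model.
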
### Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

## Existing implementations

- https://github.com/facebookresearch/DrQA
- https://github.com/brmson/yodaqa
- https://github.com/5hirish/adam_qas
- https://github.com/WDAqua/Qanary - methodology and implementation of QA

## Bibliography

- Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
  Facebook Research
- SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250

## Existing Datasets

- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase

## Dataset preparation

1. Obtaining and parsing the Wiki dump
2. Selecting suitable paragraphs (the first paragraph?)

- List of the 75 best articles https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov
- List of 167 good articles https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov
- Wikipedia: Did you know? (facts) https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti

## Preparing the crowdsourcing system

- ? Bootstrapping a Slovak spaCy model
- Deploying the web application
- Setting up annotation tasks 1, 2, 3
- Database of annotators for recording work output
- Preparing a manual for annotators
- Application for evaluating annotation results: who annotated how much, and how much is annotated in total

### Annotation

- Creating a question for a paragraph
- Marking the answer to the question in the paragraph
- Marking named entities?