forked from KEMT/zpwiki

zz

parent 594ced8cdb
commit aa2e279ab0

@@ -1,48 +1,89 @@
			
# Question Answering

## Implementations

Task definition:

- Create a clone of SQuAD 2.0 in the Slovak language
- Set up the annotation infrastructure
- Perform and evaluate annotations
- Consider using machine translation
- Train and evaluate a Question Answering model
			
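For orientation, a Slovak record in the SQuAD 2.0 JSON layout might look like the sketch below. The field names (`version`, `data`, `title`, `paragraphs`, `context`, `qas`, `id`, `question`, `answers`, `text`, `answer_start`, `is_impossible`) follow the published SQuAD 2.0 files. The Slovak text, the ids, and the `find_misaligned_answers` helper are invented for illustration and are not part of the project.

```python
import json

# Illustrative record only: the texts and ids below are made up.
context = "Košice sú druhé najväčšie mesto na Slovensku."
record = {
    "version": "v2.0",
    "data": [{
        "title": "Košice",
        "paragraphs": [{
            "context": context,
            "qas": [
                {
                    "id": "sk-0001",
                    "question": "Aké je druhé najväčšie mesto na Slovensku?",
                    # answer_start is a character offset into the context.
                    "answers": [{"text": "Košice", "answer_start": 0}],
                    "is_impossible": False,
                },
                {
                    # SQuAD 2.0 adds questions the paragraph cannot answer.
                    "id": "sk-0002",
                    "question": "Koľko obyvateľov majú Košice?",
                    "answers": [],
                    "is_impossible": True,
                },
            ],
        }],
    }],
}


def find_misaligned_answers(dataset):
    """Return ids of questions whose answer text is not found at answer_start."""
    bad = []
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    if para["context"][start:end] != ans["text"]:
                        bad.append(qa["id"])
    return bad


# ensure_ascii=False keeps Slovak diacritics readable in the output file.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
print(find_misaligned_answers(record))
```

A check of this kind is useful after any automatic step, such as machine translation, that can shift character offsets.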
## Tasks

### Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

1. Obtain and parse a Wikipedia dump
1. Select feasible paragraphs

Notes:

- PageRank causes a bias toward geography; random selection might be the best option
- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)
			
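The paragraph selection step could be sketched as a length filter followed by a random draw, in line with the note that random selection may be the least biased option. The function name `select_paragraphs` and the length bounds are assumptions for illustration; the notes do not define what makes a paragraph feasible.

```python
import random


def select_paragraphs(paragraphs, n, min_chars=300, max_chars=1500, seed=0):
    """Keep paragraphs inside a length window, then draw up to n at random.

    The length bounds are illustrative assumptions, not project settings.
    """
    feasible = [p.strip() for p in paragraphs
                if min_chars <= len(p.strip()) <= max_chars]
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(feasible, min(n, len(feasible)))


# Only the 400- and 800-character paragraphs pass the length filter.
candidates = ["a" * 50, "b" * 400, "c" * 800, "d" * 5000]
selected = select_paragraphs(candidates, n=5)
print(len(selected))
```

Fixing the seed means the same paragraphs are handed to annotators on every run, which keeps the annotation database consistent if the pipeline is re-executed.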
### Question Annotation

Input: A set of paragraphs

Output: A question for each paragraph

### Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question
			
### Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator
			
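A minimal sketch of such a summary, assuming each annotation is exported as a dictionary with hypothetical `annotator` and `task` keys (the real database schema is not described here):

```python
from collections import Counter, defaultdict


def summarize(annotations):
    """Count annotations per annotator and task type, plus the overall total."""
    per_annotator = defaultdict(Counter)
    for row in annotations:
        per_annotator[row["annotator"]][row["task"]] += 1
    total = sum(sum(counts.values()) for counts in per_annotator.values())
    return {name: dict(counts) for name, counts in per_annotator.items()}, total


# Hypothetical rows; annotator names and task labels are placeholders.
rows = [
    {"annotator": "anna", "task": "question"},
    {"annotator": "anna", "task": "answer"},
    {"annotator": "boris", "task": "question"},
    {"annotator": "anna", "task": "question"},
]
summary, total = summarize(rows)
print(summary, total)
```

Splitting the counts by task type keeps question writing and answer marking separate, since the two tasks take different amounts of effort.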
### Annotation Manual

Output: Recommendations for annotators

### Question Answering Model

Input: An annotated QA database

Output: An evaluated model for QA

Training the model with annotated data:

- Selecting an existing modelling approach
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)
			
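For the model evaluation step, SQuAD-style systems are usually scored with exact match and token-level F1. The sketch below follows that idea under simplifying assumptions: it normalizes by lowercasing, dropping ASCII punctuation, and collapsing whitespace. The official SQuAD script also strips the English articles "a", "an", and "the", which is left out here because Slovak has no articles and "a" is an ordinary Slovak word. The function names are chosen for illustration.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers agree (an unanswerable question left unanswered).
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Košice.", "košice"))
print(round(f1_score("mesto Košice", "Košice"), 3))
```

When a question has several gold answers, the usual practice is to take the maximum score over them and then average across questions.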
### Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

## Existing implementations

- https://github.com/facebookresearch/DrQA
- https://github.com/brmson/yodaqa
- https://github.com/5hirish/adam_qas
- https://github.com/WDAqua/Qanary - QA methodology and implementation
			
## Bibliography

- Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes, Facebook Research
- SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250
			
## Data

## Existing Datasets

- The Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase
			
## Dataset preparation

1. Obtaining and parsing the Wiki dump
2. Selecting suitable paragraphs (the first paragraph?)

- List of the 75 best articles: https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov
- List of 167 good articles: https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov
- Wikipedia: Did you know? (facts) https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti
			
## Preparing the crowdsourcing system

- Bootstrapping a Slovak spaCy model?
- Deployment of the web application
- Setup of annotation tasks 1, 2, 3
- Annotator database for recording work output
- Preparation of the manual for annotators

Application for evaluating annotation results: who annotated how much, and how much is annotated in total
			
### Annotation

- Writing a question for a paragraph
- Marking the answer to the question in the paragraph
- Marking named entities?
			