# Question Answering

Task definition:

- Create a clone of SQuAD 2.0 in Slovak
- Set up the annotation infrastructure
- Perform and evaluate annotations
- Consider using machine translation
- Train and evaluate a Question Answering model

## Tasks

### Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

1. Obtaining and parsing the Wikipedia dump
1. Selecting feasible paragraphs

Notes:

- PageRank causes a bias toward geography; random selection might be the best option
- [75 best articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov)
- [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
- [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)
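
This is a minimal sketch of the two steps above. It assumes the standard MediaWiki XML export format (`page`, `title`, `ns`, `redirect` and `text` elements) and Python 3.9+. The code streams the dump and keeps non-redirect main-namespace articles. It then splits the wikitext into blank-line-separated blocks and draws a uniform random sample with reservoir sampling, following the note that random selection may avoid PageRank's geographic bias. The function names, the markup filter and the `min_chars` threshold are hypothetical choices. The returned paragraphs still contain inline wikitext (links, bold), which a dedicated markup stripper would have to remove.

```python
import random
import re
import xml.etree.ElementTree as ET
from collections.abc import Iterable, Iterator
from typing import BinaryIO

# Hypothetical heuristic: wikitext blocks starting with these prefixes are
# templates, tables, headings, lists or file/category links rather than prose.
MARKUP_PREFIXES = ("{{", "{|", "|", "!", "=", "*", "#", "[[Súbor:", "[[Kategória:")


def _local(tag: str) -> str:
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump: BinaryIO) -> Iterator[tuple[str, str]]:
    """Stream (title, wikitext) for non-redirect main-namespace (ns 0) pages."""
    title, ns, text, redirect = None, None, "", False
    for _, elem in ET.iterparse(dump, events=("end",)):
        tag = _local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "ns":
            ns = elem.text
        elif tag == "redirect":
            redirect = True
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if ns == "0" and not redirect and title is not None:
                yield title, text
            title, ns, text, redirect = None, None, "", False
            elem.clear()  # release the finished page's children


def split_paragraphs(text: str, min_chars: int = 200) -> Iterator[str]:
    """Yield prose-like blocks separated by blank lines (hypothetical filter)."""
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if len(block) >= min_chars and not block.startswith(MARKUP_PREFIXES):
            yield block


def sample_paragraphs(pages: Iterable[tuple[str, str]], k: int,
                      min_chars: int = 200, seed: int = 0) -> list[tuple[str, str]]:
    """Uniform random sample of k (title, paragraph) pairs via reservoir sampling."""
    rng = random.Random(seed)
    reservoir: list[tuple[str, str]] = []
    seen = 0
    for title, text in pages:
        for para in split_paragraphs(text, min_chars):
            seen += 1
            if len(reservoir) < k:
                reservoir.append((title, para))
            else:
                j = rng.randrange(seen)  # uniform in [0, seen)
                if j < k:
                    reservoir[j] = (title, para)
    return reservoir
```

For a real dump, open the compressed file with `bz2.open(path, "rb")` and pass the result straight to `iter_pages`. The file name is assumed here to follow the pattern `skwiki-latest-pages-articles.xml.bz2`. Reservoir sampling holds only `k` paragraphs in memory, so the dump never has to be loaded whole.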

### Question Annotation

Input: a set of paragraphs

Output: a question for each paragraph

### Answer Annotation

Input: a set of paragraphs and questions

Output: an answer for each paragraph and question
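
The goal is a Slovak clone of SQuAD 2.0, so the question and answer annotations could be stored in the SQuAD 2.0 JSON layout. That layout uses `version`, `data`, `paragraphs`, `context`, `qas`, `answers` with a character offset `answer_start`, and `is_impossible` for unanswerable questions. This sketch assumes one way to serialize the annotation output; it is not part of any existing infrastructure, and `make_qa`, `make_dataset` and `to_json` are hypothetical helpers. It checks that each marked answer appears in the paragraph exactly at the given offset.

```python
import json
from typing import Optional


def make_qa(qid: str, question: str, context: str,
            answer_text: Optional[str] = None,
            answer_start: Optional[int] = None) -> dict:
    """Build one SQuAD 2.0-style question entry (hypothetical helper).

    answer_text=None marks the question as unanswerable from the paragraph.
    If answer_start is omitted, the first occurrence of answer_text is used.
    """
    if answer_text is None:
        return {"id": qid, "question": question, "answers": [], "is_impossible": True}
    if answer_start is None:
        answer_start = context.find(answer_text)  # -1 if absent
    # The answer must match the paragraph character for character at its offset.
    end = answer_start + len(answer_text)
    if answer_start < 0 or context[answer_start:end] != answer_text:
        raise ValueError(f"answer {answer_text!r} not found at offset {answer_start}")
    return {
        "id": qid,
        "question": question,
        "answers": [{"text": answer_text, "answer_start": answer_start}],
        "is_impossible": False,
    }


def make_dataset(title: str, context: str, qas: list) -> dict:
    """Wrap one annotated paragraph in the top-level SQuAD 2.0 structure."""
    return {
        "version": "v2.0",
        "data": [{"title": title, "paragraphs": [{"context": context, "qas": qas}]}],
    }


def to_json(dataset: dict) -> str:
    """Serialize without escaping, so Slovak diacritics stay readable."""
    return json.dumps(dataset, ensure_ascii=False, indent=2)
```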

### Annotation Summary

Annotation work summary

Input: database of annotations

Output: summary of work performed by each annotator
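
The summary can be a single aggregate query over the annotation database. This is a minimal sketch using SQLite from the Python standard library. The schema is not specified here, so the table name `annotation`, its columns and the two annotation kinds are hypothetical. The sketch reports questions and answers per annotator, plus the total number of distinct paragraphs that have at least one annotation.

```python
import sqlite3

# Hypothetical schema: one row per submitted annotation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    id           INTEGER PRIMARY KEY,
    annotator    TEXT    NOT NULL,
    paragraph_id INTEGER NOT NULL,
    kind         TEXT    NOT NULL CHECK (kind IN ('question', 'answer'))
);
"""


def summarize(conn: sqlite3.Connection) -> tuple[list, int]:
    """Return per-annotator (name, questions, answers) rows and the total
    number of distinct paragraphs that have at least one annotation."""
    per_annotator = conn.execute(
        """
        SELECT annotator,
               SUM(kind = 'question') AS questions,  -- comparisons are 0/1 in SQLite
               SUM(kind = 'answer')   AS answers
        FROM annotation
        GROUP BY annotator
        ORDER BY annotator
        """
    ).fetchall()
    (total_paragraphs,) = conn.execute(
        "SELECT COUNT(DISTINCT paragraph_id) FROM annotation"
    ).fetchone()
    return per_annotator, total_paragraphs
```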

### Annotation Manual

Output: recommendations for annotators

### Question Answering Model

Input: an annotated QA database

Output: an evaluated QA model

Training the model with annotated data:

- Selecting an existing modelling approach
- Selecting the evaluation set
- Evaluating the model
- Supporting the annotation with the model (pre-selecting answers)
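
For the evaluation steps, a common choice is the pair of SQuAD metrics: exact match (EM) and token-level F1. Each metric is taken as the maximum over the gold answers, and an empty prediction counts as correct for an unanswerable question. The sketch below follows that convention, adapted for Slovak by assumption: the English article removal of the official SQuAD script is dropped, and all Unicode punctuation (including „“) is stripped. It also splits the data by article rather than by paragraph, so paragraphs from one article cannot land in both the training and evaluation sets. `split_by_article` and its default `eval_fraction` are hypothetical.

```python
import collections
import random
import unicodedata


def split_by_article(titles, eval_fraction=0.1, seed=0):
    """Put whole articles into train or eval (hypothetical helper), so that
    paragraphs from one article never appear in both sets."""
    articles = sorted(set(titles))
    random.Random(seed).shuffle(articles)
    n_eval = max(1, round(len(articles) * eval_fraction))
    return set(articles[n_eval:]), set(articles[:n_eval])


def normalize(text: str) -> str:
    """Lowercase, drop all Unicode punctuation, collapse whitespace.
    Unlike the English SQuAD script, no article removal (Slovak has none)."""
    text = "".join(ch for ch in text.lower()
                   if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    if not pred or not ref:
        # No-answer case: correct only if both sides are empty.
        return float(pred == ref)
    overlap = sum((collections.Counter(pred) & collections.Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, golds: list) -> tuple:
    """(EM, F1) against the best-matching gold answer; [] means unanswerable."""
    golds = golds or [""]
    return (max(exact_match(prediction, g) for g in golds),
            max(f1(prediction, g) for g in golds))
```

Averaging the two values returned by `score` over the evaluation set gives dataset-level EM and F1.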

### Supporting Activities

Output: more annotations

Organizing voluntary student challenges to support the annotation process

## Existing Implementations

- https://github.com/facebookresearch/DrQA
- https://github.com/brmson/yodaqa
- https://github.com/5hirish/adam_qas
- https://github.com/WDAqua/Qanary - QA methodology and implementation

## Bibliography

- Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes, Facebook Research
- SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al., 2016, https://arxiv.org/abs/1606.05250

## Existing Datasets

- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- Freebase: https://en.wikipedia.org/wiki/Freebase

## Dataset Preparation

1. Obtaining and parsing the Wiki dump
2. Selecting suitable paragraphs (the first paragraph?)

- List of the 75 best articles: https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_najlep%C5%A1%C3%ADch_%C4%8Dl%C3%A1nkov
- List of the 167 good articles: https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov
- Wikipedia: Did you know? (facts): https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti

## Crowdsourcing System Preparation

- Bootstrapping a Slovak spaCy model (?)
- Deploying the web application
- Setting up annotation tasks 1, 2 and 3
- Annotator database for tracking work output
- Preparing the annotator manual

Application for evaluating annotation results: who annotated how much, and how much has been annotated in total

### Annotation

- Creating a question for a paragraph
- Marking the answer to the question in the paragraph
- Marking named entities?