forked from KEMT/zpwiki
zz
parent aa2e279ab0
commit 01077d3731
@@ -1,10 +1,17 @@
# Question Answering

[Project repository](https://git.kemt.fei.tuke.sk/dano/annotation)

## Project Description

Task definition:

- Create a clone of [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) in the Slovak language
- Set up annotation infrastructure with [Prodigy](https://prodi.gy/)
- Perform and evaluate annotations of [Wikipedia data](https://dumps.wikimedia.org/backup-index.html)

Auxiliary tasks:

- Consider using machine translation
- Train and evaluate a Question Answering model
@@ -19,6 +26,15 @@ Output: a set of paragraphs
1. Obtaining and parsing the Wikipedia dump
1. Selecting feasible paragraphs

Done:

- Wiki parsing script
- PageRank script

To be done:

- Random selection of paragraphs: select all good paragraphs and shuffle

Notes:

- PageRank causes a bias towards geography; random selection might be the best option
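
The "select all good paragraphs and shuffle" step could look roughly like the sketch below. The quality criterion (a simple character-length window) and the names `is_good_paragraph` and `select_paragraphs` are assumptions made for illustration; the project's actual filter is not described here.

```python
import random


def is_good_paragraph(text, min_chars=200, max_chars=2000):
    """Hypothetical quality filter: keep paragraphs within a length window."""
    return min_chars <= len(text.strip()) <= max_chars


def select_paragraphs(paragraphs, limit=None, seed=0):
    """Keep all good paragraphs, shuffle them, and optionally truncate."""
    good = [p for p in paragraphs if is_good_paragraph(p)]
    # A fixed seed keeps the selection reproducible between runs.
    random.Random(seed).shuffle(good)
    return good if limit is None else good[:limit]
```

Because every good paragraph has the same chance of ending up near the front, this avoids the topical bias that a ranking-based selection may introduce.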
@@ -32,12 +48,32 @@ Input: A set of paragraphs

Output: A question for each paragraph

Done:

- a data preparation script
- annotation running script

To be done:

- final input paragraphs
- deployment

### Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

Done:

- a data preparation script
- annotation running script

To be done:

- input paragraphs with questions
- deployment
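
As a rough illustration of the target format, the sketch below assembles one paragraph with its questions and answers into a SQuAD 2.0 style JSON structure. The helper `make_squad_entry` and the sample Slovak sentences are made up for this example. Locating the answer with `str.find` is a simplifying assumption, since an annotation tool would normally supply the character offsets directly.

```python
import json


def make_squad_entry(title, context, qas):
    """Build one SQuAD 2.0 style article from (id, question, answer) triples.

    An answer of None marks the question as unanswerable.
    """
    records = []
    for qid, question, answer in qas:
        answers = []
        if answer is not None:
            # Assumption: the first occurrence in the context is the answer span.
            start = context.find(answer)
            if start < 0:
                raise ValueError(f"answer not found in context: {answer!r}")
            answers.append({"text": answer, "answer_start": start})
        records.append({
            "id": qid,
            "question": question,
            "is_impossible": answer is None,
            "answers": answers,
        })
    return {"title": title, "paragraphs": [{"context": context, "qas": records}]}


# Made-up sample data, for illustration only.
entry = make_squad_entry(
    "Košice",
    "Košice je mesto na východnom Slovensku.",
    [("q1", "Kde ležia Košice?", "na východnom Slovensku"),
     ("q2", "Kto je primátorom Košíc?", None)],
)
dataset = {"version": "v2.0", "data": [entry]}
print(json.dumps(dataset, ensure_ascii=False, indent=2))
```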
### Annotation Summary

Annotation work summary
@@ -46,29 +82,41 @@ Input: Database of annotations

Output: Summary of work performed by each annotator

To be done:

- web application for annotation analysis
- analyze the SQL schema and find out who annotated what
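
A per-annotator summary could be computed with a single grouped query, as in the sketch below. The table `annotations` and its columns are a hypothetical, simplified stand-in; the real Prodigy database schema still has to be analyzed and will likely differ.

```python
import sqlite3


def summarize(conn):
    """Return (annotator, dataset, count) rows, sorted by annotator and dataset."""
    query = """
        SELECT annotator, dataset, COUNT(*) AS n
        FROM annotations
        GROUP BY annotator, dataset
        ORDER BY annotator, dataset
    """
    return conn.execute(query).fetchall()


# Hypothetical, simplified table standing in for the real annotation database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotations (annotator TEXT, dataset TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    [
        ("anna", "questions", "accept"),
        ("anna", "questions", "reject"),
        ("anna", "answers", "accept"),
        ("boris", "questions", "accept"),
    ],
)
for annotator, dataset, n in summarize(conn):
    print(f"{annotator}\t{dataset}\t{n}")
```

A web application for annotation analysis could render the rows returned by `summarize` as a table.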
### Annotation Manual

Output: Recommendations for annotators

TBD

### Question Answering Model

Training the model with annotated data

Input: An annotated QA database

Output: An evaluated model for QA

To be done:

- Selecting an existing modelling approach
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)
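
For the model evaluation step, SQuAD-style systems are commonly scored with exact match and token-level F1. The sketch below is a simplified version of those two metrics, not the official evaluation script. The official script also strips English articles, which is left out here on the assumption that it is not needed for Slovak.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction, truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Two empty answers count as a match (unanswerable question).
        return float(pred_tokens == true_tokens)
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

Averaging both scores over a held-out evaluation set gives two numbers that can be compared across modelling approaches.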
### Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

TBD

## Existing implementations

- https://github.com/facebookresearch/DrQA
@@ -87,3 +135,34 @@ Facebook Research
- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase

## Intern tasks

Week 1: Intro

- Get acquainted with the project and the SQuAD database
- Download the database and study the bibliography

Weeks 2 and 3: Web Application

- Analyze the SQL schema of Prodigy annotations
- Find out who annotated what.
- Make a web application that displays results.
- Extend the application to analyze more Prodigy instances (for both question and answer annotations)
- Improve the process of annotation.

Output: Web application (in Node.js or Python) and Dockerfile

Weeks 4-7: The model

Select and train a working question answering system

Output:

- a deployment script with comments for a selected question answering system
- a working training recipe (can use English data), a script with comments or a Jupyter Notebook
- a trained model
- evaluation of the model (if possible)