forked from KEMT/zpwiki
zz
parent aa2e279ab0
commit 01077d3731
@ -1,10 +1,17 @@
# Question Answering

[Project repository](https://git.kemt.fei.tuke.sk/dano/annotation)

## Project Description

Task definition:

- Create a clone of [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) in the Slovak language
- Set up annotation infrastructure with [Prodigy](https://prodi.gy/)
- Perform and evaluate annotations of [Wikipedia data](https://dumps.wikimedia.org/backup-index.html)

Auxiliary tasks:

- Consider using machine translation
- Train and evaluate a Question Answering model

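To make the target of the "clone of SQuAD 2.0" task more concrete, here is a minimal sketch of one record in the public SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`, with `is_impossible` marking unanswerable questions). The helper `make_squad_entry`, the question id, and the Slovak sentence are made up for illustration and are not taken from the project repository.

```python
import json


def make_squad_entry(title, context, question, answer_text=None, qid="q-0001"):
    """Build one SQuAD 2.0 style article with a single paragraph and question.

    Hypothetical helper: passing answer_text=None marks the question unanswerable.
    """
    answers = []
    if answer_text is not None:
        start = context.find(answer_text)
        if start < 0:
            raise ValueError("answer text must be a span of the context")
        # SQuAD stores the character offset of the answer inside the context.
        answers.append({"text": answer_text, "answer_start": start})
    qa = {
        "id": qid,
        "question": question,
        "answers": answers,
        "is_impossible": answer_text is None,
    }
    return {"title": title, "paragraphs": [{"context": context, "qas": [qa]}]}


# Illustrative Slovak example (invented for this sketch).
context = "Košice sú druhé najväčšie mesto na Slovensku."
dataset = {
    "version": "v2.0",
    "data": [
        make_squad_entry(
            "Košice",
            context,
            "Ktoré mesto je druhé najväčšie na Slovensku?",
            "Košice",
        )
    ],
}
print(json.dumps(dataset, ensure_ascii=False, indent=2))
```

Keeping the answer as a character offset into the paragraph means annotation output can be validated mechanically: the span at `answer_start` must equal `text`.
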
@ -19,6 +26,15 @@ Output: a set of paragraphs
1. Obtaining and parsing of Wikipedia dump
1. Selecting feasible paragraphs

Done:

- Wiki parsing script
- PageRank script

To be done:

- Random selection of paragraphs: select all good paragraphs and shuffle

Notes:

- PageRank causes a bias towards geography; random selection might be the best

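The "select all good paragraphs and shuffle" step could look roughly like the sketch below. The function names and the `long_enough` criterion are assumptions made for illustration; the project's actual definition of a "good" paragraph is not given here.

```python
import random


def long_enough(paragraph, min_words=5):
    """Hypothetical quality filter: keep paragraphs with at least min_words words."""
    return len(paragraph.split()) >= min_words


def select_paragraphs(paragraphs, is_good, limit=None, seed=0):
    """Keep every paragraph accepted by is_good, then shuffle reproducibly."""
    good = [p for p in paragraphs if is_good(p)]
    # A dedicated Random instance with a fixed seed makes the order repeatable.
    rng = random.Random(seed)
    rng.shuffle(good)
    return good if limit is None else good[:limit]


paragraphs = [
    "Bratislava je hlavné mesto Slovenskej republiky.",
    "Pozri aj",
    "Dunaj preteká cez viacero európskych krajín a miest.",
]
for paragraph in select_paragraphs(paragraphs, long_enough, seed=42):
    print(paragraph)
```

Filtering first and shuffling afterwards gives every accepted paragraph the same chance of being annotated, which avoids the topical bias noted for PageRank.
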
@ -32,12 +48,32 @@ Input: A set of paragraphs

Output: A question for each paragraph

Done:

- a data preparation script
- annotation running script

To be done:

- final input paragraphs
- deployment

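As an illustration of what a data preparation step for question annotation might produce, the sketch below turns paragraphs into JSONL tasks, one JSON object per line with a `"text"` key, which is the input form Prodigy reads. The function names and the `meta` fields (`source`, `paragraph`) are assumptions for this example and do not describe the project's actual script.

```python
import json


def paragraphs_to_tasks(paragraphs, source="skwiki"):
    """Yield one annotation task per paragraph (meta fields are hypothetical)."""
    for index, text in enumerate(paragraphs):
        yield {"text": text, "meta": {"source": source, "paragraph": index}}


def to_jsonl(tasks):
    """Serialize tasks as JSONL, keeping Slovak diacritics readable."""
    return "".join(json.dumps(task, ensure_ascii=False) + "\n" for task in tasks)


print(to_jsonl(paragraphs_to_tasks(["Prvý odsek.", "Druhý odsek."])), end="")
```
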
### Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

Done:

- a data preparation script
- annotation running script

To be done:

- input paragraphs with questions
- deployment

### Annotation Summary

Annotation work summary

@ -46,29 +82,41 @@ Input: Database of annotations

Output: Summary of work performed by each annotator

To be done:

- web application for annotation analysis
- analyze the SQL schema and find out who annotated what

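A first pass at "who annotated what" might be a per-annotator count of decisions, as sketched below. This is only a sketch under stated assumptions: it assumes an `example` table whose `content` column holds each annotated task as JSON, with `_annotator_id` and `answer` keys. The real schema of the deployed database has to be inspected first, which is exactly the task listed above. The demo runs against an in-memory SQLite database that imitates the assumed layout.

```python
import json
import sqlite3
from collections import Counter


def count_annotations(connection):
    """Count (annotator, answer) pairs; table and key names are assumptions."""
    counts = Counter()
    for (content,) in connection.execute("SELECT content FROM example"):
        task = json.loads(content)
        annotator = task.get("_annotator_id", "unknown")
        counts[(annotator, task.get("answer", "none"))] += 1
    return counts


# Demo data in an in-memory database that imitates the assumed schema.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
rows = [
    {"text": "p1", "answer": "accept", "_annotator_id": "qa-anna"},
    {"text": "p2", "answer": "reject", "_annotator_id": "qa-anna"},
    {"text": "p3", "answer": "accept", "_annotator_id": "qa-boris"},
]
connection.executemany(
    "INSERT INTO example (content) VALUES (?)",
    [(json.dumps(row),) for row in rows],
)

for (annotator, answer), n in sorted(count_annotations(connection).items()):
    print(annotator, answer, n)
```

The same counting function could back the planned web application, with the in-memory connection replaced by a connection to the real annotation database.
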
### Annotation Manual

Output: Recommendations for annotators

TBD

### Question Answering Model

Training the model with annotated data

Input: An annotated QA database

Output: An evaluated model for QA

To be done:

- Selecting an existing modelling approach
- Evaluation set selection
- Model evaluation
- Supporting the annotation with the model (pre-selecting answers)

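For the "Model evaluation" item, SQuAD-style systems are usually scored with exact match and token-level F1 between the predicted and the reference answer. The sketch below is a simplified version of those two metrics, not the official evaluation script. It lowercases, strips ASCII punctuation, and collapses whitespace, and it omits the English article removal that the official script performs. Function names are chosen for this example.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the prediction and the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match (e.g. an unanswerable question).
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Košice.", "košice"), f1_score("mesto Košice", "Košice"))
```

When a question has several reference answers, the usual convention is to take the maximum score over the references and then average over all questions.
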
### Supporting activities

Output: More annotations

Organizing voluntary student challenges to support the annotation process

TBD

## Existing implementations

- https://github.com/facebookresearch/DrQA
@ -87,3 +135,34 @@ Facebook Research
- SQuAD: The Stanford Question Answering Dataset (Rajpurkar et al., 2016)
- WebQuestions
- https://en.wikipedia.org/wiki/Freebase

## Intern tasks

Week 1: Intro

- Get acquainted with the project and the SQuAD database
- Download the database and study the bibliography

Week 2 and 3: Web Application

- Analyze the SQL schema of Prodigy annotations
- Find out who annotated what
- Make a web application that displays the results
- Extend the application to analyze more Prodigy instances (for both question and answer annotations)
- Improve the annotation process

Output: Web application (in Node.js or Python) and Dockerfile

Week 4-7: The model

Select and train a working question answering system

Output:

- a deployment script with comments for the selected question answering system
- a working training recipe (can use English data), as a script with comments or a Jupyter Notebook
- a trained model
- evaluation of the model (if possible)