From 0868ef694e978830a4fc7edb43075618247fcb66 Mon Sep 17 00:00:00 2001
From: dano
Date: Fri, 3 Sep 2021 06:52:26 +0000
Subject: [PATCH] Update 'pages/topics/question/README.md'

---
 pages/topics/question/README.md | 102 ++++++++++++++++++--------------
 1 file changed, 58 insertions(+), 44 deletions(-)

diff --git a/pages/topics/question/README.md b/pages/topics/question/README.md
index 024d798a69..edcccc1466 100644
--- a/pages/topics/question/README.md
+++ b/pages/topics/question/README.md
@@ -11,6 +11,7 @@ taxonomy:
 - [Project repository](https://git.kemt.fei.tuke.sk/dano/annotation) (private)
 - [Annotation Manual for question annotation](navod)
 - [Annotation Manual for validations](validacie)
+- [Annotation Manual for unanswerable questions](nezodpovedatelne)
 - [Summary database application](https://app.question.tukekemt,xyz)
 
 
@@ -63,6 +64,19 @@ Notes:
 - [167 good articles](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zoznam_dobr%C3%BDch_%C4%8Dl%C3%A1nkov)
 - [Wiki Facts](https://sk.wikipedia.org/wiki/Wikip%C3%A9dia:Zauj%C3%ADmavosti)
 
+## Finished Tasks
+
+### Annotation Manual
+
+Output: Recommendations for annotators
+
+Done:
+
+- Web Page for annotators (Daniel Hládek)
+- Modivation video (Daniel Hládek)
+- Video with instructions (Daniel Hládek)
+bn application?
+
 ### Question Annotation
 
 An annotation recipe for Prodigy
@@ -79,15 +93,6 @@ Done:
 - answer annotation together with question (Daniel Hládek)
 - prepare final input paragraphs (dataset)
 
-In progress:
-
-- More annotations (volunteers and workers).
-
-To be done:
-
-- Prepare development set
-
-
 ### Annotation Web Application
 
 Annotation work summary, web applicatiobn
@@ -104,11 +109,6 @@ Done:
 - application deployment (Daniel Hládek)
 - extract annotations from question annotation in squad format (Daniel Hladek)
 
-
-To be done:
-
-- review of validations
-
 ### Annotation Validation
 
 Input: annnotated questions and paragraph
@@ -120,60 +120,53 @@ Done:
 - Recipe for validations (binary annotation for paragraphs, question and answers, text fields for correction of question and answer). (Daniel Hládek)
 - Deployment
 
-To be done:
 
-- Prepare for production
+## Tasks in progress
 
-### Annotation Manual
+### Unanswerable question annotation
 
-Output: Recommendations for annotators
+Input: validated questions and answers
+
+Output: Unanswerable questions and answers
 
 Done:
 
-- Web Page for annotators (Daniel Hládek)
-- Modivation video (Daniel Hládek)
-- Video with instructions (Daniel Hládek)
+- Annotation manual
+- Annotation interface
+- Database schema modifications
+- Modification of the database application
+- Export of validations
 
 In progress:
 
-- Should be instructions a part of the annotation webn application?
+- Annotaion process optimization
 
-### Question Answering Model
+### Final Data Export
 
-Training the model with annotated data
+Input: Validations and unanswerable questions
 
-Input: An annotated QA database
+Output: Final database in SQUAD format
 
-Output: An evaluated model for QA
+Done:
+
+- Preliminary export script
 
 To be done:
 
-- Selecting existing modelling approach
-- Evaluation set selection
-- Model evaluation
-- Supporting the annotation with the model (pre-selecting answers)
+- Final export script
+- Database web visualization
+- Prepare development set
 
-In progress:
+## Resources
 
-- Preliminary model (Ján Staš and Matej Čarňanský)
-
-
-
-## Existing implementations
-
-- https://github.com/facebookresearch/DrQA
-- https://github.com/brmson/yodaqa
-- https://github.com/5hirish/adam_qas
-- https://github.com/WDAqua/Qanary - metodológia a implementácia QA
-
-## Bibligraphy
+### Bibligraphy
 
 - Reading Wikipedia to Answer Open-Domain Questions, Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes Facebook Research
 - SQuAD: 100,000+ Questions for Machine Comprehension of Text https://arxiv.org/abs/1606.05250
 - [WDaqua](https://wdaqua.eu/our-work/) publications
 
-## Existing Datasets
+### Existing Datasets
 
 - [Squad](https://rajpurkar.github.io/SQuAD-explorer/) The Stanford Question Answering Dataset(SQuAD) (Rajpurkar et al., 2016)
 - [WebQuestions](https://github.com/brmson/dataset-factoid-webquestions)
@@ -210,3 +203,24 @@ Output:
 
 - a trained model
 - evaluation of the model (if possible)
+
+### Question Answering Model
+
+Training the model with annotated data
+
+Input: An annotated QA database
+
+Output: An evaluated model for QA
+
+To be done:
+
+- Selecting existing modelling approach
+- Evaluation set selection
+- Model evaluation
+- Supporting the annotation with the model (pre-selecting answers)
+
+In progress:
+
+- Preliminary model (Ján Staš and Matej Čarňanský)
+
+