dano fde9baf901 Update 'pages/topics/question/README.md'

2021-01-21 15:17:00 +00:00

Question Answering

Project repository (private)

Project Description

Create a clone of SQuAD 2.0 in the Slovak language
Set up annotation infrastructure with Prodigy
Perform and evaluate annotations of Wikipedia data.

Auxiliary tasks:

Consider using machine translation
Train and evaluate Question Answering model

Tasks

Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

Obtain and parse a Wikipedia dump
Select feasible paragraphs

Done:

Wiki parsing script (Daniel Hládek)
PageRank script (Daniel Hládek)
selection of paragraphs: select all good paragraphs and shuffle

To be done:

fix minor errors

Notes:

PageRank causes a bias toward geography; random selection might be the best option
75 best articles
167 good articles
Wiki Facts
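
The selection step listed above (keep all good paragraphs, then shuffle) could be sketched roughly as follows. This is only an illustration: the length thresholds, the `is_good_paragraph` helper and the fixed seed are assumptions, not the criteria used by the project's own script.

```python
import random

# Hypothetical thresholds; the project's real selection criteria may differ.
MIN_CHARS = 200
MAX_CHARS = 1500


def is_good_paragraph(text, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Return True when a paragraph's length falls inside the assumed range."""
    stripped = text.strip()
    return min_chars <= len(stripped) <= max_chars


def select_paragraphs(paragraphs, seed=0):
    """Keep all good paragraphs and return them in shuffled order."""
    good = [p.strip() for p in paragraphs if is_good_paragraph(p)]
    rng = random.Random(seed)  # a fixed seed keeps the shuffle reproducible
    rng.shuffle(good)
    return good
```

Shuffling the whole pool, instead of ranking it, matches the note above that random selection avoids the geography bias introduced by PageRank.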

Question Annotation

An annotation recipe for Prodigy

Input: A set of paragraphs

Output: A question for each paragraph

Done:

a data preparation script (Daniel Hládek)
annotation recipe (Daniel Hládek)
deployment at question.tukekemt.xyz (accessible only from TUKE) (Daniel Hládek)
answer annotation together with question (Daniel Hládek)
prepare final input paragraphs (dataset)
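
A data preparation step of this kind usually ends in a JSONL file that Prodigy can load, with one task per line and the paragraph under the `text` key. The sketch below shows one possible shape; the function names, the `(title, text)` input pairs and the `source` label are assumptions for illustration and do not describe the project's actual script.

```python
import json


def paragraphs_to_tasks(paragraphs, source="skwiki"):
    """Turn (title, text) pairs into task dictionaries with "text" and "meta"."""
    tasks = []
    for title, text in paragraphs:
        # "meta" carries display-only details; its fields here are hypothetical.
        tasks.append({"text": text, "meta": {"title": title, "source": source}})
    return tasks


def write_jsonl(tasks, path):
    """Write one JSON object per line, keeping Slovak characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```

Writing with `ensure_ascii=False` keeps diacritics such as "š" and "ž" as plain characters in the file, which makes the input easier to inspect by hand.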

Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

Done:

a data preparation script (Daniel Hládek)
annotation recipe (Daniel Hládek)
deployment at answer.tukekemt.xyz (accessible only from TUKE) (Daniel Hládek)
extract annotations from question annotation in SQuAD format
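
For orientation, the SQuAD 2.0 layout nests questions under paragraphs and paragraphs under articles. The following sketch converts flat records into that layout. The input field names (`title`, `paragraph`, `question`, `answer`) are hypothetical, and locating the answer with `str.find` is a simplification: real annotations would normally carry their own character offsets.

```python
def to_squad(records, version="v2.0"):
    """Group flat annotation records into the nested SQuAD 2.0 layout."""
    articles = {}
    for i, rec in enumerate(records):
        context = rec["paragraph"]
        answer = rec.get("answer")  # None or "" marks an unanswerable question
        # Simplification: take the first occurrence of the answer text.
        start = context.find(answer) if answer else -1
        qa = {
            "id": str(rec.get("id", i)),
            "question": rec["question"],
            "is_impossible": start < 0,
            "answers": (
                [{"text": answer, "answer_start": start}] if start >= 0 else []
            ),
        }
        paragraphs = articles.setdefault(rec["title"], {})
        paragraphs.setdefault(context, []).append(qa)
    data = [
        {
            "title": title,
            "paragraphs": [
                {"context": context, "qas": qas}
                for context, qas in paragraphs.items()
            ],
        }
        for title, paragraphs in articles.items()
    ]
    return {"version": version, "data": data}
```

The `is_impossible` flag with an empty `answers` list is what distinguishes SQuAD 2.0 from 1.1, so unanswerable questions survive the conversion instead of being dropped.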

Annotation Summary

Annotation work summary, web application

Input: Database of annotations

Output: Summary of work performed by each annotator

Done:

application template (Tomáš Kuchárik)
Dockerfile (Daniel Hládek)
web application for annotation analysis (Tomáš Kuchárik, Daniel Hládek)
application deployment (Daniel Hládek)
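
The core of such a summary is a per-annotator count over the stored annotations. A minimal sketch is given below, assuming each annotation is a dictionary with a hypothetical `annotator` field and an `answer` decision such as `accept`, `reject` or `ignore`; the field names in the real database may differ.

```python
from collections import Counter


def summarize_by_annotator(annotations):
    """Count annotation decisions (e.g. accept/reject/ignore) per annotator."""
    summary = {}
    for ann in annotations:
        who = ann.get("annotator", "unknown")   # hypothetical field name
        decision = ann.get("answer", "unknown")
        summary.setdefault(who, Counter())[decision] += 1
    return summary
```

Calling `sum(summary[name].values())` then gives the total amount of work done by one annotator.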

Annotation Validation

Input: Annotated questions and paragraphs

Output: Good annotated questions

In Progress:

Design validation recipe (Tomáš Kuchárik)

To do:

Implement and deploy validation recipe (Tomáš Kuchárik)

Annotation Manual

Output: Recommendations for annotators

Done:

Web Page for annotators (Daniel Hládek)
Motivation video (Daniel Hládek)
Video with instructions (Daniel Hládek)