dano 904214f4d4 Update 'pages/topics/question/README.md'

2020-07-03 04:23:27 +00:00

Question Answering

Project repository (private)

Project Description

Create a clone of SQuAD 2.0 in the Slovak language
Set up annotation infrastructure with Prodigy
Perform and evaluate annotations of Wikipedia data

Auxiliary tasks:

Consider using machine translation
Train and evaluate a question answering model

Tasks

Raw Data Preparation

Input: Wikipedia

Output: a set of paragraphs

Obtaining and parsing the Wikipedia dump
Selecting suitable paragraphs

Done:

Wiki parsing script
PageRank script

To be done:

random selection of paragraphs: collect all good paragraphs and shuffle them

Notes:

PageRank causes a bias toward geography articles; random selection might be the best option
75 best articles
167 good articles
Wiki Facts
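
The random selection described above could be sketched as follows. This is a minimal illustration, not the project's script: the function name `select_paragraphs`, the length thresholds, and the fixed seed are assumptions made for the example, and the length filter is only a stand-in for whatever criterion defines a good paragraph.

```python
import random


def select_paragraphs(paragraphs, min_chars=200, max_chars=1500, seed=42):
    """Keep paragraphs within a length range, then shuffle them.

    The length limits are hypothetical placeholders for the project's
    real "good paragraph" criterion.
    """
    good = [p for p in paragraphs if min_chars <= len(p) <= max_chars]
    rng = random.Random(seed)  # fixed seed keeps the order reproducible
    rng.shuffle(good)
    return good


if __name__ == "__main__":
    sample = ["short", "a" * 300, "b" * 500, "c" * 5000]
    for paragraph in select_paragraphs(sample):
        print(len(paragraph))
```

Shuffling the whole filtered set, rather than ranking it, avoids favoring any topic, which is the concern raised about PageRank in the notes.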

Question Annotation

An annotation recipe for Prodigy

Input: A set of paragraphs

Output: A question for each paragraph

Done:

a data preparation script
annotation recipe
deployment at question.tukekemt.xyz (accessible only from TUKE)
answer annotation together with the question

To be done:

prepare final input paragraphs (dataset)
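
Prodigy reads its input as newline-delimited JSON, one task per line, with the text to display under the `text` key. Preparing the final paragraph set might look like the sketch below. The helper name `paragraphs_to_jsonl` and the `meta` fields (`title`, `paragraph_id`) are assumptions for illustration, not part of the existing data preparation script.

```python
import io
import json


def paragraphs_to_jsonl(articles, out):
    """Write one JSON task per line for each paragraph.

    `articles` maps an article title to a list of paragraph strings.
    Returns the number of tasks written.
    """
    count = 0
    for title, paragraphs in articles.items():
        for index, text in enumerate(paragraphs):
            task = {"text": text, "meta": {"title": title, "paragraph_id": index}}
            # ensure_ascii=False keeps Slovak diacritics readable in the file
            out.write(json.dumps(task, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    written = paragraphs_to_jsonl(
        {"Košice": ["Košice sú mesto na východe Slovenska."]}, buffer
    )
    print(written)
    print(buffer.getvalue(), end="")
```

Keeping the article title and paragraph index in `meta` makes it possible to trace each annotated question back to its source paragraph later.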

Answer Annotation

Input: A set of paragraphs and questions

Output: An answer for each paragraph and question

Done:

a data preparation script
annotation recipe
deployment at answer.tukekemt.xyz (accessible only from TUKE)

To be done:

extract annotations from the question annotation
input paragraphs with questions (dataset)
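
Because the goal is a SQuAD 2.0 style dataset, the finished annotations eventually need to be exported in that layout: articles containing paragraphs (`context`), each with a list of questions (`qas`) whose answers carry a character offset (`answer_start`), plus an `is_impossible` flag for unanswerable questions. The following is a sketch under assumptions: the input record keys (`title`, `context`, `question`, `answer`) and the function name `to_squad` are hypothetical and do not reflect the project's actual annotation output.

```python
import json


def to_squad(records, version="v2.0"):
    """Group flat annotation records into the SQuAD 2.0 JSON layout."""
    articles = {}  # title -> {context -> list of question entries}
    for number, rec in enumerate(records):
        answer = rec.get("answer")  # None marks an unanswerable question
        qa = {
            "id": str(number),
            "question": rec["question"],
            "is_impossible": answer is None,
            "answers": [],
        }
        if answer is not None:
            # Uses the first occurrence; real span offsets from the
            # annotation tool would be more reliable.
            start = rec["context"].find(answer)
            if start < 0:
                raise ValueError(f"answer not found in context: {answer!r}")
            qa["answers"].append({"text": answer, "answer_start": start})
        contexts = articles.setdefault(rec["title"], {})
        contexts.setdefault(rec["context"], []).append(qa)
    data = [
        {
            "title": title,
            "paragraphs": [{"context": c, "qas": qas} for c, qas in contexts.items()],
        }
        for title, contexts in articles.items()
    ]
    return {"version": version, "data": data}


if __name__ == "__main__":
    demo = [
        {
            "title": "Košice",
            "context": "Košice ležia na rieke Hornád.",
            "question": "Na ktorej rieke ležia Košice?",
            "answer": "Hornád",
        }
    ]
    print(json.dumps(to_squad(demo), ensure_ascii=False, indent=2))
```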

Annotation Summary

Annotation work summary

Input: Database of annotations

Output: Summary of work performed by each annotator

Done:

application template
Dockerfile

In progress:

web application for annotation analysis (Tomáš Kuchárik, Flask)
application deployment

To be done:

analyze the SQL schema and find out who annotated what
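
A first step toward this could be to list the tables and columns of the annotation database and then count examples per annotator. The sketch below uses SQLite and assumes, without having checked the project's database, an `example` table whose `content` column holds each annotation as JSON with a `_session_id` key naming the annotator session. All of these names are assumptions to verify against the real schema; the in-memory database exists only for demonstration.

```python
import json
import sqlite3
from collections import Counter


def describe_schema(conn):
    """Return {table name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    return {
        t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }


def count_by_annotator(conn, table="example", column="content", key="_session_id"):
    """Count stored annotations per annotator.

    The table, column, and key names are assumptions, not a confirmed schema.
    """
    counts = Counter()
    for (content,) in conn.execute(f'SELECT "{column}" FROM "{table}"'):
        task = json.loads(content)
        counts[task.get(key, "unknown")] += 1
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
    rows = [
        {"text": "Prvý odsek.", "_session_id": "anna"},
        {"text": "Druhý odsek.", "_session_id": "anna"},
        {"text": "Tretí odsek."},
    ]
    conn.executemany(
        "INSERT INTO example (content) VALUES (?)",
        [(json.dumps(r),) for r in rows],
    )
    print(describe_schema(conn))
    print(count_by_annotator(conn))
```

The resulting per-annotator counts are the kind of input the planned web application for annotation analysis could display.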

Annotation Manual

Output: Recommendations for annotators

To be done:

Web page for annotators