---
title: Named entities
---

# Named entities

## Goals
- Be able to recognize unknown named entities
- Create a manually annotated training set from speech transcripts
- Propose an annotation schema

## Tasks
### Data preparation

- Parsing the XML Wiki dump
- Filter for excluding articles
- Manual selection of articles
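
The parsing and filtering steps above could be sketched as follows. This is only an illustration, assuming a standard MediaWiki XML export with `page`, `title`, `ns`, `redirect` and `text` elements. The function names and the filter rules (skip redirects, non-article namespaces and very short pages) are hypothetical and would need to be replaced by the criteria actually chosen for the project.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a real dump, used only to demonstrate the functions below.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Kosice</title><ns>0</ns><revision><text>Kosice is a city in eastern Slovakia.</text></revision></page>
<page><title>KE</title><ns>0</ns><redirect title="Kosice"/><revision><text>#REDIRECT [[Kosice]]</text></revision></page>
<page><title>Talk:Kosice</title><ns>1</ns><revision><text>Discussion page text that is long enough.</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source, min_chars=200):
    """Yield (title, text) for pages that pass a simple filter.

    The filter is an assumption for illustration: redirects, pages outside
    namespace 0 and pages shorter than min_chars are skipped.
    """
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {"title": "", "ns": "0", "text": "", "redirect": False}
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "redirect":
                fields["redirect"] = True
            elif name in ("title", "ns", "text"):
                fields[name] = child.text or ""
        elem.clear()  # drop the children of the processed page
        if fields["redirect"] or fields["ns"] != "0":
            continue
        if len(fields["text"]) < min_chars:
            continue
        yield fields["title"], fields["text"]


articles = list(iter_articles(io.BytesIO(SAMPLE), min_chars=10))
print(articles)
```

The generator keeps the automatic filter separate from the manual selection step, which can then work on the much smaller list of surviving titles.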
### Preparing the annotation schema

- Prodigy deployment
- Converting data into Prodigy format
- Annotation manual
- Tag set for annotation
- Supporting model?
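
The data conversion step might look like the sketch below. It assumes Prodigy is fed JSONL input, one JSON object per line with a `text` key. Splitting articles into paragraphs and storing the article title under `meta` are assumptions made for illustration, and `to_prodigy_jsonl` is a hypothetical helper, not part of Prodigy.

```python
import json


def to_prodigy_jsonl(articles):
    """Return JSONL text with one annotation task per paragraph.

    articles is an iterable of (title, text) pairs. Paragraphs are assumed
    to be separated by blank lines.
    """
    lines = []
    for title, text in articles:
        for paragraph in text.split("\n\n"):
            paragraph = " ".join(paragraph.split())  # normalize whitespace
            if not paragraph:
                continue
            task = {"text": paragraph, "meta": {"source": title}}
            # ensure_ascii=False keeps Slovak diacritics readable in the file
            lines.append(json.dumps(task, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


print(to_prodigy_jsonl([("Kosice", "First paragraph.\n\nSecond   paragraph.")]))
```

Keeping the source title in each task makes it possible to trace an annotation back to its article during later analysis.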
### Preparatory annotation batch

### Production annotation batch

- Motivating students

### Analysis of completed annotations

Application for analyzing annotations

## Plan
- Convert speech transcripts into a training set
- Train and evaluate a classifier
- Establish manual annotation
- Select unannotated data
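
For the evaluation step, one common option is exact-match scoring at the entity level. The sketch below assumes IOB2 tags and counts a predicted entity as correct only when its boundaries and label both match the gold annotation. This is an assumed evaluation scheme for illustration, not necessarily the one the project will use, and the function names are hypothetical.

```python
def spans_from_iob(tags):
    """Collect (start, end, label) spans from IOB2 tags; end is exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag without a preceding B- is treated as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 with exact span matching."""
    gold = spans_from_iob(gold_tags)
    pred = spans_from_iob(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(precision_recall_f1(gold, pred))
```

Exact matching is strict: a prediction that covers only part of a multi-word entity counts as both a false positive and a false negative.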
### Data preparation

Input: Transcriber transcripts with inconsistent annotations

```
* Lowercase first letter: regular word
* Capitalized first letter: named entity
* ''^^'': foreign word
* ''@'': noise
* ''_'': multi-word expression
* ''/'': pronunciation
```

Output: A file that can be read by `spacy convert`
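
A minimal sketch of this conversion follows. It rests on several assumptions that the transcripts would have to confirm: `^^` and `@` are prefixes, `_` joins the words of one expression, `/` separates the written form from its pronunciation, and noise tokens carry no text worth keeping. The output uses the `word|TAG` layout with one sentence per line, which is assumed here to be what the IOB converter of `spacy convert` accepts, so it should be checked against the installed spaCy version. The label `ENT` and both function names are hypothetical.

```python
LABEL = "ENT"  # placeholder label; the real tag set comes from the annotation schema


def token_to_iob(token):
    """Return a list of (word, tag) pairs for one whitespace-separated token."""
    if token.startswith("@"):
        return []  # assumed: noise markers are dropped
    token = token.split("/", 1)[0]  # assumed: keep the written form only
    foreign = token.startswith("^^")
    if foreign:
        token = token[2:]
    parts = [part for part in token.split("_") if part]
    if not parts:
        return []
    # Capitalization marks a named entity; foreign words are left untagged here.
    is_entity = not foreign and parts[0][0].isupper()
    pairs = []
    for i, part in enumerate(parts):
        if is_entity:
            tag = ("B-" if i == 0 else "I-") + LABEL
        else:
            tag = "O"
        pairs.append((part, tag))
    return pairs


def line_to_iob(line):
    """Convert one transcript line into 'word|TAG' tokens separated by spaces."""
    pairs = []
    for token in line.split():
        pairs.extend(token_to_iob(token))
    return " ".join(f"{word}|{tag}" for word, tag in pairs)


print(line_to_iob("today we met Peter_Novak in ^^new_york @noise kosice/koshitse"))
```

Because the source annotations are inconsistent, a real converter would probably also need to log tokens that match none of the conventions so that they can be reviewed by hand.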
## People

- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
- Kyryl Kobzar
- Ediz Morochovič

## Tools

```
* Machine learning : https://spacy.io/usage/training
* Manual Annotation : https://prodi.gy/
```