---
title: Named Entities
---


# Named Entities


## Goals

  - Be able to recognize unknown named entities
  - Create a manually annotated training set from speech transcripts
  - Propose an annotation schema


## Tasks

### Data preparation

- Parse the Wikipedia XML dump
- Filter out unsuitable articles
- Manually select articles
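
The dump parsing and filtering steps could be sketched as below. This is a minimal illustration, not the project's actual pipeline: it assumes a standard MediaWiki XML export with `<page>`, `<title>`, `<ns>` and `<text>` elements. The function names, the `min_chars` threshold, and the redirect keywords are hypothetical choices.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, namespace id, wikitext) for every page in the dump."""
    title, ns, text = "", "", ""
    # iterparse streams the file, so the whole dump never sits in memory
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "ns":
            ns = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, ns, text
            title, ns, text = "", "", ""
            elem.clear()  # release the subtree of the finished page


def keep_article(ns: str, text: str, min_chars: int = 200) -> bool:
    """Hypothetical filter: keep main-namespace pages that are not
    redirects and are at least min_chars long."""
    if ns != "0":  # namespace 0 holds regular articles
        return False
    head = text.lstrip().upper()
    # assumption: redirects start with one of these keywords
    if head.startswith("#REDIRECT") or head.startswith("#PRESMERUJ"):
        return False
    return len(text) >= min_chars
```

The articles that pass `keep_article` would still go through the manual selection step; the filter only reduces the number of candidates.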

### Annotation schema preparation

- Deploy Prodigy
- Convert the data into the Prodigy format
- Annotation manual
- Set of annotation labels
- Supporting model?
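
The data conversion step might look like the sketch below. Prodigy reads newline-delimited JSON in which each task has a `"text"` key. Splitting articles into paragraphs, the `"source"` metadata field, and the function names are assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def to_prodigy_tasks(articles: Iterable[Tuple[str, str]]) -> Iterator[Dict]:
    """Turn (title, plain text) pairs into one Prodigy task per paragraph."""
    for title, text in articles:
        for paragraph in text.split("\n\n"):
            paragraph = " ".join(paragraph.split())  # normalize whitespace
            if paragraph:
                # "meta" is shown in the annotation UI; "source" is a
                # hypothetical field name
                yield {"text": paragraph, "meta": {"source": title}}


def write_jsonl(tasks: Iterable[Dict], path: str) -> None:
    """Write one JSON object per line, keeping diacritics readable."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```

A file written this way can be passed as the source argument of a Prodigy recipe such as `ner.manual`.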

### Preliminary annotation batch

### Production annotation batch

- Motivating students

### Analysis of completed annotations

Application for analyzing the annotations

## Plan

  - Convert speech transcripts into a training set
  - Train and evaluate classifier
  - Establish manual annotation
  - Select unannotated data
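
For the evaluation step, one common choice is entity-level precision, recall and F1 over exact span matches. The sketch below assumes IOB tags as input; the labels `PER` and `LOC` in the usage note are placeholders, since the tag set has not been decided yet.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def spans_from_iob(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a list of IOB tags."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # an "I-" tag without a matching open span is ignored in this sketch
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans; only exact matches count."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, with gold tags `["B-PER", "I-PER", "O", "B-LOC"]` and predicted tags `["B-PER", "I-PER", "O", "O"]`, one of the two gold entities is found, so precision is 1.0 and recall is 0.5.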

### Data preparation

Input: Transcriber transcripts with inconsistent annotations

```
 * Lowercase first letter: regular word
 * Capital first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation
```

Output: A file that can be read by `spacy convert`
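
A conversion along these lines could produce such a file. The sketch writes one token and one IOB tag per line with a blank line between sentences, a layout that the `ner` converter of `spacy convert` accepts. Several points are assumptions, because the transcripts are annotated inconsistently: `@` and `^^` are taken to be prefixes, `/` is taken to separate a word from its pronunciation, noise tokens are dropped, and every entity gets the single hypothetical label `ENT`.

```python
from typing import Iterable, List, Tuple

LABEL = "ENT"  # hypothetical placeholder; the real tag set is not defined yet


def tag_transcript_line(line: str) -> List[Tuple[str, str]]:
    """Turn one transcript line into (token, IOB tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    for raw in line.split():
        if raw.startswith("@"):      # assumption: '@' prefixes a noise token
            continue
        word = raw.split("/", 1)[0]  # assumption: pronunciation follows '/'
        word = word.lstrip("^")      # assumption: '^^' prefixes a foreign word
        parts = [p for p in word.split("_") if p]  # '_' joins a multi-word expression
        if not parts:
            continue
        is_entity = parts[0][0].isupper()  # capital first letter: named entity
        for i, part in enumerate(parts):
            if is_entity:
                tag = ("B-" if i == 0 else "I-") + LABEL
            else:
                tag = "O"
            pairs.append((part, tag))
    return pairs


def to_conll(lines: Iterable[str]) -> str:
    """Format tagged lines as 'token tag' rows, one blank line per sentence break."""
    blocks = []
    for line in lines:
        pairs = tag_transcript_line(line)
        if pairs:
            blocks.append("\n".join(f"{tok} {tag}" for tok, tag in pairs))
    return "\n\n".join(blocks) + "\n"
```

Capitalized tokens that are not joined by `_` each start a separate entity in this sketch, so multi-word names must already be marked as multi-word expressions in the transcript.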

## People

- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
- Kyryl Kobzar
- Ediz Morochovič

## Tools

- Machine learning: https://spacy.io/usage/training
- Manual annotation: https://prodi.gy/