
title: Cesar Abascal Gutierrez
published: true
taxonomy:
    category: iaeste
    tag: ner, nlp
    author: Daniel Hladek

Named entity annotations

Intern, probably summer 2019

Cesar Abascal Gutierrez cesarbielva1994@gmail.com

Goals

  • Be able to recognize unknown named entities
  • Create a manually annotated training set from speech transcripts
  • Propose an annotation schema

Plan

  • Convert speech transcripts into a training set
  • Train and evaluate classifier
  • Establish manual annotation
  • Select unannotated data
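For the "train and evaluate" step, a minimal sketch of entity-level scoring is shown here. It assumes exact-match scoring over (start, end, label) spans, which is a common convention for named entity recognition; this page does not say which metric was actually used. The function name `span_prf` and the label `ENT` are hypothetical.

```python
def span_prf(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over entity spans.

    Each span is a (start, end, label) tuple. A predicted span counts
    as correct only if the same tuple appears in the gold annotation.
    """
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: one of two gold entities is found,
# and one of two predictions is spurious.
gold = [(0, 2, "ENT"), (5, 6, "ENT")]
predicted = [(0, 2, "ENT"), (3, 4, "ENT")]
precision, recall, f1 = span_prf(gold, predicted)
```

Exact matching is strict: a prediction with a slightly wrong boundary counts as both a false positive and a false negative, which matters when the manual annotations themselves are inconsistent.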

Data preparation

Input: Transcriber transcripts with inconsistent annotations

 * Lowercase first letter: regular word
 * Capitalized first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation

Output: A file that the spacy convert command can read
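The conversion described above can be sketched as follows. This is only an illustration under several assumptions that the page does not confirm: that "@" and "^^" are attached to the token they mark, that "/" separates a word from its pronunciation, that "_" joins the words of a multi-word expression, and that a capitalized first letter always means a named entity (so sentence-initial capitals are not used). The single label `ENT` and the function name `transcript_to_iob` are hypothetical. The output uses one sentence per line with pipe-separated `word|tag` tokens, which is meant to match the IOB layout that spacy convert accepts for .iob files; check this against the installed spaCy version.

```python
def transcript_to_iob(line, label="ENT"):
    """Convert one transcript line into 'word|tag' tokens (IOB2 tags)."""
    tokens = []
    for raw in line.split():
        # Assumption: noise markers start with "@" and carry no text.
        if raw.startswith("@"):
            continue
        # Assumption: "word/pronunciation"; keep only the written form.
        word = raw.split("/", 1)[0]
        # Assumption: "^^" marks a foreign word; drop the marker itself.
        word = word.strip("^")
        # Assumption: "_" joins the parts of a multi-word expression.
        parts = [part for part in word.split("_") if part]
        if not parts:
            continue
        is_entity = parts[0][0].isupper()
        for index, part in enumerate(parts):
            if is_entity:
                prefix = "B-" if index == 0 else "I-"
                tag = prefix + label
            else:
                tag = "O"
            tokens.append(f"{part}|{tag}")
    return " ".join(tokens)


# Hypothetical transcript line using all of the markers.
example = "yesterday Peter_Novak visited ^^shop @ Kosice/koshitse"
print(transcript_to_iob(example))
```

Because the source annotations are inconsistent, output like this is best treated as a noisy starting point to be corrected during manual annotation, not as a gold standard.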

People

Tools

 * Machine learning: https://spacy.io/usage/training
 * Manual annotation: https://prodi.gy/