zz

2020-07-01 18:27:35 +02:00 · 2020-07-01 18:27:35 +02:00 · 965d5e7dcd
commit 965d5e7dcd
parent 6c2a8ff77b
2 changed files with 87 additions and 52 deletions
--- a/pages/interns/cesar_gutierrez/README.md
+++ b/pages/interns/cesar_gutierrez/README.md
@@ -0,0 +1,45 @@
 ## Named entity annotations
 Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
 ## Goals
  - Be able to recognize unknown named entities
  - Create a manually annotated training set from speech transcripts
  - Propose an annotation schema
 ## Plan
  - Convert speech transcripts into a training set
  - Train and evaluate classifier
  - Establish manual annotation 
  - Select unannotated data 
 ### Data preparation
 Input: Transcriber transcripts with inconsistent annotations
 ``` 
 * Lowercase first letter: regular word
 * Capitalized first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation
 ```
 Output: A file that can be read by `spacy convert`
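
The conversion from the marker conventions above to a format for `spacy convert` could be sketched as follows. This is only a minimal illustration under several assumptions that the source does not confirm: that `^^` and `@` are token prefixes, that `_` joins the words of a multi-word expression, that `/` separates the written form from its pronunciation, and that the target is the `word|TAG` sentence-per-line layout, which to my understanding is what the `iob` converter of `spacy convert` reads. The label `ENT` and the function names are hypothetical placeholders, since the annotation schema is still to be proposed.

```python
def classify(token):
    """Split one whitespace-separated transcript token into words and a kind.

    Assumed marker positions (not confirmed by the source):
    '@' and '^^' are prefixes, '_' joins words, '/' precedes a pronunciation.
    """
    if token.startswith("@"):
        return [], "noise"  # noise tokens are dropped entirely
    kind = "word"
    if token.startswith("^^"):
        token = token[2:]
        kind = "foreign"
    # Keep only the written form, discard the pronunciation part.
    token = token.split("/", 1)[0]
    words = [w for w in token.split("_") if w]
    # A capitalized first letter marks a named entity in the transcripts.
    if kind == "word" and words and words[0][:1].isupper():
        kind = "entity"
    return words, kind


def to_iob_line(transcript_line, label="ENT"):
    """Turn one transcript line into 'word|TAG' tokens (one sentence per line)."""
    out = []
    for token in transcript_line.split():
        words, kind = classify(token)
        for i, word in enumerate(words):
            if kind == "entity":
                tag = ("B-" if i == 0 else "I-") + label
            else:
                tag = "O"
            out.append(f"{word}|{tag}")
    return " ".join(out)


if __name__ == "__main__":
    line = "dnes prišiel Janko_Hraško @smiech do ^^New_York mesto/mesto"
    print(to_iob_line(line))
    # dnes|O prišiel|O Janko|B-ENT Hraško|I-ENT do|O New|O York|O mesto|O
```

In this sketch, foreign words are kept as plain tokens and capitalized tokens that are not joined with `_` each start their own entity. Both choices would need to be revisited once the annotation schema is fixed.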
 ## People
 - Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
 - Kyryl Kobzar
 - Ediz Morochovič
 ## Tools
 ``` 
 * Machine learning : https://spacy.io/usage/training
 * Manual Annotation : https://prodi.gy/
 ```
--- a/pages/topics/named-entity/README.md
+++ b/pages/topics/named-entity/README.md
@@ -6,70 +6,60 @@ title: Named entities
 # Named entities
 ## Goals
  - Be able to recognize unknown named entities
  - Create a manually annotated training set from speech transcripts
  - Propose an annotation schema
 ## Tasks
 ### Data preparation
- Parsing the Wiki XML dump
+Input: Wiki XML dump
- Filter for excluding articles
+Output: Corpus of documents for annotation
- Manual selection of articles
+
 done:
 - Parsing the Wiki XML dump https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/wikicorpus
 to do:
 - Script for extracting paragraphs.
 - Filter for excluding articles and paragraphs.
 - Manual selection of articles.
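
The paragraph extraction and filtering steps listed above might look roughly like the sketch below. It assumes that the already parsed dump yields one string of wikitext per article, which the source does not state. The function names, the list of markup prefixes, and both thresholds are hypothetical and only illustrate the idea.

```python
import re

# Lines starting with these characters are treated as headings, templates,
# tables, or lists rather than running text (an illustrative heuristic).
MARKUP_PREFIXES = ("=", "{{", "{|", "|", "*", "#", "<")


def extract_paragraphs(article_text, min_chars=100):
    """Split an article on blank lines and keep prose-like paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", article_text):
        block = " ".join(block.split())  # normalize whitespace
        if len(block) < min_chars:
            continue  # too short to be worth annotating
        if block.startswith(MARKUP_PREFIXES):
            continue  # looks like markup, not prose
        paragraphs.append(block)
    return paragraphs


def keep_article(paragraphs, min_paragraphs=2):
    """Filter out articles that have too little running text."""
    return len(paragraphs) >= min_paragraphs
```

Articles that pass `keep_article` would still go through the manual selection step. The automatic filter only reduces the amount of material that has to be reviewed by hand.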
 ### Annotation schema preparation
- Prodigy deployment
+Output: a deployed annotation application, ready for use
- Data conversion for Prodigy
+
 done:
 - Prodigy deployment http://skner.tukekemt.xyz
 - Data conversion for Prodigy https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/ner
 to do:
 - Annotation manual
 - Tag set for annotation
- Supporting model?
+- Supporting model? If it helps, also prepare a schema or dataset with the supporting model.
 ### Preliminary annotation batch
 done:
 - deployment of the application for analysing annotated data http://aksner.tukekemt.xyz
 https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/database_app
 in progress:
 - application for analysing annotated data: who annotated what, how, and how much
 to do:
 - Data annotation
 - Preparation of a script for cleaning annotated data
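
The per-annotator statistics that the analysis application is meant to provide can be illustrated with a small sketch. It assumes the annotations are exported as JSONL records with the fields `_session_id`, `answer`, and `spans`, as in a typical Prodigy export. Whether the project's database uses exactly these fields is an assumption, and the function name `annotation_stats` is hypothetical.

```python
import json
from collections import Counter, defaultdict


def annotation_stats(jsonl_lines):
    """Count answers and span labels per annotator.

    Assumes one JSON record per line with optional '_session_id',
    'answer', and 'spans' fields (an assumption about the export format).
    """
    stats = defaultdict(Counter)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        annotator = record.get("_session_id", "unknown")
        stats[annotator]["answer:" + record.get("answer", "missing")] += 1
        for span in record.get("spans") or []:
            stats[annotator]["label:" + span.get("label", "?")] += 1
    return {name: dict(counter) for name, counter in stats.items()}
```

Called on the lines of an exported file, for example `annotation_stats(open("annotations.jsonl", encoding="utf-8"))` with a hypothetical file name, this returns one dictionary of counts per annotator.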
 ### Production annotation batch
 to do:
 - Motivating students
 - Data annotation
 - Analysis of annotated data
 - Creation of a corpus of annotated data
 ### Analysis of completed annotations
 Application for annotation analysis
 ## Plan
  - Convert speech transcripts into a training set
  - Train and evaluate classifier
  - Establish manual annotation 
  - Select unannotated data 
 ### Data preparation
 Input: Transcriber transcripts with inconsistent annotations
 ``` 
 * Lowercase first letter: regular word
 * Capitalized first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation
 ```
 Output: A file that can be read by `spacy convert`
 ## People
 - Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
 - Kyryl Kobzar
 - Ediz Morochovič
 ## Tools
 ``` 
 * Machine learning : https://spacy.io/usage/training
 * Manual Annotation : https://prodi.gy/
 ```