From 965d5e7dcdac8d0004498b26f17a308a9107b3d2 Mon Sep 17 00:00:00 2001
From: Daniel Hladek <dhladek@gmail.com>
Date: Wed, 1 Jul 2020 18:27:35 +0200
Subject: [PATCH] zz

---
 pages/interns/cesar_gutierrez/README.md | 45 ++++++++++++
 pages/topics/named-entity/README.md     | 94 +++++++++++--------------
 2 files changed, 87 insertions(+), 52 deletions(-)
 create mode 100644 pages/interns/cesar_gutierrez/README.md

diff --git a/pages/interns/cesar_gutierrez/README.md b/pages/interns/cesar_gutierrez/README.md
new file mode 100644
index 0000000000..1d01233763
--- /dev/null
+++ b/pages/interns/cesar_gutierrez/README.md
@@ -0,0 +1,45 @@
+## Named entity annotations
+
+Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
+
+## Goals
+
+  - Be able to recognize unknown named entities
+  - Create a manually annotated training set from speech transcripts
+  - Propose an annotation schema
+
+
+## Plan
+
+  - Convert speech transcripts into a training set
+  - Train and evaluate a classifier
+  - Establish manual annotation
+  - Select unannotated data
+
+### Data preparation
+
+Input: Transcriber transcripts with inconsistent annotations
+
+``` 
+ * Lowercase first letter: regular word
+ * Capital: named entity
+ * ''^^'': foreign word
+ * ''@'': noise
+ * ''_'': multi-word expression
+ * ''/'': pronunciation
+```
+
+Output: A file that can be read by `spacy convert`
+
+## People
+
+- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
+- Kyryl Kobzar
+- Ediz Morochovič
+
+## Tools
+
+``` 
+ * Machine learning: https://spacy.io/usage/training
+ * Manual annotation: https://prodi.gy/
+```
diff --git a/pages/topics/named-entity/README.md b/pages/topics/named-entity/README.md
index f88948dbb0..8b32187089 100644
--- a/pages/topics/named-entity/README.md
+++ b/pages/topics/named-entity/README.md
@@ -6,70 +6,60 @@ title: Pomenované entity
 # Pomenované entity
 
 
-## Goals
-
-  - Be able to recognize unknown named entities
-  - Create a manually annotated training set from speech transcripts
-  - Propose an annotation schema
-
-
-## Tasks
-
 ### Príprava dát
 
-- Parsovanie XML Wiki DUMP
-- Filter pre vyradenie článkov
-- Ručný výber článkov
+Input: Wiki XML dump
+Output: Corpus of documents for annotation
+
+Done:
+
+- Parsing the Wiki XML dump https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/wikicorpus
+
+To do:
+
+- Script for extracting paragraphs.
+- Filter for excluding articles and paragraphs.
+- Manual selection of articles.
 
 ### Príprava anotačnej schémy
 
-- Deploymment Prodigy
-- Konverzia dát do Prodigy
+Output: a deployed and ready-to-use annotation application
+
+Done:
+
+- Prodigy deployment http://skner.tukekemt.xyz
+- Conversion of data into Prodigy https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/ner
+
+To do:
+
 - Anotačný manuál
 - Sada značiek na anotáciu
-- Podporný model?
+- Supporting model? If it helps, also prepare a schema or dataset with the supporting model.
 
 ### Prípravná anotačná dávka
 
+Done:
+
+- Deployment of the application for analyzing annotated data http://aksner.tukekemt.xyz
+
+https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/database_app
+
+In progress:
+
+- Application for analyzing annotated data: who annotated what, how, and how much
+
+To do:
+
+- Data annotation
+- Preparing a script for cleaning annotated data
+
 ### Produkčná anotačná dávka
 
+To do:
+
 - Motivácia študentov
+- Data annotation
+- Analysis of annotated data
+- Building the corpus of annotated data
 
-### Analýza vykonaných anotácií
 
-Aplikácia pre analýzu anotácií
-
-## Plan
-
-  - Convert speech transcripts into a training set
-  - Train and evaluate classifier
-  - Establish manual annotation 
-  - Select unannotated data 
-
-### Data preparation
-
-Input: Transcriber transcripts with inconsistent annotations
-
-``` 
- * First small letter: regular word
- * Capital: named entity
- * ''^^'': faoreign word
- * ''@'': noise
- * ''_'': multi word expression
- * ''/'': pronuncation
-```
-
-Output: A file that can be read by `spacy convert`
-
-## People
-
-- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
-- Kyryl Kobzar
-- Ediz Morochovič
-
-## Tools
-
-``` 
- * Machine learning : https://spacy.io/usage/training
- * Manual Annotation : https://prodi.gy/
-```