forked from KEMT/zpwiki
		
zz

parent 6c2a8ff77b
commit 965d5e7dcd

pages/interns/cesar_gutierrez/README.md (45 lines, Normal file)

@@ -0,0 +1,45 @@
## Named entity annotations

Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>

## Goals

  - Be able to recognize unknown named entities
  - Create a manually annotated training set from speech transcripts
  - Propose an annotation schema

## Plan

  - Convert speech transcripts into a training set
  - Train and evaluate classifier
  - Establish manual annotation
  - Select unannotated data

### Data preparation

Input: Transcriber transcripts with inconsistent annotations

```
 * Lowercase first letter: regular word
 * Uppercase first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation
```

Output: A file that can be read by `spacy convert`
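
The conversion step could look roughly like the sketch below. It is only an illustration under stated assumptions: the source does not say where each marker sits in a token, so the sketch assumes that `@` and `^^` are prefixes, that `_` joins the words of a multi-word expression, and that `/` separates the written form from the pronunciation. The label `ENT` and the sample sentence are made up. The output uses the `word|tag` one-sentence-per-line layout that `spacy convert` reads with its `iob` converter.

```python
LABEL = "ENT"  # hypothetical label; the annotation schema is still to be proposed


def tag_token(raw):
    """Map one transcript token to a list of (word, IOB tag) pairs."""
    if raw.startswith("@"):          # assumption: '@' prefixes a noise token, which is dropped
        return []
    foreign = raw.startswith("^^")   # assumption: '^^' prefixes a foreign word
    if foreign:
        raw = raw[2:]
    raw = raw.split("/", 1)[0]       # assumption: 'written/pronunciation', keep the written form
    words = [w for w in raw.split("_") if w]  # assumption: '_' joins a multi-word expression
    if not words:
        return []
    if foreign or not words[0][0].isupper():
        return [(w, "O") for w in words]
    # Capitalized token: the first word opens the entity, the rest continue it.
    return [(w, ("B-" if i == 0 else "I-") + LABEL) for i, w in enumerate(words)]


def to_iob_line(sentence):
    """Render one transcript sentence as a single `word|tag` line."""
    pairs = [pair for token in sentence.split() for pair in tag_token(token)]
    return " ".join(f"{word}|{tag}" for word, tag in pairs)


# Made-up example sentence, not taken from the real transcripts.
print(to_iob_line("dnes Jana_Nova navštívila @hm ^^meeting Košice/košice"))
```

Because capitalization is the only entity signal here, a capitalized sentence-initial word would also be tagged as an entity; the real transcripts may need an extra rule for that case.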

## People

- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
- Kyryl Kobzar
- Ediz Morochovič

## Tools

```
 * Machine learning: https://spacy.io/usage/training
 * Manual annotation: https://prodi.gy/
```

@@ -6,70 +6,60 @@ title: Pomenované entity
# Named entities

## Goals

  - Be able to recognize unknown named entities
  - Create a manually annotated training set from speech transcripts
  - Propose an annotation schema

## Tasks

### Data preparation

- Parsing the Wiki XML dump
- Filter for excluding articles
- Manual selection of articles

Input: Wiki XML dump

Output: A corpus of documents for annotation

Done:

- Parsing the Wiki XML dump https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/wikicorpus

To do:

- Script for extracting paragraphs.
- Filter for excluding articles and paragraphs.
- Manual selection of articles.
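
The parsing, paragraph extraction and filtering steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in the linked `wikicorpus` repository: the function names, the `min_words` threshold, the markup-based filter rule and the inline sample dump are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the sketch works for any export schema version."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Stream (title, wikitext) pairs from a MediaWiki XML dump."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the memory held by the finished page


def paragraphs(text, min_words=5):
    """Yield prose paragraphs; headings, templates, tables and lists are skipped."""
    for block in text.split("\n\n"):
        block = block.strip()
        if not block or block[0] in "{|=*#[<":  # hypothetical markup filter
            continue
        if len(block.split()) >= min_words:     # hypothetical length filter
            yield block


def build_corpus(stream):
    """Yield (title, paragraph) pairs that pass the filters."""
    for title, text in iter_pages(stream):
        for paragraph in paragraphs(text):
            yield title, paragraph


# Tiny inline stand-in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Košice</title><revision><text>Košice sú mesto na východe Slovenska.

== História ==

{{Infobox}}

Krátky text.</text></revision></page>
</mediawiki>"""

corpus = list(build_corpus(io.BytesIO(SAMPLE.encode("utf-8"))))
print(corpus)
```

Streaming with `iterparse` avoids loading the whole dump into memory, which matters for full Wikipedia exports. Manual article selection would then operate on the filtered output.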

### Annotation schema preparation

- Prodigy deployment
- Conversion of data for Prodigy

Output: A deployed, ready-to-use annotation application
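
For the data conversion step, Prodigy reads newline-delimited JSON in which each task carries a `"text"` field and, optionally, a `"meta"` dictionary. The sketch below shows one way to produce such a file from (title, paragraph) pairs. It is an assumption-based illustration, not the converter in the linked `ner` repository; the function name and the choice to store the article title under `meta` are hypothetical.

```python
import json


def to_prodigy_jsonl(paragraphs):
    """Serialize (title, paragraph) pairs as JSONL, one annotation task per line."""
    lines = []
    for title, text in paragraphs:
        task = {"text": text, "meta": {"title": title}}  # 'title' under meta is an assumption
        # ensure_ascii=False keeps Slovak diacritics readable in the output file.
        lines.append(json.dumps(task, ensure_ascii=False) + "\n")
    return "".join(lines)


jsonl = to_prodigy_jsonl([("Košice", "Košice sú mesto na východe Slovenska.")])
print(jsonl, end="")
```

The resulting text can be written to a `.jsonl` file and passed as the source to a Prodigy recipe such as `ner.manual`.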

Done:

- Prodigy deployment http://skner.tukekemt.xyz
- Conversion of data for Prodigy https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/ner

To do:

- Annotation manual
- Tag set for annotation
- Supporting model?
- Supporting model? If it helps, also prepare a schema or dataset with the supporting model.

### Preparatory annotation batch

Done:

- Deployment of the application for analyzing annotated data http://aksner.tukekemt.xyz

https://git.kemt.fei.tuke.sk/dano/annotation/src/branch/master/database_app

In progress:

- Application for analyzing annotated data: who annotated what, how, and how much
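
The "who annotated what, how, and how much" question can be illustrated with a small aggregation over exported annotation records. This is a sketch only and says nothing about how the linked `database_app` works. It assumes records shaped like Prodigy exports, with an `answer` field, a `spans` list, and an annotator identifier in `_annotator_id` or `_session_id`; the field names should be checked against the actual export, and the labels `PER` and `LOC` are hypothetical.

```python
from collections import Counter


def annotation_stats(records):
    """Count answers and accepted entity labels per annotator."""
    stats = {}
    for rec in records:
        # Assumed identifier fields; fall back to "unknown" when neither is present.
        who = rec.get("_annotator_id") or rec.get("_session_id") or "unknown"
        entry = stats.setdefault(who, {"answers": Counter(), "labels": Counter()})
        answer = rec.get("answer", "unknown")
        entry["answers"][answer] += 1
        if answer == "accept":
            for span in rec.get("spans", []):
                entry["labels"][span["label"]] += 1
    return stats


# Made-up records for illustration.
records = [
    {"_annotator_id": "anna", "answer": "accept",
     "spans": [{"start": 0, "end": 6, "label": "LOC"}]},
    {"_annotator_id": "anna", "answer": "reject", "spans": []},
    {"_annotator_id": "boris", "answer": "accept",
     "spans": [{"start": 0, "end": 4, "label": "PER"},
               {"start": 10, "end": 16, "label": "LOC"}]},
]

stats = annotation_stats(records)
for who, entry in stats.items():
    print(who, dict(entry["answers"]), dict(entry["labels"]))
```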

To do:

- Data annotation
- Preparing a script for cleaning annotated data
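
A cleaning script might start from something like the sketch below. The cleaning rules are assumptions chosen for illustration, since the source does not define them: keep only accepted answers, drop spans whose offsets fall outside the text, and keep one record per (text, annotator) pair. The record layout follows the same assumed Prodigy-style export as elsewhere on this page.

```python
def clean_annotations(records):
    """Return accepted, de-duplicated records with out-of-range spans removed."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("answer") != "accept":      # assumed rule: rejected/ignored tasks are dropped
            continue
        text = rec.get("text", "")
        key = (text, rec.get("_annotator_id"))
        if key in seen:                        # assumed rule: first record per annotator wins
            continue
        seen.add(key)
        spans = [
            span for span in rec.get("spans", [])
            if 0 <= span["start"] < span["end"] <= len(text)
        ]
        cleaned.append({**rec, "spans": spans})
    return cleaned


# Made-up records for illustration.
raw = [
    {"text": "Košice", "_annotator_id": "anna", "answer": "accept",
     "spans": [{"start": 0, "end": 6, "label": "LOC"},
               {"start": 4, "end": 99, "label": "LOC"}]},
    {"text": "Košice", "_annotator_id": "anna", "answer": "accept", "spans": []},
    {"text": "Košice", "_annotator_id": "boris", "answer": "ignore", "spans": []},
]

cleaned = clean_annotations(raw)
print(cleaned)
```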

### Production annotation batch

To do:

- Motivating students
- Data annotation
- Analysis of annotated data
- Building a corpus of annotated data

### Analysis of completed annotations

Application for analyzing annotations

## Plan

  - Convert speech transcripts into a training set
  - Train and evaluate classifier
  - Establish manual annotation
  - Select unannotated data

### Data preparation

Input: Transcriber transcripts with inconsistent annotations

```
 * Lowercase first letter: regular word
 * Uppercase first letter: named entity
 * ''^^'': foreign word
 * ''@'': noise
 * ''_'': multi-word expression
 * ''/'': pronunciation
```

Output: A file that can be read by `spacy convert`

## People

- Cesar Abascal Gutierrez <cesarbielva1994@gmail.com>
- Kyryl Kobzar
- Ediz Morochovič

## Tools

```
 * Machine learning: https://spacy.io/usage/training
 * Manual annotation: https://prodi.gy/
```