---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---

# Hate Speech Scientific Project

Goal:

- Recognize parts of a text that contain hate speech or vulgarisms.

Possible applications:

- Management of discussion forums, including detection of spam or abuse.
- Post-processing for biased generative language models, to prevent them from generating inappropriate responses.

Plan:

- Perform a review of the state of the art.
- Pick established (English) corpora.
- Formalize the problem: sentiment classification, topic recognition, or keyword selection.
- Propose a preliminary system by reproducing an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual or cross-lingual approach, possibly with machine translation.
- Annotate a larger Slovak corpus.
- Identify and publish the scientific contribution.

Future tasks:

- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
- Translate an existing English dataset into Slovak using the OPUS English-Slovak Marian NMT model, then train a monolingual Slovak model.
- Train, fine-tune, or prompt a large language model.

In-progress tasks:

- Annotate a Twitter dataset. Possible guidelines are: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
- Annotate a Facebook dataset. Use different guidelines, e.g. sentence-level annotation for context-sensitive hate speech.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.

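Once an annotated dataset and a trained model are available, evaluation could start from something like the following. This is only a minimal sketch: it assumes a binary labelling scheme (1 = hate speech, 0 = other), which the project has not fixed yet, and the function name `binary_prf` and the toy labels are hypothetical, not project data.

```python
from typing import Sequence


def binary_prf(gold: Sequence[int], predicted: Sequence[int]) -> dict:
    """Precision, recall and F1 for the positive class (label 1)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, for illustration only.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
scores = binary_prf(gold, predicted)
print(scores)
```

Reporting the scores for the hate class separately, rather than overall accuracy, tends to be more informative when hateful posts are a small minority of the data.
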
Finished tasks:

- Perform preliminary experiments with hate speech detection (Bulburu).
- Prepare an annotation infrastructure for Facebook data annotation (Ferko).
- Gather Facebook data and prepare it for annotation (Ferko).

People:

- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Vladimír Ferko](/students/2021/vladimir_ferko)
- [Tetiana Mohorian](/students/2022/tetiana_mohorian)
- [Patrik Pokrivčák](/students/2019/patrik_pokrivcak)

Former participants:

- [Sevval Bulburu](/interns/sevval_bulburu)
- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)

Links:

- https://europeanonlinehatelab.com/
- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/