forked from KEMT/zpwiki
---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---

# Hate Speech Scientific Project

Goal:

- Recognize parts of a text that contain hate speech or vulgarisms.

Possible applications:

- Management of discussion forums / detection of spam or abuse.
- "Postprocessing" for biased generative language models - preventing them from generating inappropriate responses.

Plan:

- Perform a review of the state of the art.
- Pick established (English) corpora.
- Formalize the problem: classification of sentiment, recognition of topic, keyword selection.
- Propose a preliminary system, repeating an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual/cross-lingual approach. Possibility of machine translation.
- Annotate a bigger Slovak corpus.
- Identify and publish the scientific contribution.

Tasks:

- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
- Translate an existing English dataset into Slovak. Use the OPUS English-Slovak Marian NMT model. Train a Slovak monolingual model.

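The evaluation task above could be set up roughly as follows. This is only a minimal sketch: `evaluate_binary`, `keyword_classifier`, and the toy examples are hypothetical names and data made up for illustration, not part of the project, and the sketch assumes the problem is framed as binary classification (hate / not hate), which the plan has not yet fixed.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_binary(
    classify: Callable[[str], bool],
    examples: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Compute precision, recall and F1 for the positive ("hate") class.

    `classify` is a hypothetical callable returning True when a text is
    predicted as hate speech; `examples` are (text, gold_label) pairs.
    """
    tp = fp = fn = 0
    for text, gold in examples:
        pred = classify(text)
        if pred and gold:
            tp += 1      # correctly flagged
        elif pred and not gold:
            fp += 1      # flagged, but the gold label is "not hate"
        elif not pred and gold:
            fn += 1      # missed hate example
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def keyword_classifier(text: str) -> bool:
    """Toy stand-in for a real model: flags a placeholder keyword."""
    return "badword" in text.lower()


# Invented toy evaluation set (assumption, not project data).
toy_set = [
    ("this contains BADWORD", True),
    ("a friendly comment", False),
    ("subtle abuse without the keyword", True),
    ("badword used in a neutral quote", False),
]

scores = evaluate_binary(keyword_classifier, toy_set)
print(scores)
```

In practice `classify` could wrap a call to the multilingual model listed above. The label names that model returns are not assumed here, so the mapping from its output to True/False would have to be checked against the model card.
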
Future tasks:

- Annotate a Twitter dataset. Possible guidelines are: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
- Annotate a Facebook dataset. Use some other guidelines, e.g. sentence-level annotation, for context-sensitive hate.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.

People:

- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Vladimír Ferko](/students/2021/vladimir_ferko)
- Sevval Bulburu

Former participants:

- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)

Links:

- https://europeanonlinehatelab.com/
- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/