| title | category | tag |
|---|---|---|
| Hate Speech | | |

Hate Speech Scientific Project
Goal:
- To recognize parts of a text that contain hate speech or vulgarisms.
 
Possible applications:
- Management of discussion forums / detection of spam or abuse.
 - "Postprocessing" for biased generative language models, to prevent them from generating inappropriate responses.
 
Plan:
- Perform a review of the state-of-the-art
 - Pick established English corpora
 - Formalize the problem: sentiment classification, topic recognition, keyword selection.
 - Propose a preliminary system by reproducing an existing approach.
 - Create a small evaluation set in Slovak
 - Try a multilingual/cross-lingual approach, possibly with machine translation.
 - Annotate a larger Slovak corpus
 - Identify and publish the scientific contribution
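
If the problem is formalized as binary classification (an assumption; the plan also lists topic recognition and keyword selection as options), a small evaluation set can be scored with precision, recall and F1 for the hateful class. The sketch below is only illustrative: the label scheme (1 = hateful, 0 = not hateful), the function names and the placeholder texts are hypothetical, and `classify` stands in for any model.

```python
from typing import Callable, Dict, List, Tuple

# Assumed label scheme: 1 = hateful, 0 = not hateful.
Example = Tuple[str, int]


def binary_metrics(gold: List[int], predicted: List[int]) -> Dict[str, float]:
    """Precision, recall and F1 for the positive (hateful) class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(classify: Callable[[str], int],
             examples: List[Example]) -> Dict[str, float]:
    """Run a classifier over labelled examples and score its predictions."""
    gold = [label for _, label in examples]
    predicted = [classify(text) for text, _ in examples]
    return binary_metrics(gold, predicted)


if __name__ == "__main__":
    # Placeholder texts; a real evaluation set would contain Slovak sentences.
    toy_set = [("text with MARKER", 1), ("neutral text", 0)]
    print(evaluate(lambda text: int("MARKER" in text), toy_set))
```

Reporting the positive class separately is a deliberate choice here: hateful examples are usually a minority, so plain accuracy can look good even when most of them are missed.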
 
Tasks:
- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
 - Translate an existing English dataset into Slovak using the OPUS English-Slovak Marian NMT model, then train a Slovak monolingual model.
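
The translation task could be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the checkpoint name `Helsinki-NLP/opus-mt-en-sk`, the `(text, label)` data layout and all function names are assumptions. The translator is passed in as a plain function so the labels can be carried over unchanged and the batching logic can be checked without downloading a model.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed data layout: (text, label) pairs.
Example = Tuple[str, int]
Translator = Callable[[List[str]], List[str]]


def batched(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_dataset(examples: List[Example], translate: Translator,
                      batch_size: int = 16) -> List[Example]:
    """Translate the text of each example and keep its label unchanged."""
    texts = [text for text, _ in examples]
    labels = [label for _, label in examples]
    translated: List[str] = []
    for batch in batched(texts, batch_size):
        output = translate(batch)
        if len(output) != len(batch):
            raise ValueError("translator must return one output per input")
        translated.extend(output)
    return list(zip(translated, labels))


def load_marian_translator(
        model_name: str = "Helsinki-NLP/opus-mt-en-sk") -> Translator:
    """Build a translate function from a Marian checkpoint.

    The default checkpoint name is an assumption. The import is done
    lazily so the helpers above work without `transformers` installed.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(batch: List[str]) -> List[str]:
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        generated = model.generate(**inputs)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate
```

With `transformers` and a PyTorch backend installed, `translate_dataset(english_examples, load_marian_translator())` would return Slovak texts paired with the original labels. Labels transferred this way are only as reliable as the translation, so a manually checked Slovak sample is still worth keeping for evaluation.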
 
Future tasks:
- Annotate a Twitter dataset. Possible guidelines: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
 - Annotate a Facebook dataset using other guidelines, e.g. sentence-level annotation for context-sensitive hate.
 - Prepare the existing Slovak Twitter dataset, then train and evaluate a model.
 
People:
- Ján Staš
 - Daniel Hládek
 - Zuzana Sokolová
 - Vladimír Ferko
 - Sevval Bulburu
 
Former participants:
Links: