zpwiki/pages/topics/hatespeech/README.md

---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---

# Hate Speech Scientific Project

Goal:

- Recognize parts of a text that contain hate speech or vulgarisms.

Possible applications:

- Management of discussion forums / detection of spam or abuse.
- "Postprocessing" for biased generative language models: preventing them from generating inappropriate responses.

Plan:

- Review the state of the art.
- Pick established (English) corpora.
- Formalize the problem: sentiment classification, topic recognition, or keyword selection.
- Propose a preliminary system by reproducing an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual/cross-lingual approach, possibly with machine translation.
- Annotate a larger Slovak corpus.
- Identify and publish the scientific contribution.
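
As an illustration of the "keyword selection" formulation and of how a small evaluation set could be scored, here is a minimal sketch of a lexicon baseline with precision, recall and F1. The lexicon, the example sentences and the function names are all hypothetical placeholders, not project data; a real system would use a curated Slovak lexicon or a trained classifier.

```python
# Hypothetical placeholder lexicon; a real one would be curated for Slovak.
LEXICON = {"idiot", "hlupák"}


def predict(text, lexicon=LEXICON):
    """Return 1 if any token of the text is in the lexicon, else 0."""
    tokens = (tok.strip(".,!?;:\"'()").lower() for tok in text.split())
    return int(any(tok in lexicon for tok in tokens))


def precision_recall_f1(gold, pred):
    """Compute precision, recall and F1 for the positive (offensive) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy evaluation set, invented for illustration only (1 = offensive).
EXAMPLES = [
    ("You are an idiot!", 1),
    ("Have a nice day.", 0),
    ("Ty si hlupák.", 1),
    ("I disagree with this opinion.", 0),
    ("Nobody wants you here.", 1),  # offensive without any lexicon word
]

gold = [label for _, label in EXAMPLES]
pred = [predict(text) for text, _ in EXAMPLES]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The last example shows the typical weakness of a keyword baseline: context-dependent hate without an explicit vulgarism is missed, which lowers recall.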

Future Tasks:

- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
- Translate an existing English dataset into Slovak using the OPUS English-Slovak Marian NMT model, then train a monolingual Slovak model.
- Train, fine-tune, or prompt a large language model.

In progress tasks:

- Annotate a Twitter dataset. Possible guidelines: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
- Annotate a Facebook dataset. Use different guidelines, e.g. sentence-level annotation for context-sensitive hate.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.
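
When two people annotate the same items, their agreement can be checked before the labels are used for training. The following is a minimal sketch using Cohen's kappa; the label names and the two annotation lists are invented for illustration and do not come from the project's datasets.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of eight posts by two annotators.
annotator_a = ["hate", "none", "none", "hate", "none", "none", "hate", "none"]
annotator_b = ["hate", "none", "hate", "hate", "none", "none", "none", "none"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

A low kappa would suggest that the annotation guidelines need to be made more precise before annotating a larger corpus.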

Finished tasks:

- Perform preliminary experiments with HS detection (Bulburu)
- Prepare an annotation infrastructure for Facebook data annotation (Ferko)
- Gather Facebook data and prepare for annotation. (Ferko)

People:


- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Vladimír Ferko](/students/2021/vladimir_ferko)
- [Tetiana Mohorian](/students/2022/tetiana_mohorian)
- [Patrik Pokrivčák](/students/2019/patrik_pokrivcak)


Former participants:

- [Sevval Bulburu](/interns/sevval_bulburu)
- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)


Links:


- https://europeanonlinehatelab.com/
- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/