---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---

# Hate Speech Scientific Project

Goal:

- To recognize parts of a text that contain hate speech or vulgarisms.

Possible applications:

- Management of discussion forums, including detection of spam or abuse.
- Post-processing for biased generative language models, to prevent them from generating inappropriate responses.

Plan:

- Perform a review of the state of the art.
- Pick established English corpora.
- Formalize the problem: sentiment classification, topic recognition, or keyword selection.
- Propose a preliminary system by reproducing an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual or cross-lingual approach, possibly using machine translation.
- Annotate a bigger Slovak corpus.
- Identify and publish the scientific contribution.
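
One of the formulations named in the plan, keyword selection, can be illustrated with a very small baseline that marks the parts of a text matching a word list. This is only a sketch under stated assumptions: the lexicon below is a hypothetical placeholder, and the function name `find_flagged_spans` is invented for illustration; it is not the system used in the project.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical placeholder lexicon; a real one would come from a curated resource.
LEXICON = {"idiot", "stupid"}


def find_flagged_spans(text: str, lexicon: Iterable[str] = LEXICON) -> List[Tuple[int, int, str]]:
    """Return (start, end, lowercased token) for every word of `text` found in `lexicon`."""
    words = {w.lower() for w in lexicon}
    spans = []
    for match in re.finditer(r"\w+", text):
        token = match.group(0).lower()
        if token in words:
            spans.append((match.start(), match.end(), token))
    return spans


def is_flagged(text: str, lexicon: Iterable[str] = LEXICON) -> bool:
    """Binary decision derived from the span detector: flagged if any span matches."""
    return bool(find_flagged_spans(text, lexicon))
```

Such a baseline ignores context entirely, which is why the plan also lists classification-based formulations; it is mainly useful as a reference point for later models.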

Future Tasks:

- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
- Translate an existing English dataset into Slovak using the OPUS English-Slovak Marian NMT model, then train a Slovak monolingual model.
- Train, fine-tune, or prompt a large language model.
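
Evaluating an existing model, as in the first task above, comes down to comparing its predictions with gold labels. The following is a minimal sketch assuming a binary setup (hate / not hate); the classifier is passed in as a plain callable, so how a particular model is loaded and how its output labels map to a boolean is deliberately left out and would have to be checked against that model's documentation. The name `evaluate_binary` is hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_binary(
    classify: Callable[[str], bool],
    examples: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the positive (hate) class."""
    tp = fp = fn = tn = 0
    for text, gold in examples:
        pred = classify(text)
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif not pred and gold:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

The same function can be reused for a translated or monolingual Slovak model, since only the `classify` callable changes.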

In progress tasks:

- Annotate a Twitter dataset. Possible guidelines: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
- Annotate a Facebook dataset using different guidelines, e.g. sentence-level annotation for context-sensitive hate.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.
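
When more than one person annotates the same items, a common way to check how consistently the guidelines are applied is Cohen's kappa. The page does not say whether or how agreement is measured in this project, so the following is only an illustrative sketch for two annotators; the function name and the label values are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa(["hate", "ok", "ok", "hate"], ["hate", "ok", "hate", "hate"])` compares two annotators on four comments; values near 1 indicate strong agreement, values near 0 agreement no better than chance.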

Finished tasks:

- Perform preliminary experiments with hate speech (HS) detection (Bulburu).
- Prepare an annotation infrastructure for Facebook data annotation (Ferko).
- Gather Facebook data and prepare it for annotation (Ferko).

People:

- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Vladimír Ferko](/students/2021/vladimir_ferko)
- [Tetiana Mohorian](/students/2022/tetiana_mohorian)
- [Patrik Pokrivčák](/students/2019/patrik_pokrivcak)

Former participants:

- [Sevval Bulburu](/interns/sevval_bulburu)
- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)

Links:

- https://europeanonlinehatelab.com/
- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/