---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---
# Hate Speech Scientific Project

Goal:

- Recognize parts of a text that contain hate speech or vulgar language.

Possible applications:

- Moderation of discussion forums and detection of spam or abuse.
- Post-processing for biased generative language models, to prevent them from generating inappropriate responses.

Plan:

- Perform a review of the state of the art.
- Pick established English corpora.
- Formalize the problem, e.g. as sentiment classification, topic recognition, or keyword selection.
- Propose a preliminary system by reproducing an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual or cross-lingual approach, possibly with machine translation.
- Annotate a larger Slovak corpus.
- Identify the scientific contribution and publish it.
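If the problem is formalized as text classification, the preliminary system from the plan could start from a simple bag-of-words baseline. The sketch below is a minimal multinomial Naive Bayes classifier in pure Python. It is only an illustration under stated assumptions, not the project's chosen method: the class name `NaiveBayesTextClassifier`, the labels `offensive` and `neutral`, and the four training sentences are made-up placeholders and do not come from any corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        # Count documents per label and token occurrences per label.
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocabulary = {
            token for counts in self.token_counts.values() for token in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                if token in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[token] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy placeholder data (an assumption for illustration, not a real corpus).
TRAIN = [
    ("you are a stupid idiot", "offensive"),
    ("shut up you idiot", "offensive"),
    ("thank you for the helpful answer", "neutral"),
    ("see you at the meeting tomorrow", "neutral"),
]

if __name__ == "__main__":
    texts = [text for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    classifier = NaiveBayesTextClassifier().fit(texts, labels)
    print(classifier.predict("what an idiot"))
```

A real experiment would replace the toy sentences with one of the established corpora and measure performance on a held-out evaluation set, such as the planned Slovak one.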

People:

- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)

Links:

- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/