---
title: Hate Speech
category: [project]
tag: [hatespeech,nlp,nlm]
---

# Hate Speech Scientific Project

Goal:

- To recognize the parts of a text that contain hate speech or vulgarisms.

Possible applications:

- Management of discussion forums / detection of spam or abuse.
- "Postprocessing" for biased generative language models, preventing them from generating inappropriate responses.

Plan:

- Perform a review of the state of the art.
- Pick established (English) corpora.
- Formalize the problem: sentiment classification, topic recognition, keyword selection.
- Propose a preliminary system by reproducing an existing approach.
- Create a small evaluation set in Slovak.
- Try a multilingual/crosslingual approach, possibly using machine translation.
- Annotate a bigger Slovak corpus.
- Identify and publish the scientific contribution.

Tasks:

- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi
- Translate an existing English dataset into Slovak using the OPUS English-Slovak Marian NMT model, then train a Slovak monolingual model.

Future tasks:

- Annotate a Twitter dataset. Possible guidelines: https://developers.perspectiveapi.com/s/about-the-api-training-data?language=en_US
- Annotate a Facebook dataset using different guidelines, e.g. sentence-level annotation for context-sensitive hate.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.

People:

- Ján Staš
- Daniel Hládek
- Zuzana Sokolová
- [Vladimír Ferko](/students/2021/vladimir_ferko)
- [Manohar Gowdru Shridharu](/students/2021/manohar_gowdru_shridharu)

Links:

- https://europeanonlinehatelab.com/
- https://hatespeechdata.com/
- https://oznacuj-dezinfo.kinit.sk/
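The evaluation tasks in this plan (scoring the existing multilingual model, and later a Slovak monolingual model, on the small Slovak evaluation set) need a shared metric so the results are comparable. The following is only a minimal sketch, assuming binary labels (1 = hate/offensive, 0 = acceptable) and predictions already produced by the model under test; the function names and toy labels are hypothetical and not part of any existing project code. It reports precision, recall and F1 per class, plus macro-F1, which weights both classes equally and so is less sensitive to the class imbalance typical of hate-speech data.

```python
from typing import Dict, Sequence


def class_scores(gold: Sequence[int], predicted: Sequence[int], positive: int) -> Dict[str, float]:
    """Precision, recall and F1 treating `positive` as the target class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    # Define 0/0 as 0.0 so empty classes do not raise ZeroDivisionError.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_f1(gold: Sequence[int], predicted: Sequence[int], labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores (hypothetical label scheme: 0/1)."""
    return sum(class_scores(gold, predicted, lbl)["f1"] for lbl in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only, not real annotations.
    gold = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(class_scores(gold, predicted, positive=1))
    print(macro_f1(gold, predicted))
```

The same two functions can score any system in the plan (the multilingual model, a model trained on machine-translated data, or one trained on the annotated Slovak corpus), as long as its outputs are mapped to the same label scheme first.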