---
title: Manohar Gowdru Shridhara
published: true
taxonomy:
    category: [phd2024]
    tag: [lm,nlp,hatespeech]
    author: Daniel Hladek
---
# Manohar Gowdru Shridhara

Beginning of the study: 2021

repository: https://git.kemt.fei.tuke.sk/mg240ia

## Dissertation Thesis in 2023/24

Hate Speech Detection

Goals:

- Write a dissertation thesis.
- Publish 2 A-class journal papers.

## Minimal Thesis (preliminary dissertation and exam in 2022/23)

Goals:

- Provide state-of-the-art overview.
- Formulate dissertation theses (describe the scientific contribution of the thesis).
- Prepare to reach the scientific contribution.
- Publish 4 conference papers.

## First year of PhD study

Goals:

- Provide state-of-the-art overview.
- Read and make notes from at least 100 scientific papers or books.
- Publish at least 2 conference papers.
- Prepare for minimal thesis.

Resources:

- [Hate Speech Project Page](/topics/hatespeech)
- https://hatespeechdata.com/
- [Hate speech detection: Challenges and solutions](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701757/)
- [HateBase](https://hatebase.org/)
- [Resources and benchmark corpora for hate speech detection: a systematic review](https://link.springer.com/article/10.1007/s10579-020-09502-8)

14.7:

Status:

- Try to prepare an experiment with the selected dataset. https://git.kemt.fei.tuke.sk/mg240ia/Hate_Speech_IMAYFLY_and_HORSEHERD
- Pick a feasible dataset and method to start with: Kannada dataset, tagging sentiment for movie reviews.
- Finish experiments, upload source code into git, and describe the experiments. Done.
- Summarize the results in a table and publish the table on git. Done.

Open tasks:

- Focus on making a baseline experiment for sentiment classification using standard methods, such as Transformers.
- For preparing a web application with a demo, learn about streamlit.
  In progress: https://git.kemt.fei.tuke.sk/mg240ia/Hate-Speech-Detector-Streamlit

Read papers:

- https://aclanthology.org/2020.peoples-1.6.pdf
- https://aclanthology.org/2022.ltedi-1.14/
- https://arxiv.org/abs/2108.03867
- https://arxiv.org/pdf/2112.15417v4.pdf
- https://arxiv.org/ftp/arxiv/papers/2202/2202.04725.pdf
- https://github.com/manikandan-ravikiran/DOSA/blob/main/EACL_Final_Paper.pdf
- https://aclanthology.org/2020.icon-main.13.pdf
- http://ceur-ws.org/Vol-3159/T6-4.pdf
- https://www.researchgate.net/publication/353819476_Hope_Speech_detection_in_under-resourced_Kannada_language
- https://www.researchgate.net/publication/346964457_Creation_of_Corpus_and_analysis_in_Code-Mixed_Kannada-English_Twitter_data_for_Emotion_Prediction
- https://www.semanticscholar.org/paper/Detecting-stance-in-kannada-social-media-code-mixed-SrinidhiSkanda-Kumar/f651d67211809f2036ac81c27e55d02bd061ed64
- https://www.academia.edu/81920734/Findings_of_the_Sentiment_Analysis_of_Dravidian_Languages_in_Code_Mixed_Text
- https://competitions.codalab.org/competitions/30642#learn_the_details
- https://paperswithcode.com/paper/creation-of-corpus-and-analysis-in-code-mixed
- https://paperswithcode.com/paper/hope-speech-detection-in-under-resourced#code

## Meeting 13.6.

- Implemented the Mayfly and Horse Herd algorithms in Python and Matlab for HS datasets.
- Wrote a draft of a paper.
- Performed experiments on HS with Word2Vec, FastText, OneHot.

Tasks:

- Implement open tasks from the previous meetings !!!!!!!!
- Share scripts through Git and drafts through online Word or Docs !!!
- Try https://huggingface.co/cardiffnlp/twitter-roberta-base-hate and try to repeat the training and evaluation.

## Meeting 24.5.

- Shared a Colab notebook with an ongoing implementation of the mayfly algorithm for preprocessing in sentiment recognition on a Twitter dataset.

Tasks:

- Implement open tasks from the previous meetings !!!
- [ ] Focus on making a baseline experiment for sentiment classification using standard methods, such as Transformers.
- [x] Consider using pre-trained embeddings: FastText, word2vec, sentence-transformers, LaBSE, LASER.

Supplemental tasks:

- [x] Finish the mayfly implementation.

## Meeting 20.5.

- Learned about the Firefly / Mayfly optimization algorithm.
- Read ten papers.
- Wrote a 1-page abstract about a possible system, based on DBN.

## Meeting 25.4.

- Learned about the deep learning lifecycle / evaluation, BERT, RoBERTa, GPT.
- Tried HF transformers, spaCy, NLTK, word embeddings, sentence transformers.
- Set up a repo with notes: https://git.kemt.fei.tuke.sk/mg240ia

Tasks:

- [ ] Publish experiments into the repository.
- [ ] Prepare a paper for publication in faculty proceedings http://eei.fei.tuke.sk/#!/
- [ ] Send me a draft in advance.

Supplemental tasks:

- [x] For presentation of the results, learn about https://wandb.ai/. It can display results (learning curve, etc.).
- [ ] For preparing a web application with a demo, learn about streamlit.

## Meeting 12.4.

- Created repositories, empty so far.
- Tried to replicate the results from the papers "Emotion and sentiment analysis of tweets using BERT" and "Fine-Tuning BERT Based Approach for Multi-Class Sentiment Analysis on Twitter Emotion Data".
- The experiments are based on BERT (which kind?), Tweet Emotion Intensity.
- Prepared a Colab notebook with experiments.

Tasks:

- [ ] Finish experiments, upload source code into git, provide a description of the experiments.
- [ ] Try to improve the results: try different kinds of BERT (RoBERTa, ELECTRA, XLNet). Can "generative models" be used (GPT, BART, T5)? Can "sentence transformers" be used (LaBSE, LASER)?
- [x] Learn about "Sentence Transformers".
- [ ] Summarize the results in a table, publish the table on git.
- [-] Use Markdown for formatting. There is "Typora".
- [-] Continue to improve the SCYR paper.
- If you have some conference in mind, tell me.
## Meeting 25.3.22

- Learned about Transformers, BERT, LSTM and RNN.
- Tried the HuggingFace transformers library.
- Started Google Colab: executing sentiment analysis, HF transformers pipeline functions.
- Prepared datasets: twitter-roberta datasets. Experiments are running, no results yet.
- Prepared a short note about NLP and neural networks.
- Still working on the SCYR paper.

Tasks:

- [-] Finish experiments about sentiment and present results.
- [-] Create a repository on git.kemt.fei.tuke.sk and upload your experiments, results and notes. Use your student credentials.
- [-] Continue working on the "SCYR" review paper, consider publishing it elsewhere (the first version got rejected).
- [-] Prepare an outline for another paper with sentiment classification.

## Meeting 10.3.22

- Improvement of the report.
- Installed Transformers and Anaconda.

Tasks:

- Try [this model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) with your own text.
- Learn how the Transformer neural network works. Learn how RoBERTa model training works. Learn how BERT model finetuning works. Write a short memo about your findings and papers read on this topic.
- Pick a dataset:
    - https://huggingface.co/datasets/sentiment140 (English)
    - https://www.clarin.si/repository/xmlui/handle/11356/1054 (multilingual)
    - https://huggingface.co/datasets/tamilmixsentiment (English-Tamil code switch)
- Grab a baseline BERT-type model and try to finetune it for sentiment classification.
- For finetuning and evaluation you can use this script: https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification
- For finetuning you will need to install CUDA and PyTorch. It may or may not work on CPU.
- If you need a GPU, use the school server idoc.fei.tuke.sk or Google Colab.
- Continue working on the paper.
- Remind me about the SCYR conference payment.
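
For the first task above (trying the pre-trained model with your own text), a minimal Python sketch could look like the one below. It assumes that the `transformers` and `torch` packages are installed and that the model returns raw label ids (`LABEL_0`, `LABEL_1`, `LABEL_2`), which the model card maps to negative, neutral and positive. The helper names `readable` and `classify` are made up for this illustration.

```python
# Mapping of raw label ids to names, as described on the model card of
# cardiffnlp/twitter-roberta-base-sentiment (assumption: the pipeline
# returns these raw ids rather than readable names).
LABEL_NAMES = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}


def readable(predictions):
    """Replace raw label ids in pipeline output with readable names."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


def classify(texts, model_name="cardiffnlp/twitter-roberta-base-sentiment"):
    """Classify a list of strings with the pre-trained model.

    The model weights are downloaded on first use.
    """
    # Imported here so that the helpers above work without the library.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    return readable(classifier(texts))
```

Calling `classify(["I love this movie", "This is terrible"])` should return one label with a score for each input text. The same function can be pointed at another checkpoint through `model_name`, although the label mapping would then have to be checked against that model's card.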
## Meeting 21.2.22

- Wrote a report about HS detection (in progress).

Tasks:

- Repair the report (rewrite copied parts, put the paragraphs in a logical order, define HS detection theoretically and formally, analyze the datasets in detail: how do they work, what metrics do they use).
- Install Hugging Face Transformers and go through a tutorial.

## Meeting 31.1.22

- Read some blogs about transformers.
- Installed and tried transformers.
- Worked on the review paper.
- Picked the Twitter dataset on Kaggle.
- Still selecting a method.

Open tasks:

- Continue to work on the paper and share the paper with us.
- Prepare some ideas for the common discussion about the project.
- [ ] Try to prepare an experiment with the selected dataset.
- [ ] You can use the school CUDA infrastructure (idoc.fei.tuke.sk).
- [ ] Set up a repository for experiments, use the school git server git.kemt.fei.tuke.sk.
- [x] Get ready to post a paper on the school PhD conference SCYR, deadline is in the middle of February http://scyr.kpi.fei.tuke.sk/.

### Meeting 10.1.22

- Set up a git account https://github.com/ManoGS with a script to prepare the "twitter" dataset and the "english" dataset for HS detection.
- Configured a laptop with Anaconda / PyCharm, PyTorch, CUDA; went through some basic Python tutorials.
- Read some blogs on how to use Kaggle (dataset database).
- Went through tutorials on HuggingFace transformers: understanding sentiment analysis.

Open tasks:

- [x] Continue to work on the review, with datasets and methods (specified below).
- [x] Read and make notes about transformers, neural language models and finetuning.
- [ ] Pick a feasible dataset and method to start with.
- [ ] You can use the school CUDA infrastructure (idoc.fei.tuke.sk).
- [ ] Set up a repository for experiments, use the school git server git.kemt.fei.tuke.sk.
- [ ] Get ready to post a paper on the school PhD conference SCYR, deadline is in the middle of February http://scyr.kpi.fei.tuke.sk/.
#### Meeting 16.12.21

- A report was provided (through Teams).
- Installed Anaconda and started a Transformers tutorial.
- Started the Dive into Python book.

Task:

- Report: Create a detailed list of available datasets for HS.
- Report: Create a detailed description of the state-of-the-art approaches for HS detection.
- Practical: Continue with open tasks below (pick a dataset, perform classification, evaluate the experiment).

#### Meeting 10.12.21

No report (just a draft) was provided so far.

1. Read the papers from below and make notes on what you have learned from them. For each note make a bibliographic citation. Write down the authors of the paper, the name of the paper, year, publisher and other important information. When you find out something, make a reference with a number to that paper. You can use bibliographic manager software: Mendeley, EndNote, JabRef.
2. From the papers find out answers to the questions below.
3. Pick a hate speech dataset.
4. Pick an approach and Python library for HS classification.
5. Create a [GIT](https://git.kemt.fei.tuke.sk) repository and share your experiment files. Do not commit data files, just links how to download the files.
6. Perform and evaluate experiments.

#### Meeting 10.11.21

#### First tasks

Prepare a report where you will explain:

- what is hate speech detection,
- where and why you can use hate-speech detection,
- what are state-of-the-art methods for hate speech detection,
- how can you evaluate a hate-speech detection system,
- what datasets for hate-speech detection are available.

The report should properly cite scientific bibliographical sources. Use bibliography manager software, such as Mendeley.

Create a [VPN connection](https://uvt.tuke.sk/wps/portal/uv/sluzby/vzdialeny-pristup-vpn) to the university network to have access to the scientific databases.

Use scientific indexes to discover literature:

- [Scopus](https://www.scopus.com/) (available from TUKE VPN)
- [Scholar](https://scholar.google.com)

Your review can start with:

- [Hate speech detection: Challenges and solutions](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701757/)
- [HateBase](https://hatebase.org/)
- [Resources and benchmark corpora for hate speech detection: a systematic review](https://link.springer.com/article/10.1007/s10579-020-09502-8)

Get to know the Python programming language:

- Read [Dive into Python](https://diveintopython3.net/)
- Install [Anaconda](https://www.anaconda.com/)
- Try [HuggingFace Transformers library](https://huggingface.co/transformers/quicktour.html)
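
The report question on how to evaluate a hate-speech detection system can also serve as a first Python exercise. The sketch below computes precision, recall and F1 for the positive (hate) class from gold and predicted labels, using only the standard library. The function name `precision_recall_f1` and the toy labels are illustrative assumptions, not results from any dataset.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Return (precision, recall, F1) for the `positive` class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels only (1 = hate speech, 0 = not hate speech); not real data.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because hate-speech datasets tend to be imbalanced, per-class precision, recall and F1 are usually more informative than accuracy alone.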