Manohar Gowdru Shridhara
Beginning of the study: 2021
repository: https://git.kemt.fei.tuke.sk/mg240ia
Dissertation Thesis
in 2023/24
Hate Speech Detection
Goals:
- Write a dissertation thesis
 - Publish 2 A-class journal papers
 
Minimal Thesis
(preliminary dissertation and exam in 2022/23)
Goals:
- Provide state-of-the-art overview.
 - Formulate dissertation theses (describe scientific contribution of the thesis).
 - Prepare to reach the scientific contribution.
 - Publish 4 conference papers.
 
First year of PhD study
Goals:
- Provide state-of-the-art overview.
 - Read and make notes from at least 100 scientific papers or books.
 - Publish at least 2 conference papers.
 - Prepare for minimal thesis.
 
Resources:
- Hate Speech Project Page
 - https://hatespeechdata.com/
 - Hate speech detection: Challenges and solutions
 - HateBase
 - [Resources and benchmark corpora for hate speech detection: a systematic review](https://link.springer.com/article/10.1007/s10579-020-09502-8)
 
Meeting 24.5.
- Shared a Colab notebook with an ongoing implementation of the mayfly algorithm for preprocessing in sentiment recognition on a Twitter dataset.
 
Tasks:
- Implement open tasks from the previous meetings !!!
 - Focus on making a baseline experiment for sentiment classification using classical methods, such as Transformers.
 - Consider using pre-trained embeddings: FastText, word2vec, sentence-transformers, LaBSE, LASER.
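
One possible starting point for the baseline experiment is a bag-of-words Naive Bayes classifier. This is only a sketch: Naive Bayes is not named in the notes and is swapped in here as an example of a simple baseline, the `NaiveBayes` class and `tokenize` helper are hypothetical names, and the four training sentences are invented toy data, not the Twitter dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen in training
                # Smoothed log likelihood of the token given the class.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "i love this phone",
    "what a great day",
    "i hate this weather",
    "this is awful",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("love this great day"))
print(model.predict("this weather is awful"))
```

A baseline like this gives a reference score that later Transformer-based or embedding-based models can be compared against on the same data split.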
 
Supplemental tasks:
- Finish the mayfly implementation.
 
Meeting 20.5.
- Learned about the Firefly / mayfly optimization algorithm.
 - Read ten papers.
 - Wrote a one-page abstract about a possible system based on DBN.
 
Meeting 25.4.
- Learned about the deep learning lifecycle / evaluation, BERT, RoBERTa, GPT.
 - Tried HF transformers, spaCy, NLTK, word embeddings, sentence transformers.
 - Set up a repo with notes: https://git.kemt.fei.tuke.sk/mg240ia
 
Tasks:
- Publish experiments into the repository.
 - Prepare a paper for publication in faculty proceedings http://eei.fei.tuke.sk/#!/
 - Send me a draft in advance.
 
Supplemental tasks:
- For presentation of the results, learn about https://wandb.ai/. It can display results (learning curves, etc.).
 - For preparing a web application with a demo, learn about Streamlit.
 
Meeting 12.4.
- Created repositories, empty so far.
 - Tried to replicate the results from "Emotion and sentiment analysis of tweets using BERT" paper and "Fine-Tuning BERT Based Approach for Multi-Class Sentiment Analysis on Twitter Emotion Data".
 - The experiments are based on BERT (which kind?) and the Tweet Emotion Intensity dataset.
 - Prepared colab notebook with experiments.
 
Tasks:
- Finish the experiments, upload the source code to git, and provide a description of the experiments.
 - Try to improve the results: try different kinds of BERT (RoBERTa, ELECTRA, XLNet). Can "generative models" be used (GPT, BART, T5)? Can "sentence transformers" be used (LaBSE, LASER)?
 - Learn about "Sentence Transformers".
 - Summarize the results in a table and publish the table on git.
 - [-] Use Markdown for formatting. There is "Typora".
 - [-] Continue to improve the SCYR paper.
 - If you have some conference in mind, tell me.
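
The results table mentioned in the tasks above could be generated directly as Markdown. This is a minimal sketch: `to_markdown_table` is a hypothetical helper, the column names are assumptions, and the scores are left as "TBD" because the notes do not report any results yet.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a Markdown table with the given column order."""
    header = "| " + " | ".join(columns) + " |"
    separator = "|" + "|".join("---" for _ in columns) + "|"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, separator, *body])


# Placeholder rows: model and dataset names come from the notes,
# scores are "TBD" because no results have been reported.
results = [
    {"Model": "BERT", "Dataset": "Tweet Emotion Intensity", "F1": "TBD"},
    {"Model": "RoBERTa", "Dataset": "Tweet Emotion Intensity", "F1": "TBD"},
]

table = to_markdown_table(results, ["Model", "Dataset", "F1"])
print(table)
```

The printed text can be pasted into a README.md in the repository, where the git server renders it as a table.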
 
Meeting 25.3.22
- Learned about Transformers, BERT, LSTM and RNN.
 - Tried HuggingFace transformers library
 - Started using Google Colab: ran sentiment analysis with the HF transformers pipeline functions.
 - Prepared datasets: twitter-roberta datasets. Experiments are running, no results yet.
 - Prepared a short note about NLP and neural networks.
 - Still working on the SCYR paper.
 
Tasks:
- [-] finish experiments about sentiment and present results.
 - [-] create a repository on git.kemt.fei.tuke.sk and upload your experiments, results and notes. Use your student credentials.
 - [-] continue working on the "SCYR" review paper, consider publishing it elsewhere (the first version got rejected).
 - [-] prepare an outline for another paper with sentiment classification.
 
Meeting 10.3.22
- Improvement of the report.
 - Installed Transformers and Anaconda
 
Tasks:
- Try this model with your own text.
 - Learn how the Transformer neural network works. Learn how RoBERTa model training works. Learn how BERT model fine-tuning works. Write a short memo about your findings and the papers you read on this topic.
 - Pick a dataset:
    - https://huggingface.co/datasets/sentiment140 (English)
    - https://www.clarin.si/repository/xmlui/handle/11356/1054 (multilingual)
    - https://huggingface.co/datasets/tamilmixsentiment (English-Tamil code-switching)
 
 - Grab a baseline BERT-type model and try to fine-tune it for sentiment classification.
 - For fine-tuning and evaluation you can use this script: https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification
 - For fine-tuning you will need to install CUDA and PyTorch. It may or may not work on a CPU.
 - If you need a GPU, use the school server idoc.fei.tuke.sk or Google Colab.
 - Continue working on the paper.
 - Remind me about the SCYR conference payment.
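
Whether fine-tuning will run on a GPU or fall back to the CPU can be checked before starting an experiment. This is a minimal sketch, assuming PyTorch is the framework in use; `pick_device` is a hypothetical helper name, and it reports "cpu" when PyTorch is not installed at all.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency, only needed for GPU detection
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Fine-tuning would run on: {device}")
```

If this prints `cpu` on the laptop, the school server or Google Colab mentioned above is probably the more practical option for fine-tuning.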
 
Meeting 21.2.22
- Wrote a report about HS detection (in progress).
 
Tasks:
- Repair the report: rewrite the copied parts, order the paragraphs logically, define HS detection theoretically and formally, and analyze the datasets in detail (how do they work, what metrics do they use).
 - Install Hugging Face Transformers and go through a tutorial.
 
Meeting 31.1.22
- Read some blogs about transformers
 - Installed and tried transformers
 - Worked on the review paper
 - Picked the Twitter dataset on Kaggle
 - Still selecting a method
 
Open tasks:
- Continue to work on the paper and share the paper with us.
 - Prepare some ideas for the common discussion about the project.
 - Try to prepare an experiment with the selected dataset.
 - You can use the school CUDA infrastructure (idoc.fei.tuke.sk).
 - Set up a repository for experiments, use the school git server git.kemt.fei.tuke.sk.
 - Get ready to submit a paper to the school PhD conference SCYR; the deadline is in the middle of February: http://scyr.kpi.fei.tuke.sk/.
 
Meeting 10.1.22
- Set up a git account https://github.com/ManoGS with a script to prepare the "twitter" dataset and the "english" dataset for HS detection.
 - Configured a laptop with Anaconda / PyCharm, PyTorch and CUDA; went through some basic Python tutorials.
 - Read some blogs about how to use Kaggle (dataset database).
 - Followed tutorials on HuggingFace Transformers: understanding sentiment analysis.
 
Open tasks:
- Continue to work on the review - with datasets and methods (specified below).
 - Read and make notes about transformers, neural language models and fine-tuning.
 - Pick a feasible dataset and method to start with.
 - You can use the school CUDA infrastructure (idoc.fei.tuke.sk).
 - Set up a repository for experiments, use the school git server git.kemt.fei.tuke.sk.
 - Get ready to submit a paper to the school PhD conference SCYR; the deadline is in the middle of February: http://scyr.kpi.fei.tuke.sk/.
 
Meeting 16.12.21
- A report was provided (through Teams).
 - Installed Anaconda and started a Transformers tutorial
 - Started the Dive into Python book
 
Task:
- Report: Create a detailed list of available datasets for HS.
 - Report: Create a detailed description of the state of the art approaches for HS detection.
 - Practical: Continue with the open tasks below (pick a dataset, perform classification, evaluate the experiment).
 
Meeting 10.12.21
No report (just draft) was provided so far.
- Read the papers listed below and make notes on what you have learned from them. For each note, make a bibliographic citation: write down the authors, the title of the paper, the year, the publisher and other important information. When you find out something, add a numbered reference to that paper. You can use bibliographic manager software such as Mendeley, EndNote or JabRef.
 - From the papers find out answers to the questions below.
 - Pick a hate speech dataset.
 - Pick an approach and a Python library for HS classification.
 - Create a Git repository and share your experiment files. Do not commit data files, just links showing how to download them.
 - Perform and evaluate experiments.
 
Meeting 10.11.21
First tasks
Prepare a report where you will explain:
- what is hate speech detection,
 - where and why you can use hate-speech detection,
 - what are state-of-the-art methods for hate speech detection,
 - how can you evaluate a hate-speech detection system,
 - what datasets for hate-speech detection are available.
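
On the evaluation question in the list above, precision, recall and F1 on the hate class are commonly reported for this kind of classifier. The following is a minimal sketch of how they are computed, not the evaluation protocol the report must adopt: `binary_prf` is a hypothetical helper and the labels are toy values invented for illustration.

```python
from typing import Sequence, Tuple


def binary_prf(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for the class given by `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels invented for illustration: 1 = hate speech, 0 = not hate speech.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = binary_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Hate speech datasets are often imbalanced, so per-class scores like these tend to be more informative than plain accuracy.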
 
The report should properly cite scientific bibliographical sources. Use a bibliography manager software, such as Mendeley.
Create a VPN connection to the university network to have access to the scientific databases. Use scientific indexes to discover literature:
Your review can start with:
- Hate speech detection: Challenges and solutions
 - HateBase
 - Resources and benchmark corpora for hate speech detection: a systematic review
 
Get to know the Python programming language
- Read Dive into Python
 - Install Anaconda
 - Try HuggingFace Transformers library