Update 'pages/interns/sevval_bulburu/README.md'

dano 2023-09-05 09:06:02 +00:00
parent ca24fe5a58
commit 18ad2b3e2e

@@ -14,33 +14,43 @@ IAESTE Intern Summer 2023, two months
Goal: Help with the [Hate Speech Project](/topics/hatespeech)
Meeting 5.9.2023
State:
- Proposed own Flask application
- Created Django application with data model https://github.com/hladek/hate-annot
Tasks:
Meeting 22.8.2023
State:
- Familiar with Python, Anaconda, Tensorflow, AI projects
- Created account at idoc.fei.tuke.sk and installed Anaconda.
- Continue with previous open tasks.
- Read the website https://hatespeechdata.com/ and pick a dataset from it.
- Evaluate an existing multilingual model (calculate precision, recall, and F1), e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi, on any data.
- Get familiar with Django.
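
The evaluation task above asks for precision, recall, and F1. A minimal sketch of that calculation in plain Python follows; the function name `precision_recall_f1` and the toy labels are illustrative assumptions, not part of the project code, and the label encoding (1 = hate speech) is hypothetical.

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Return (precision, recall, f1) with one label treated as the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positives: positive in both the gold data and the prediction.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: predicted positive, but gold says otherwise.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: gold positive, but the model missed it.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = hate speech, 0 = not hate speech.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 0]
p, r, f1 = precision_recall_f1(gold, predicted, positive_label=1)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.67 f1=0.67
```

In practice the `predicted` list would come from the model under evaluation and `gold` from the chosen dataset; a library such as scikit-learn offers equivalent metrics if it is already installed.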
Notes:
- ssh bulbur@idoc.fei.tuke.sk
- Use the nvidia-smi command to check the status of the GPU.
- Use WinSCP to copy files. Use Anaconda to create and activate a new Python virtual environment. Use Visual Studio Code Remote to develop on your computer and run on the remote computer (idoc.fei.tuke.sk). Use the same credentials for the idoc server.
- Use WSL2 to get a local Linux to experiment with.
Tasks:
- [ ] Get familiar with the task of hate speech detection. Find out how we can use Transformer neural networks to detect and categorize hate speech in internet comments created by random people.
- [ ] Get familiar with the basic tools: Hugging Face Transformers. Learn how to use https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi in a Python script. Learn something about Transformer neural networks.
- [x] Get familiar with Prodi.gy annotation tool.
- [-] Set up web-based annotation environment for students (open, cooperation with [Vladimir Ferko](/students/2021/vladimir_ferko) ).
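
One of the tasks above is to use the multilingual model from a Python script. The following is a minimal sketch, assuming the `transformers` package (plus a backend such as PyTorch) is installed and that the model works with the generic `text-classification` pipeline. The helper names `load_classifier` and `classify_comments` are hypothetical, and the meaning of the returned label names should be checked against the model card rather than assumed.

```python
def load_classifier(model_name="Andrazp/multilingual-hate-speech-robacofi"):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_name)


def classify_comments(comments, classifier):
    """Pair each comment with the top label and score returned by the classifier."""
    comments = list(comments)
    # For a list of texts the pipeline returns one {"label": ..., "score": ...} dict per text.
    results = classifier(comments)
    return [(text, res["label"], res["score"]) for text, res in zip(comments, results)]


# Usage (needs network access for the first model download):
# clf = load_classifier()
# for text, label, score in classify_comments(["Have a nice day!"], clf):
#     print(f"{label}\t{score:.3f}\t{text}")
```

Keeping the classifier as a parameter makes it easy to swap in another model later, for example a Slovak monolingual one.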
Ideas for annotation tools:
- https://github.com/UniversalDataTool/universal-data-tool
- https://www.johnsnowlabs.com/top-6-text-annotation-tools/
@@ -50,9 +60,8 @@ Ideas fo annotation tools:
Future tasks (to be decided):
- Evaluate an existing multilingual model, e.g. https://huggingface.co/Andrazp/multilingual-hate-speech-robacofi, with Slovak data.
- Translate an existing English dataset into Slovak. Use the OPUS English-Slovak Marian NMT model. Train a Slovak monolingual model.
- Prepare the existing Slovak Twitter dataset, then train and evaluate a model.
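
The translation task above could be approached roughly as follows. This is only a sketch: it assumes the OPUS Marian model is available on the Hugging Face hub under the name `Helsinki-NLP/opus-mt-en-sk` (an assumption to verify), that `transformers` and `sentencepiece` are installed, and that the dataset is a list of `(text, label)` pairs. The helper names `load_translator` and `translate_dataset` are hypothetical.

```python
def load_translator(model_name="Helsinki-NLP/opus-mt-en-sk"):
    """Build an English-to-Slovak translation pipeline (model name is an assumption)."""
    # Imported lazily so translate_dataset can be used without transformers installed.
    from transformers import pipeline
    return pipeline("translation", model=model_name)


def translate_dataset(rows, translator, batch_size=16):
    """Translate the text of each (text, label) row and keep its label unchanged."""
    rows = list(rows)
    translated = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # The translation pipeline returns one {"translation_text": ...} dict per input text.
        outputs = translator([text for text, _ in batch])
        translated.extend(
            (out["translation_text"], label) for out, (_, label) in zip(outputs, batch)
        )
    return translated


# Usage (needs network access for the first model download):
# translator = load_translator()
# slovak_rows = translate_dataset([("You are great.", 0)], translator)
```

Labels are carried over unchanged, which assumes that translation preserves whether a comment is hateful; that assumption is worth spot-checking on a sample before training.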