# Automated COVID-19 diagnosis

This repo contains the code created by Elien Martens during an IAESTE internship in summer 2021 at the Technical University of Kosice (Slovakia).

## Data

The [Interspeech Computational Paralinguistics ChallengE (ComParE) 2021](http://www.compare.openaudio.eu/now/) proposes two challenges related to COVID-19 detection based on audio samples. The samples are speech and cough recordings from both healthy and infected speakers. The COVID-19 Speech Sub-Challenge (CSS) offers 3.24 hours of speech recordings, while the COVID-19 Cough Sub-Challenge (CCS) provides 1.63 hours of cough recordings. The COVID-19 datasets can be obtained through the University of Cambridge.

## How it works

- Clone the repository.
- Add the cough or speech dataset (the dist folder, see the [Data](#data) section), so that the structure is the following:

```
CovidSpeechChallenge
|-- dist/
|   |-- lab/
|   |-- wav/
|-- features/
|-- results/
|-- src/
|-- vggish/
|-- run_experiments.sh
```

- Run ```run_experiments.sh``` to do the feature extraction, SVM training and prediction for each combination of features.

## About the code

- The ```features/``` folder will contain the extracted features after running the feature extraction scripts. These are not included in this repository to limit its size.
- The ```results/``` folder contains a subfolder for each experiment with the final predictions for the devel and test subsets (saved as CSV files) and a plot of the normalized confusion matrix saved as a JPG image (only for the predictions on the test subset). In this repository, the folder contains the results for the speech audio data.
- The ```src/``` folder contains the Python files for all feature extraction and SVMs.
- The ```vggish/``` folder contains the code necessary for VGGish, downloaded from [https://modelzoo.co/model/audioset](https://modelzoo.co/model/audioset).
- Finally, ```run_experiments.sh``` is a bash script that can be modified to run only parts of the experiments.
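
Before launching ```run_experiments.sh```, it can help to check that the layout described under "How it works" is in place. The snippet below is a minimal sketch and not part of this repository: the function name `missing_inputs` is hypothetical, and it assumes that `lab/` and `wav/` sit inside `dist/`. The `features/` and `results/` folders are skipped because they hold outputs of the experiments.

```python
from pathlib import Path

# Inputs named in the directory tree of this README.
# Assumption: lab/ and wav/ are subfolders of dist/.
REQUIRED_DIRS = ["dist/lab", "dist/wav", "src", "vggish"]
REQUIRED_FILES = ["run_experiments.sh"]


def missing_inputs(root):
    """Return the required paths (relative to root) that do not exist."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Check the current working directory and list anything that is absent.
    for path in missing_inputs("."):
        print(f"missing: {path}")
```

Run it from the root of the cloned repository; no output means that all expected inputs were found.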