Automated COVID-19 diagnosis
This repo contains the code written by Elien Martens during an IAESTE internship in summer 2021 at the Technical University of Kosice (Slovakia).
Data
The Interspeech Computational Paralinguistics ChallengE (ComParE) 2021 proposes two challenges related to COVID-19 detection from audio samples: speech and cough recordings from both healthy and infected speakers. The COVID-19 Speech Sub-Challenge (CSS) offers 3.24 hours of speech recordings, while the COVID-19 Cough Sub-Challenge (CCS) provides 1.63 hours of cough samples. The COVID-19 datasets can be obtained through the University of Cambridge.
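To make the expected data layout concrete, here is a minimal loading sketch in Python. It assumes the usual ComParE packaging of a `lab/` folder with CSV label files (filename and label columns) and a `wav/` folder with the recordings; the exact file and folder names below are illustrative, not taken from this repository.

```python
# Minimal sketch of loading one partition of ComParE-style data.
# Assumptions: lab/ holds CSV label files with "filename" and "label" columns,
# wav/ holds the audio; the file names below are illustrative.
import pandas as pd
import librosa  # any wav reader works; librosa is used here only as an example

labels = pd.read_csv("dist/lab/train.csv")
audio, sr = librosa.load("dist/wav/" + labels.loc[0, "filename"], sr=16000)
print(labels["label"].value_counts())
print(audio.shape, sr)
```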
How it works
- clone the repository
- add the cough or speech dataset (the `dist/` folder, see the Data section), so that the structure is the following:

      CovidSpeechChallenge
      |-- dist/
      |   |-- lab/
      |   |-- wav/
      |-- features/
      |-- results/
      |-- src/
      |-- vggish/
      |-- run_experiments.sh

- run `run_experiments.sh` to do the feature extraction, SVM training, and prediction for each feature combination (a conceptual sketch of this step is given below)
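For orientation, the sketch below shows, in Python, the kind of loop that `run_experiments.sh` orchestrates: for each feature set, train an SVM on the training features and write predictions for another subset. The feature file layout, column names, and SVM settings are assumptions made for illustration, not the repository's actual code.

```python
# Hypothetical sketch of one experiment iteration; the file layout and column
# names ("filename", "label", "feat_*") are assumptions, not the repo's API.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

for feature_set in ["opensmile", "vggish"]:  # illustrative feature combinations
    train = pd.read_csv(f"features/{feature_set}_train.csv")
    devel = pd.read_csv(f"features/{feature_set}_devel.csv")
    feat_cols = [c for c in train.columns if c.startswith("feat_")]

    # Linear SVM on standardized features, a common ComParE-style baseline.
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1e-4, max_iter=100_000))
    clf.fit(train[feat_cols], train["label"])

    pd.DataFrame({
        "filename": devel["filename"],
        "prediction": clf.predict(devel[feat_cols]),
    }).to_csv(f"results/{feature_set}/devel_predictions.csv", index=False)
```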
About the code
- The `features/` folder will contain the extracted features after running the feature extraction code; they are not included in this repository to keep its size small.
- The `results/` folder contains a subfolder for each experiment with the final predictions for the devel and test subsets (saved as CSV files), plus a plot of the normalized confusion matrix saved as a JPG image (only for the test-subset predictions; see the sketch below). The `results/` folder in this repository holds the results for the speech audio data.
- The `src/` folder contains the Python files for all feature extraction and the SVMs.
- The `vggish/` folder contains the code needed for VGGish and is downloaded from https://modelzoo.co/model/audioset.
- Finally, `run_experiments.sh` is a bash script that can be modified to run only parts of the experiments.
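As an illustration of the results format described above, the sketch below shows one way to turn a predictions CSV into a normalized confusion matrix saved as a JPG, using scikit-learn and matplotlib. The column names and paths are assumptions, not the repository's exact code.

```python
# Hypothetical sketch: normalized confusion matrix from a predictions CSV.
# Column names ("label", "prediction") and the paths are assumptions.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

preds = pd.read_csv("results/experiment_01/test_predictions.csv")
disp = ConfusionMatrixDisplay.from_predictions(
    preds["label"], preds["prediction"], normalize="true", cmap="Blues"
)
disp.ax_.set_title("Normalized confusion matrix (test)")
plt.savefig("results/experiment_01/confusion_matrix.jpg", dpi=150, bbox_inches="tight")
```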