Ensemble Deep Learning for the Detection of COVID-19 in Unbalanced Chest X-ray Dataset

Bibliographic Details
Main Authors: Khin Yadanar Win, Noppadol Maneerat, Syna Sreng, Kazuhiko Hamamoto
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: T
Online Access: https://doaj.org/article/4cb34389d3d5420d848a1865b585639c
Description
Summary: The ongoing COVID-19 pandemic has had devastating effects on humanity worldwide. With their practical advantages and wide accessibility, chest X-rays (CXRs) play a vital role in diagnosing COVID-19 and in evaluating the extent of lung damage caused by the virus. This study aimed to leverage deep-learning-based methods for the automated classification of COVID-19 versus normal and viral pneumonia cases on CXRs, and for the identification of regions indicative of COVID-19 biomarkers. First, we preprocessed the images, segmented the lung regions using the DeepLabV3+ method, and then cropped the lung regions. The cropped lung regions were used as inputs to several deep convolutional neural networks (CNNs) for the prediction of COVID-19. The dataset was highly unbalanced: the vast majority of images were normal, with only a small number of COVID-19 and pneumonia images. To remedy the unbalanced distribution and avoid biased classification results, we applied five different approaches: (i) class balancing using a weighted loss; (ii) image augmentation to add more images to the minority classes; (iii) undersampling of the majority classes; (iv) oversampling of the minority classes; and (v) a hybrid resampling approach combining oversampling and undersampling. The best-performing methods from each approach were combined into an ensemble classifier using two voting strategies. Finally, we used CNN saliency maps to identify the regions indicative of COVID-19 biomarkers, which are deemed useful for interpretability. The algorithms were evaluated on the largest publicly available COVID-19 dataset. An ensemble of the top five CNNs with image augmentation achieved the highest accuracy of 99.23% and an area under the curve (AUC) of 99.97%, surpassing the results of previous studies.
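The summary lists a weighted loss as the first balancing approach but does not state how the class weights were chosen. The sketch below assumes the common inverse-frequency heuristic, w_c = N / (K · n_c), which may differ from the authors' scheme. The function names, the class labels `normal`, `covid`, `pneumonia`, and the toy counts are hypothetical and for illustration only.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Return one weight per class: N / (K * n_c).

    N is the number of samples, K the number of classes and n_c the count of
    class c. Rare classes get weights above 1, common classes below 1, and a
    perfectly balanced dataset gets 1.0 for every class.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs: Dict[str, float], true_label: str,
                           weights: Dict[str, float]) -> float:
    """Cross-entropy of one prediction, scaled by the weight of its true class."""
    return -weights[true_label] * math.log(probs[true_label])


# Toy label list (hypothetical counts): 6 normal, 2 COVID-19, 1 pneumonia.
labels = ["normal"] * 6 + ["covid"] * 2 + ["pneumonia"] * 1
weights = inverse_frequency_weights(labels)
print(weights)  # {'normal': 0.5, 'covid': 1.5, 'pneumonia': 3.0}

# The same predicted probability is penalised three times harder when the
# true class is the rarer "covid" rather than the majority "normal".
probs = {"normal": 0.4, "covid": 0.4, "pneumonia": 0.2}
loss_normal = weighted_cross_entropy(probs, "normal", weights)
loss_covid = weighted_cross_entropy(probs, "covid", weights)
print(loss_normal, loss_covid)
```

In a deep-learning framework the same weight vector would typically be passed to the framework's cross-entropy loss rather than computed by hand; the arithmetic is the same.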
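Approaches (iii) to (v) in the summary rebalance the training set by resampling. The sketch below shows one plausible reading at the level of sample indices: random undersampling to the smallest class, random oversampling by duplication to the largest class, and a hybrid that meets at an intermediate size. Choosing the mean class size as the hybrid target is a hypothetical choice, and plain duplication is not the same as the image augmentation of approach (ii). All function names and the toy counts are illustrative, not taken from the paper.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def indices_by_class(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Group sample indices by class label, in order of first appearance."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def resample_to(labels: Sequence[str], target: int, seed: int = 0) -> List[int]:
    """Return sample indices with exactly `target` entries per class.

    Classes with at least `target` samples are randomly undersampled without
    replacement; smaller classes keep every sample and are topped up by
    random duplication (oversampling with replacement).
    """
    rng = random.Random(seed)
    picked: List[int] = []
    for idx in indices_by_class(labels).values():
        if len(idx) >= target:
            picked.extend(rng.sample(idx, target))
        else:
            picked.extend(idx)
            picked.extend(rng.choices(idx, k=target - len(idx)))
    return picked


def class_sizes(labels: Sequence[str]) -> List[int]:
    return [len(idx) for idx in indices_by_class(labels).values()]


def undersample(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Shrink every class to the size of the smallest class."""
    return resample_to(labels, min(class_sizes(labels)), seed)


def oversample(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Grow every class to the size of the largest class."""
    return resample_to(labels, max(class_sizes(labels)), seed)


def hybrid_resample(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Shrink large classes and grow small ones to the mean class size
    (a hypothetical choice of target, for illustration only)."""
    sizes = class_sizes(labels)
    return resample_to(labels, round(sum(sizes) / len(sizes)), seed)


# Toy label list (hypothetical counts): 6 normal, 2 COVID-19, 1 pneumonia.
labels = ["normal"] * 6 + ["covid"] * 2 + ["pneumonia"] * 1
for name, fn in [("under", undersample), ("over", oversample),
                 ("hybrid", hybrid_resample)]:
    picked = fn(labels)
    # Every class now contributes the same number of training samples.
    print(name, Counter(labels[i] for i in picked))
```

Resampling of this kind is normally applied to the training split only, so that duplicated images cannot leak into the evaluation set.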
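The summary says the best models were combined "using two voting strategies" without naming them. A minimal sketch follows, assuming the two most common choices: hard voting (majority over each model's top-1 label) and soft voting (argmax of the averaged class probabilities). These may not be the authors' exact strategies. The class order and the five probability vectors are hypothetical, chosen so that the two rules disagree.

```python
from collections import Counter
from typing import Sequence

CLASSES = ["normal", "covid", "pneumonia"]  # hypothetical class order


def argmax_label(probs: Sequence[float], classes: Sequence[str]) -> str:
    """Label of the highest-probability class."""
    return classes[max(range(len(classes)), key=lambda i: probs[i])]


def hard_vote(all_probs: Sequence[Sequence[float]], classes: Sequence[str]) -> str:
    """Majority vote over each model's top-1 label.

    Ties go to the label encountered first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    votes = [argmax_label(p, classes) for p in all_probs]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(all_probs: Sequence[Sequence[float]], classes: Sequence[str]) -> str:
    """Average the class probabilities across models, then take the argmax."""
    n_models = len(all_probs)
    mean = [sum(p[i] for p in all_probs) / n_models for i in range(len(classes))]
    return argmax_label(mean, classes)


# Hypothetical softmax outputs of five CNNs for a single chest X-ray.
model_probs = [
    [0.40, 0.35, 0.25],
    [0.45, 0.40, 0.15],
    [0.50, 0.45, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
]
print(hard_vote(model_probs, CLASSES))  # normal (3 of 5 models)
print(soft_vote(model_probs, CLASSES))  # covid (mean 0.59 vs 0.30 for normal)
```

Soft voting lets two very confident models outweigh three marginal ones, whereas hard voting counts every model equally regardless of confidence.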