Effect of data leakage in brain MRI classification using 2D convolutional neural networks

Abstract In recent years, 2D convolutional neural networks (CNNs) have been extensively used to diagnose neurological diseases from magnetic resonance imaging (MRI) data due to their potential to discern subtle and intricate patterns. Despite the high performances reported in numerous studies, devel...

Bibliographic Details
Main Authors: Ekin Yagis, Selamawet Workalemahu Atnafu, Alba García Seco de Herrera, Chiara Marzi, Riccardo Scheda, Marco Giannelli, Carlo Tessa, Luca Citi, Stefano Diciotti
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online Access: https://doaj.org/article/ba8c3daef34b44d98cdadf6e89dba3e3
id oai:doaj.org-article:ba8c3daef34b44d98cdadf6e89dba3e3
record_format dspace
spelling oai:doaj.org-article:ba8c3daef34b44d98cdadf6e89dba3e3
2021-11-21T12:19:02Z
10.1038/s41598-021-01681-w
2045-2322
2021-11-01T00:00:00Z
https://doi.org/10.1038/s41598-021-01681-w
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract In recent years, 2D convolutional neural networks (CNNs) have been extensively used to diagnose neurological diseases from magnetic resonance imaging (MRI) data due to their potential to discern subtle and intricate patterns. Despite the high performances reported in numerous studies, developing CNN models with good generalization abilities is still a challenging task due to possible data leakage introduced during cross-validation (CV). In this study, we quantitatively assessed the effect of data leakage caused by splitting 3D MRI data at the 2D slice level, using three 2D CNN models to classify patients with Alzheimer’s disease (AD) and Parkinson’s disease (PD). Our experiments showed that slice-level CV erroneously boosted the average slice-level accuracy on the test set by 30% on Open Access Series of Imaging Studies (OASIS), 29% on Alzheimer’s Disease Neuroimaging Initiative (ADNI), 48% on Parkinson’s Progression Markers Initiative (PPMI) and 55% on a local de-novo PD Versilia dataset. Further tests on a randomly labeled OASIS-derived dataset produced an (erroneous) accuracy of about 96% with a slice-level split and 50% with a subject-level split, as expected from a randomized experiment. Overall, the effect of an erroneous slice-based CV is severe, especially for small datasets.
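The difference between the two splitting strategies named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the toy dataset of 20 subjects with 10 slices each, and the 20% test fraction are all assumptions chosen for illustration. The sketch only counts how many subjects end up on both sides of each split; it trains no model.

```python
import random

def make_slices(n_subjects, slices_per_subject):
    """Toy stand-in for 3D volumes: (subject_id, slice_index) pairs."""
    return [(s, k) for s in range(n_subjects) for k in range(slices_per_subject)]

def slice_level_split(slices, test_fraction, seed=0):
    """Leaky split: shuffle individual 2D slices, ignoring their subject."""
    rng = random.Random(seed)
    shuffled = slices[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def subject_level_split(slices, test_fraction, seed=0):
    """Leak-free split: assign whole subjects to train or test."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in slices})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [x for x in slices if x[0] not in test_subjects]
    test = [x for x in slices if x[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects that contribute slices to both the training and the test set."""
    return {s for s, _ in train} & {s for s, _ in test}

# Hypothetical toy data: 20 subjects, 10 slices each.
slices = make_slices(n_subjects=20, slices_per_subject=10)
leaky_train, leaky_test = slice_level_split(slices, 0.2)
clean_train, clean_test = subject_level_split(slices, 0.2)

print("subjects shared by train/test, slice-level split:",
      len(shared_subjects(leaky_train, leaky_test)))
print("subjects shared by train/test, subject-level split:",
      len(shared_subjects(clean_train, clean_test)))
```

With a slice-level split, slices from the same subject are almost always present in both sets, so a model can score well by recognizing subjects rather than disease. A subject-level split keeps each subject entirely on one side, which is the condition under which the abstract reports chance-level accuracy on randomly labeled data.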
format article
author Ekin Yagis
Selamawet Workalemahu Atnafu
Alba García Seco de Herrera
Chiara Marzi
Riccardo Scheda
Marco Giannelli
Carlo Tessa
Luca Citi
Stefano Diciotti
title Effect of data leakage in brain MRI classification using 2D convolutional neural networks
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/ba8c3daef34b44d98cdadf6e89dba3e3