Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy

Aim: In neuroscience research, data are quite often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to face this p...

Bibliographic Details
Main authors: Giulia Varotto, Gianluca Susi, Laura Tassi, Francesca Gozzo, Silvana Franceschetti, Ferruccio Panzica
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects: imbalanced dataset classification; re-sampling techniques; oversampling and undersampling; ensemble methods; network analysis; epilepsy surgery
Online access: https://doaj.org/article/1b9ae50c9eed4277bc3395713a3db303
id oai:doaj.org-article:1b9ae50c9eed4277bc3395713a3db303
record_format dspace
spelling oai:doaj.org-article:1b9ae50c9eed4277bc3395713a3db303
2021-11-19T13:20:57Z
ISSN 1662-5196
DOI 10.3389/fninf.2021.715421
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fninf.2021.715421/full
https://doaj.org/toc/1662-5196
Frontiers in Neuroinformatics, Vol 15 (2021)
institution DOAJ
collection DOAJ
language EN
topic imbalanced dataset classification
re-sampling techniques
oversampling and undersampling
ensemble methods
network analysis
epilepsy surgery
Neurosciences. Biological psychiatry. Neuropsychiatry
RC321-571
description Aim: In neuroscience research, data are often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to address this problem, and much work has been done to compare their effectiveness in different scenarios. Notably, the robustness of such techniques has been tested across a wide variety of datasets, without considering the performance on each specific dataset. In this study, we compare the performance of different resampling procedures for the imbalanced domain in stereo-electroencephalography (SEEG) recordings of patients with focal epilepsy who underwent surgery.
Methods: We considered data obtained by network analysis of interictal SEEG recorded from 10 patients with drug-resistant focal epilepsy, for a supervised classification problem aimed at distinguishing between epileptogenic and non-epileptogenic brain regions in interictal conditions. We investigated the effectiveness of five oversampling and five undersampling procedures, using 10 different machine learning classifiers. Six ensemble methods designed for the imbalanced domain were also tested. To compare performance, the area under the ROC curve (AUC), F-measure, Geometric Mean, and Balanced Accuracy were considered.
Results: Both types of resampling procedure improved performance with respect to the original dataset. The oversampling procedures were more sensitive to the type of classification method employed, with Adaptive Synthetic Sampling (ADASYN) exhibiting the best performance. All the undersampling approaches were more robust than the oversampling approaches across the different classifiers, with Random Undersampling (RUS) exhibiting the best performance despite being the simplest and most basic resampling method.
Conclusions: The application of machine learning techniques that take into consideration the balance of features by resampling is beneficial and leads to more accurate localization of the epileptogenic zone from interictal periods. In addition, our results highlight the importance of the type of classification method used together with the resampling to maximize the benefit to the outcome.
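As an illustration of two ideas named in the abstract, the following is a minimal Python sketch of Random Undersampling (RUS) and of two of the listed metrics, Balanced Accuracy and Geometric Mean. It is not the authors' code: the function names (`random_undersample`, `sensitivity_specificity`, `balanced_accuracy`, `geometric_mean`) are hypothetical, the toy labels are invented for the example and are not SEEG data, and the metrics are written for the binary case using the common definitions based on sensitivity and specificity.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))  # sample without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Arithmetic mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return (sens + spec) / 2


def geometric_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


# Toy, invented data: 8 majority-class samples (0) and 2 minority-class samples (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_res, y_res = random_undersample(X, y, seed=42)
print(Counter(y_res))  # both classes now have 2 samples

# Toy predictions: one of two positives found, one false alarm among 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)
gm = geometric_mean(y_true, y_pred)
print(ba, gm)
```

On the toy predictions, plain accuracy would be 0.8 even though only half of the minority class is detected; Balanced Accuracy and Geometric Mean both weight the two classes equally, which is why metrics of this kind are preferred for imbalanced problems.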
format article
author Giulia Varotto
Gianluca Susi
Laura Tassi
Francesca Gozzo
Silvana Franceschetti
Ferruccio Panzica
title Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/1b9ae50c9eed4277bc3395713a3db303