A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problems that existing oversampling methods tend to introduce noise points and generate overlap...

Bibliographic Details
Main authors: Jie Cao*, Yong Shi
Format: article
Language: EN
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2021
Subjects:
Online access: https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff
id oai:doaj.org-article:a2a5e7d3a4c74626ac5f5226553aa6ff
record_format dspace
spelling oai:doaj.org-article:a2a5e7d3a4c74626ac5f5226553aa6ff
2021-11-07T00:33:18Z
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
1330-3651
1848-6339
https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff
2021-01-01T00:00:00Z
https://hrcak.srce.hr/file/383542
https://doaj.org/toc/1330-3651
https://doaj.org/toc/1848-6339
Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problems that existing oversampling methods tend to introduce noise points and generate overlapping instances, in this paper, we propose a novel oversampling method based on density peaks clustering. Firstly, density peaks clustering algorithm is used to cluster minority instances while screening outlier points. Secondly, sampling weights are assigned according to the size of clustered sub-clusters, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both the artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and improve the classification accuracy of the imbalanced data.
Jie Cao*
Yong Shi
Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek
article
classification
density peaks clustering
imbalanced datasets
over sampling
Engineering (General). Civil engineering (General)
TA1-2040
EN
Tehnički Vjesnik, Vol 28, Iss 6, Pp 1813-1819 (2021)
institution DOAJ
collection DOAJ
language EN
topic classification
density peaks clustering
imbalanced datasets
over sampling
Engineering (General). Civil engineering (General)
TA1-2040
spellingShingle classification
density peaks clustering
imbalanced datasets
over sampling
Engineering (General). Civil engineering (General)
TA1-2040
Jie Cao*
Yong Shi
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
description Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling is a widely used technique for re-sampling imbalanced data. Existing oversampling methods tend to introduce noise points and generate overlapping instances. To address these problems, this paper proposes a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster the minority instances while screening out outlier points. Second, sampling weights are assigned according to the size of each sub-cluster, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and show that it improves classification accuracy on imbalanced data.
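
The three steps named in the description (clustering the minority class by density peaks, weighting sub-clusters by size, interpolating from cluster cores) can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record gives no formulas, so the Gaussian density kernel, the quantile-based cutoff distance, the outlier rule, the size-proportional weighting, and every name below (`density_peaks`, `dpc_oversample`, `dc_quantile`, `outlier_quantile`) are assumptions made for illustration only.

```python
import numpy as np


def density_peaks(X, n_clusters, dc_quantile=0.2):
    """Simplified density peaks clustering of the rows of X (needs >= 2 rows)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Cutoff distance d_c: a quantile of the pairwise distances (assumed heuristic).
    dc = max(np.quantile(dist[np.triu_indices(n, k=1)], dc_quantile), 1e-12)
    # Local density rho with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-((dist / dc) ** 2)).sum(axis=1) - 1.0
    order = np.argsort(-rho)      # indices from densest to sparsest
    delta = np.zeros(n)           # distance to the nearest denser point
    parent = np.full(n, -1)       # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i, denser = order[rank], order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    # Cluster cores: largest rho * delta; the densest point always qualifies.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    cores = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[cores] = np.arange(len(cores))
    for i in order:               # a parent is always labelled before its children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, cores, rho, delta, dc


def dpc_oversample(X_min, n_new, n_clusters=2, outlier_quantile=0.1, seed=0):
    """Return (synthetic points, boolean outlier mask) for minority samples X_min."""
    rng = np.random.default_rng(seed)
    labels, cores, rho, delta, dc = density_peaks(X_min, n_clusters)
    # Assumed outlier rule: very low density and far from every denser point.
    outlier = (rho < np.quantile(rho, outlier_quantile)) & (delta > dc)
    clusters = []
    for c, core in enumerate(cores):
        members = np.flatnonzero((labels == c) & ~outlier)
        members = members[members != core]
        if not outlier[core] and len(members) > 0:
            clusters.append((core, members))
    if not clusters:
        raise ValueError("no usable sub-cluster")
    # Assumed weighting: sampling probability proportional to sub-cluster size.
    sizes = np.array([len(m) for _, m in clusters], dtype=float)
    picks = rng.choice(len(clusters), size=n_new, p=sizes / sizes.sum())
    synthetic = np.empty((n_new, X_min.shape[1]))
    for k, c in enumerate(picks):
        core, members = clusters[c]
        other = X_min[rng.choice(members)]
        # Interpolate on the segment between the cluster core and the chosen member.
        synthetic[k] = X_min[core] + rng.random() * (other - X_min[core])
    return synthetic, outlier
```

Calling `dpc_oversample(X_min, n_new=30)` on an array of minority-class rows returns 30 synthetic rows and a mask of the rows treated as outliers. Because each new point lies on a segment between a cluster core and a member of the same sub-cluster, it stays inside that sub-cluster's convex hull, which is one plausible reading of how the method avoids generating noisy or overlapping instances.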
format article
author Jie Cao*
Yong Shi
author_facet Jie Cao*
Yong Shi
author_sort Jie Cao*
title A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_short A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_full A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_fullStr A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_full_unstemmed A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
title_sort novel oversampling method for imbalanced datasets based on density peaks clustering
publisher Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek
publishDate 2021
url https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff
work_keys_str_mv AT jiecao anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT yongshi anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT jiecao noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
AT yongshi noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering
_version_ 1718443630350303232