A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problems that existing oversampling methods tend to introduce noise points and generate overlap...
Main authors: | Jie Cao, Yong Shi |
---|---|
Format: | article |
Language: | EN |
Published: | Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek, 2021 |
Subjects: | |
Online access: | https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff |
id |
oai:doaj.org-article:a2a5e7d3a4c74626ac5f5226553aa6ff |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:a2a5e7d3a4c74626ac5f5226553aa6ff; 2021-11-07T00:33:18Z; A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering; 1330-3651; 1848-6339; https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff; 2021-01-01T00:00:00Z; https://hrcak.srce.hr/file/383542; https://doaj.org/toc/1330-3651; https://doaj.org/toc/1848-6339; Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problems that existing oversampling methods tend to introduce noise points and generate overlapping instances, in this paper, we propose a novel oversampling method based on density peaks clustering. Firstly, density peaks clustering algorithm is used to cluster minority instances while screening outlier points. Secondly, sampling weights are assigned according to the size of clustered sub-clusters, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both the artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and improve the classification accuracy of the imbalanced data.; Jie Cao*; Yong Shi; Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek; article; classification; density peaks clustering; imbalanced datasets; over sampling; Engineering (General). Civil engineering (General); TA1-2040; EN; Tehnički Vjesnik, Vol 28, Iss 6, Pp 1813-1819 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
classification; density peaks clustering; imbalanced datasets; over sampling; Engineering (General). Civil engineering (General); TA1-2040 |
spellingShingle |
classification; density peaks clustering; imbalanced datasets; over sampling; Engineering (General). Civil engineering (General); TA1-2040; Jie Cao*; Yong Shi; A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
description |
Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling is a widely used technique for re-sampling imbalanced data. Existing oversampling methods tend to introduce noise points and generate overlapping instances. To address these problems, this paper proposes a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster the minority instances while screening out outlier points. Second, sampling weights are assigned according to the size of each sub-cluster, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and show that it improves classification accuracy on imbalanced data. |
format |
article |
author |
Jie Cao*; Yong Shi |
author_facet |
Jie Cao*; Yong Shi |
author_sort |
Jie Cao* |
title |
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
title_short |
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
title_full |
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
title_fullStr |
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
title_full_unstemmed |
A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering |
title_sort |
novel oversampling method for imbalanced datasets based on density peaks clustering |
publisher |
Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek |
publishDate |
2021 |
url |
https://doaj.org/article/a2a5e7d3a4c74626ac5f5226553aa6ff |
work_keys_str_mv |
AT jiecao anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering AT yongshi anoveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering AT jiecao noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering AT yongshi noveloversamplingmethodforimbalanceddatasetsbasedondensitypeaksclustering |
_version_ |
1718443630350303232 |
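
The description field in this record names three steps: cluster the minority instances with density peaks clustering while screening outliers, weight the sub-clusters by size, and interpolate between each cluster core and other instances of the same sub-cluster. The Python sketch below illustrates that general idea with NumPy only. It is not the authors' implementation: the function names (`density_peaks`, `dpc_oversample`), the Gaussian-kernel density, the percentile cutoff distance, the quantile-based outlier rule, and the choice of weights proportional to sub-cluster size are all assumptions made for illustration, since the record does not specify them.

```python
import numpy as np


def density_peaks(X, n_clusters, dc_percentile=20.0, outlier_quantile=0.05):
    """Simplified density peaks clustering (illustrative, hypothetical helper).

    Returns (labels, centers): labels[i] is the sub-cluster of X[i], or -1 for
    a screened outlier; centers[k] is the index of the core of sub-cluster k.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Cutoff distance: a percentile of all pairwise distances (assumed heuristic).
    dc = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)
    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-((dist / dc) ** 2)).sum(axis=1) - 1.0

    order = np.argsort(-rho)  # indices sorted by descending density
    delta = np.zeros(n)       # distance to the nearest denser point
    parent = np.full(n, -1)   # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j

    # Assumption: the lowest-density points are treated as outliers.
    outlier = rho < np.quantile(rho, outlier_quantile)
    gamma = rho * delta
    gamma[outlier] = -np.inf  # outliers can never become cores
    gamma[order[0]] = np.inf  # the global density peak is always a core
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:  # each point inherits the label of its denser parent
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    labels[outlier] = -1
    return labels, centers


def dpc_oversample(X_min, n_new, n_clusters=2, seed=0):
    """Synthesize n_new minority points by core-to-member interpolation."""
    rng = np.random.default_rng(seed)
    labels, centers = density_peaks(X_min, n_clusters)
    sizes = np.array([(labels == k).sum() for k in range(n_clusters)])
    # Assumption: sampling weight proportional to sub-cluster size.
    counts = rng.multinomial(n_new, sizes / sizes.sum())
    synthetic = []
    for k, count in enumerate(counts):
        core = X_min[centers[k]]
        members = np.flatnonzero(labels == k)
        others = members[members != centers[k]]
        if others.size == 0:  # singleton sub-cluster: fall back to the core
            others = members
        partners = X_min[rng.choice(others, size=count)]
        gap = rng.random((count, 1))  # interpolation factor in [0, 1)
        synthetic.append(core + gap * (partners - core))
    return np.vstack(synthetic)
```

Calling `dpc_oversample(X_min, n_new)` on an array of minority-class feature vectors returns `n_new` synthetic rows. Because every synthetic point lies on a segment between a core and a member of the same sub-cluster, screened outliers never contribute to the new instances. The sketch assumes distinct points (so the cutoff distance is non-zero) and fewer sub-clusters than non-outlier points.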