An oversampling method for multi-class imbalanced data based on composite weights.

Bibliographic Details
Main authors: Mingyang Deng, Yingshi Guo, Chang Wang, Fuwei Wu
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2021
Subjects:
R (Medicine)
Q (Science)
Online access: https://doaj.org/article/3ec3e187b3304496a3249da02548897d
Source: PLoS ONE, Vol 16, Iss 11, p e0259227 (2021)
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0259227 (https://doi.org/10.1371/journal.pone.0259227)

Abstract: To address the oversampling problem for small samples in multi-class data and to improve their classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The algorithm sorts the data within each class of the dataset according to the distance from each original sample to the hyperplane. Iterative sampling is then performed within each class, and inter-class sampling is applied at the boundaries of adjacent classes, according to a sampling weight composed of data density and data ranking. Finally, information assignment is performed on all newly generated samples. The algorithm is trained and tested on the UCI imbalanced datasets, and the established composite metrics are used to compare the proposed algorithm with other algorithms through a comprehensive evaluation method. The results show that the proposed algorithm balances the number of samples across the classes of multi-class imbalanced data, and that the newly generated data preserve the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm achieves a higher classification accuracy, of about 90%. We conclude that the algorithm is highly practical and generally applicable to imbalanced multi-class samples.
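The composite-weight idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not specify the hyperplane, the density estimate, or how the two terms are combined, so the sketch assumes (1) a one-vs-rest least-squares linear hyperplane per class as a stand-in for the paper's classifier, (2) a rank score that favours samples closer to that hyperplane, (3) a sparsity term from the mean distance to the k nearest same-class neighbours, and (4) a linear mix of the two controlled by a hypothetical parameter `alpha`. New samples come from SMOTE-style interpolation between a weighted seed and one of its neighbours. Inter-class boundary sampling and the information-assignment step are not modelled; new samples simply receive their class label. The names `composite_weights`, `oversample_class`, and `balance` are illustrative only.

```python
import numpy as np


def composite_weights(X_c, D, w, b, k=5, alpha=0.5):
    """Hypothetical composite sampling weights for the samples of one class.

    X_c: (n, d) class samples; D: (n, n) pairwise distances within the class;
    (w, b): an assumed hyperplane w.x + b = 0 separating the class from the rest.
    """
    n = len(X_c)
    # Ranking term (assumption): sort by distance to the hyperplane,
    # so the sample closest to the hyperplane gets the top rank n.
    dist = np.abs(X_c @ w + b) / (np.linalg.norm(w) + 1e-12)
    rank = np.empty(n)
    rank[np.argsort(dist)] = np.arange(n, 0, -1)
    rank /= rank.sum()
    # Density term (assumption): mean distance to the k nearest same-class
    # neighbours, so samples in sparse regions get a larger share.
    k_eff = min(k, n - 1)
    sparsity = np.sort(D, axis=1)[:, 1:k_eff + 1].mean(axis=1)
    sparsity /= sparsity.sum() + 1e-12
    # Composite weight: a linear mix of the two terms, renormalised to sum to 1.
    weights = alpha * rank + (1 - alpha) * sparsity
    return weights / weights.sum()


def oversample_class(X_c, n_new, w, b, k=5, alpha=0.5, rng=None):
    """SMOTE-style interpolation whose seed samples are drawn by composite weight."""
    if len(X_c) < 2:
        raise ValueError("need at least two samples in a class to interpolate")
    rng = np.random.default_rng(rng)
    D = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=-1)
    p = composite_weights(X_c, D, w, b, k, alpha)
    k_eff = min(k, len(X_c) - 1)
    new = np.empty((n_new, X_c.shape[1]))
    for t in range(n_new):
        i = rng.choice(len(X_c), p=p)                  # weighted seed sample
        j = rng.choice(np.argsort(D[i])[1:k_eff + 1])  # one of its k nearest neighbours
        new[t] = X_c[i] + rng.random() * (X_c[j] - X_c[i])  # point on segment i -> j
    return new


def balance(X, y, k=5, alpha=0.5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_aug = np.c_[X, np.ones(len(X))]  # append a bias column
    X_out, y_out = [X], [y]
    for c, n in zip(classes, counts):
        if n == target:
            continue
        # One-vs-rest least-squares hyperplane (stand-in for the paper's classifier).
        t = np.where(y == c, 1.0, -1.0)
        theta, *_ = np.linalg.lstsq(X_aug, t, rcond=None)
        new = oversample_class(X[y == c], target - n, theta[:-1], theta[-1], k, alpha, rng)
        X_out.append(new)
        y_out.append(np.full(len(new), c))
    return np.vstack(X_out), np.concatenate(y_out)
```

For example, with three classes of 50, 20, and 8 samples, `balance(X, y)` returns 150 rows, 50 per class, with the original rows first. Interpolating only between same-class neighbours keeps every new point on a segment between two existing samples of its class; the weighted choice of seed sample is what distinguishes this sketch from plain SMOTE.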