ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION

Extremely skewed data in artificial intelligence, machine learning, and data mining often give misleading results, because machine learning algorithms are designed to work best with balanced data. In real situations, however, imbalanced data are common. To han...

Bibliographic Details
Main authors: Ariani Indrawati, Hendro Subagyo, Andre Sihombing, Wagiyah Wagiyah, Sjaeful Afandi
Format: article
Language: EN
ID
Published: Lembaga Ilmu Pengetahuan Indonesia 2020
Subjects:
Z
Online access: https://doaj.org/article/bf6a90b54cc34ae89e9eee9d4ad072ff
id oai:doaj.org-article:bf6a90b54cc34ae89e9eee9d4ad072ff
record_format dspace
spelling oai:doaj.org-article:bf6a90b54cc34ae89e9eee9d4ad072ff
2021-12-02T18:49:59Z
ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION
0125-9008
2301-8593
10.14203/j.baca.v41i2.702
https://doaj.org/article/bf6a90b54cc34ae89e9eee9d4ad072ff
2020-12-01T00:00:00Z
https://jurnalbaca.pdii.lipi.go.id/index.php/baca/article/view/702
https://doaj.org/toc/0125-9008
https://doaj.org/toc/2301-8593
Ariani Indrawati
Hendro Subagyo
Andre Sihombing
Wagiyah Wagiyah
Sjaeful Afandi
Lembaga Ilmu Pengetahuan Indonesia
article
imbalanced data
resampling techniques
machine learning
classification
journal
isjd
Bibliography. Library science. Information resources
Z
EN
ID
Baca: Jurnal Dokumentasi dan Informasi, Vol 41, Iss 2, Pp 133-141 (2020)
institution DOAJ
collection DOAJ
language EN
ID
topic imbalanced data
resampling techniques
machine learning
classification
journal
isjd
Bibliography. Library science. Information resources
Z
description Extremely skewed data in artificial intelligence, machine learning, and data mining often give misleading results, because machine learning algorithms are designed to work best with balanced data. In real situations, however, imbalanced data are common. To handle imbalanced data, the most popular technique is resampling the dataset to modify the number of instances in the majority and minority classes until the data are balanced. Many resampling techniques (oversampling, undersampling, or a combination of both) have been proposed, and new ones continue to appear. Resampling techniques may increase or decrease classifier performance. Comparative research on resampling methods for structured data has been widely carried out, but studies that compare resampling methods on unstructured data are very rare. That raises many questions, one of which is whether these methods can be applied to unstructured data such as text, which has large dimensions and very diverse characteristics. To understand how different resampling techniques affect the learning of classifiers on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling techniques on imbalanced text data generally improve classifier performance, but the improvement is not significant because text data are very diverse and high-dimensional.
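The two basic resampling strategies named in the description can be illustrated with a minimal sketch. This is not the authors' experimental setup: the record does not say which resampling methods or classifiers were used, so the functions `random_oversample` and `random_undersample`, the toy documents, and the class labels below are all hypothetical and serve only to show how class counts change.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(texts, labels):
    """Collect documents into a dict keyed by class label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen documents until every class
    is as large as the biggest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Randomly discard documents until every class
    is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical toy corpus: 6 majority-class and 2 minority-class documents.
texts = [f"physics abstract {i}" for i in range(6)] + [
    "library abstract 0",
    "library abstract 1",
]
labels = ["physics"] * 6 + ["library"] * 2

over_texts, over_labels = random_oversample(texts, labels)
under_texts, under_labels = random_undersample(texts, labels)

print(Counter(labels))        # class counts before resampling
print(Counter(over_labels))   # class counts after oversampling
print(Counter(under_labels))  # class counts after undersampling
```

In a classification experiment, resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.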
format article
author Ariani Indrawati
Hendro Subagyo
Andre Sihombing
Wagiyah Wagiyah
Sjaeful Afandi
author_sort Ariani Indrawati
title ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION
publisher Lembaga Ilmu Pengetahuan Indonesia
publishDate 2020
url https://doaj.org/article/bf6a90b54cc34ae89e9eee9d4ad072ff
_version_ 1718377529555812352