Instance Reduction for Avoiding Overfitting in Decision Trees

Decision tree learning is one of the most practical classification methods in machine learning and is used for approximating discrete-valued target functions. However, decision trees may overfit the training data, which limits their ability to generalize to unseen instances. In this study, we investigated the use of instance reduction techniques to smooth the decision boundaries before training the decision trees. Noise filters such as ENN, RENN, and ALLKNN remove noisy instances, while DROP3 and DROP5 may also remove genuine instances. Extensive empirical experiments were conducted on 13 benchmark datasets from the UCI Machine Learning Repository, with and without intentionally introduced noise. The results show that eliminating border instances improves the classification accuracy of decision trees and reduces the tree size, which in turn reduces the training and classification times. On the datasets without intentionally added noise, applying the noise filters without the built-in Reduced Error Pruning gave the best classification accuracy: ENN, RENN, and ALLKNN outperformed unpruned decision tree learning on 9, 9, and 8 of the 13 datasets, respectively. The datasets reduced using ENN and RENN without built-in pruning were also more effective when noise was intentionally introduced at different ratios. (Hedged code sketches of this pipeline follow the record details below.)


Bibliographic Details
Main Authors: Amro Asma’, Al-Akhras Mousa, Hindi Khalil El, Habib Mohamed, Shawar Bayan Abu
Format: Article
Language: EN
Published: De Gruyter, 2021
Subjects: decision trees; overfitting; pruning; instance reduction; noise filtering; Science (Q); Electronic computers. Computer science (QA75.5-76.95)
Online Access: https://doaj.org/article/aa1e6c3d003a415daaa4344d6c9fe55f
DOI: https://doi.org/10.1515/jisys-2020-0061
ISSN: 2191-026X
Published in: Journal of Intelligent Systems, Vol 30, Iss 1, Pp 438-459 (2021)
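
The pipeline the abstract describes (filter noisy or border instances with an ENN-style editor, then train an unpruned decision tree on the reduced set) can be sketched with off-the-shelf tools. This is not the authors' code: the mention of a "built-in Reduced Error Pruning" option suggests a WEKA/J48 setup, whereas the sketch below uses scikit-learn's DecisionTreeClassifier (unpruned by default) together with imbalanced-learn's EditedNearestNeighbours; the dataset, neighbour count, and train/test split are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): ENN filtering before
# training an unpruned decision tree, mirroring the "noise filter
# without built-in pruning" configuration from the abstract.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from imblearn.under_sampling import EditedNearestNeighbours

X, y = load_breast_cancer(return_X_y=True)  # stand-in UCI-style dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ENN: drop every training instance whose class disagrees with its
# k nearest neighbours, smoothing the decision boundary.
enn = EditedNearestNeighbours(sampling_strategy="all", n_neighbors=3)
X_red, y_red = enn.fit_resample(X_train, y_train)

# Unpruned tree on the reduced set (scikit-learn trees are unpruned
# unless ccp_alpha or depth limits are set).
tree = DecisionTreeClassifier(random_state=0).fit(X_red, y_red)
print(f"kept {len(y_red)}/{len(y_train)} instances, "
      f"test accuracy = {tree.score(X_test, y_test):.3f}, "
      f"tree nodes = {tree.tree_.node_count}")
```

imbalanced-learn also ships RepeatedEditedNearestNeighbours and AllKNN, which correspond to the RENN and ALLKNN filters named in the abstract, so swapping the filter class reproduces each filtering configuration.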
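The experiments also introduce label noise at different ratios before filtering. The paper's exact noise model is not given in this record, so the helper below is a hypothetical uniform label-flipping scheme (the function name and signature are my own):

```python
import numpy as np

def flip_labels(y, ratio, seed=None):
    """Hypothetical noise injector: reassign a `ratio` fraction of the
    labels (a numpy array) to a uniformly chosen *different* class."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    classes = np.unique(y_noisy)
    idx = rng.choice(len(y_noisy), size=int(ratio * len(y_noisy)),
                     replace=False)
    for i in idx:
        y_noisy[i] = rng.choice(classes[classes != y_noisy[i]])
    return y_noisy

# e.g. corrupt 20% of the training labels, then filter with ENN as above:
# y_train_noisy = flip_labels(y_train, ratio=0.20, seed=0)
```

Running the ENN-plus-tree sketch on y_train_noisy versus y_train gives a rough feel for the abstract's claim that the filters help most under injected noise.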