Instance Reduction for Avoiding Overfitting in Decision Trees
Saved in:

Main Authors: | , , , , |
---|---|
Format: | article |
Language: | EN |
Published: | De Gruyter, 2021 |
Subjects: | |
Online Access: | https://doaj.org/article/aa1e6c3d003a415daaa4344d6c9fe55f |
Summary: Decision tree learning is one of the most practical classification methods in machine learning, used to approximate discrete-valued target functions. However, learned trees may overfit the training data, which limits their ability to generalize to unseen instances. In this study, we investigated the use of instance reduction techniques to smooth the decision boundaries before training the decision trees. Noise filters such as ENN, RENN, and ALLKNN remove noisy instances, while DROP3 and DROP5 may also remove genuine instances. Extensive empirical experiments were conducted on 13 benchmark datasets from the UCI Machine Learning Repository, both with and without intentionally introduced noise. The results show that eliminating border instances improves the classification accuracy of decision trees and reduces the tree size, which in turn shortens training and classification times. On datasets without intentionally added noise, applying noise filters without the built-in Reduced Error Pruning gave the best classification accuracy: ENN, RENN, and ALLKNN outperformed unpruned decision tree learning on 9, 9, and 8 of the 13 datasets, respectively. Datasets reduced with ENN and RENN, again without built-in pruning, were also the most effective when noise was intentionally introduced at different ratios.
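As a concrete illustration of the pipeline the abstract describes, the sketch below filters a training set with ENN and then fits an unpruned decision tree. This is a minimal sketch, not the authors' implementation: it assumes imbalanced-learn's `EditedNearestNeighbours` as the ENN filter and uses scikit-learn's iris data as a stand-in for the UCI benchmarks; the parameter choices are illustrative.

```python
# Minimal sketch: ENN instance filtering before unpruned decision tree
# training. Assumes imbalanced-learn's EditedNearestNeighbours as the ENN
# filter and the iris dataset as a stand-in for the UCI benchmarks.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from imblearn.under_sampling import EditedNearestNeighbours

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# ENN drops every training instance whose class label disagrees with the
# majority of its k nearest neighbours, smoothing the decision boundary.
enn = EditedNearestNeighbours(sampling_strategy="all", n_neighbors=3)
X_clean, y_clean = enn.fit_resample(X_train, y_train)
print(f"Training instances: {len(X_train)} -> {len(X_clean)} after ENN")

# Train without pruning, mirroring the "no Reduced Error Pruning" setting.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_clean, y_clean)
print(f"Held-out accuracy: {tree.score(X_test, y_test):.3f}")
print(f"Tree size (nodes): {tree.tree_.node_count}")
```

Swapping in `RepeatedEditedNearestNeighbours` or `AllKNN` (also in `imblearn.under_sampling`) gives the RENN and ALLKNN variants. Note that the abstract's mention of built-in Reduced Error Pruning suggests a C4.5-style learner, whereas scikit-learn's CART tree is only a close analogue.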