An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction

Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (aka....

Bibliographic Details
Main Authors:	Haoyu Luo, Heng Dai, Weiqiang Peng, Wenhua Hu, Fuyang Li
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	fault prediction machine learning data selection Chemical technology TP1-1185
Online Access:	https://doaj.org/article/59a58f66c75b4c9ba2125e2709250132

id	oai:doaj.org-article:59a58f66c75b4c9ba2125e2709250132
record_format	dspace
spelling	oai:doaj.org-article:59a58f66c75b4c9ba2125e2709250132; 2021-11-25T18:57:14Z; An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction; 10.3390/s21227535; 1424-8220; https://doaj.org/article/59a58f66c75b4c9ba2125e2709250132; 2021-11-01T00:00:00Z; https://www.mdpi.com/1424-8220/21/22/7535; https://doaj.org/toc/1424-8220; Haoyu Luo; Heng Dai; Weiqiang Peng; Wenhua Hu; Fuyang Li; MDPI AG; article; fault prediction; machine learning; data selection; Chemical technology; TP1-1185; EN; Sensors, Vol 21, Iss 22, p 7535 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	fault prediction machine learning data selection Chemical technology TP1-1185
description	Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (a.k.a. within-project) data and target project (a.k.a. cross-project) data, which evidently degrades prediction performance. To investigate the impact of training data selection methods on the performance of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performance of ROCPDP models trained on cross-project data filtered by these selection methods was compared with that of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and limited within-project data. Eleven available defect datasets from industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among the nine training data selection methods in terms of FPA and Norm(Popt). The performance of ROCPDP models trained on filtered cross-project data was not comparable with that of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommend that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.
format	article
author	Haoyu Luo Heng Dai Weiqiang Peng Wenhua Hu Fuyang Li
title	An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
publisher	MDPI AG
publishDate	2021
url	https://doaj.org/article/59a58f66c75b4c9ba2125e2709250132
