Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis

Breast cancer is one of the leading causes of death among women, more so than all other cancers. The accurate diagnosis of breast cancer is very difficult due to the complexity of the disease, changing treatment procedures and different patient population samples. Diagnostic techniques with better p...

Bibliographic Details
Main authors: Sara Ibrahim, Saima Nazir, Sergio A. Velastin
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Online access: https://doaj.org/article/5da96af79c0347d49ebcce66b62ce0ff
id oai:doaj.org-article:5da96af79c0347d49ebcce66b62ce0ff
record_format dspace
spelling oai:doaj.org-article:5da96af79c0347d49ebcce66b62ce0ff
2021-11-25T18:03:25Z
Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
10.3390/jimaging7110225
2313-433X
https://doaj.org/article/5da96af79c0347d49ebcce66b62ce0ff
2021-10-01T00:00:00Z
https://www.mdpi.com/2313-433X/7/11/225
https://doaj.org/toc/2313-433X
Sara Ibrahim
Saima Nazir
Sergio A. Velastin
MDPI AG
article
breast cancer diagnosis
Wisconsin Breast Cancer Dataset
feature selection
dimensionality reduction
principal component analysis
ensemble method
Photography
TR1-1050
Computer applications to medicine. Medical informatics
R858-859.7
Electronic computers. Computer science
QA75.5-76.95
EN
Journal of Imaging, Vol 7, Iss 225, p 225 (2021)
institution DOAJ
collection DOAJ
language EN
topic breast cancer diagnosis
Wisconsin Breast Cancer Dataset
feature selection
dimensionality reduction
principal component analysis
ensemble method
Photography
TR1-1050
Computer applications to medicine. Medical informatics
R858-859.7
Electronic computers. Computer science
QA75.5-76.95
spellingShingle breast cancer diagnosis
Wisconsin Breast Cancer Dataset
feature selection
dimensionality reduction
principal component analysis
ensemble method
Photography
TR1-1050
Computer applications to medicine. Medical informatics
R858-859.7
Electronic computers. Computer science
QA75.5-76.95
Sara Ibrahim
Saima Nazir
Sergio A. Velastin
Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
description Breast cancer is one of the leading causes of death among women, ahead of all other cancers. Accurate diagnosis is difficult because of the complexity of the disease, changing treatment procedures and differing patient populations. Better-performing diagnostic techniques are important for personalized care and treatment and for reducing and controlling cancer recurrence. The main objective of this research was to select features using correlation analysis and the variance of the input features before passing the significant features to a classification method. An ensemble method was used to improve the classification of breast cancer. The proposed approach was evaluated on the public Wisconsin Breast Cancer Dataset (WBCD). Correlation analysis and principal component analysis were used for dimensionality reduction. Performance was evaluated for well-known machine learning classifiers, and the best seven were chosen for the next step. Hyper-parameter tuning was performed to improve the classifiers' performance. The best-performing classifiers were combined using two voting techniques: hard voting predicts the class that receives the majority of votes, whereas soft voting predicts the class with the highest probability. The proposed approach performed better than state-of-the-art work, achieving an accuracy of 98.24%, a precision of 99.29% and a recall of 95.89%.
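The difference between the two voting schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `hard_vote` and `soft_vote` and the probability values are hypothetical, and soft voting is assumed here to mean taking the class with the highest probability averaged over the classifiers, which is a common reading of the term.

```python
from collections import Counter


def hard_vote(probas):
    """Return the class chosen by most classifiers.

    Each classifier votes for its own most probable class;
    on a tie, the class that was voted for first wins.
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Return the class with the highest mean predicted probability."""
    n_classes = len(probas[0])
    means = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)


# Hypothetical outputs [P(class 0), P(class 1)] from three classifiers
# for a single sample (illustrative values, not taken from the paper).
probas = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]

print("hard vote:", hard_vote(probas))
print("soft vote:", soft_vote(probas))
```

In this made-up case two weakly confident classifiers outvote one highly confident classifier under hard voting, while soft voting follows the confident one, so the two schemes can disagree on the same sample.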
format article
author Sara Ibrahim
Saima Nazir
Sergio A. Velastin
author_facet Sara Ibrahim
Saima Nazir
Sergio A. Velastin
author_sort Sara Ibrahim
title Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_short Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_full Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_fullStr Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_full_unstemmed Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_sort feature selection using correlation analysis and principal component analysis for accurate breast cancer diagnosis
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/5da96af79c0347d49ebcce66b62ce0ff
work_keys_str_mv AT saraibrahim featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
AT saimanazir featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
AT sergioavelastin featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
_version_ 1718411681819787264