Development of glaucoma predictive model and risk factors assessment based on supervised models

Abstract Objectives To develop and to propose a machine learning model for predicting glaucoma and identifying its risk factors. Method Data analysis pipeline is designed for this study based on Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The main steps of the pipeline in...

Bibliographic Details
Main authors: Mahyar Sharifi, Toktam Khatibi, Mohammad Hassan Emamian, Somayeh Sadat, Hassan Hashemi, Akbar Fotouhi
Format: article
Language: EN
Published: BMC 2021
Subjects: Ophthalmology; Data Mining; Imbalanced Learning; Feature selection; Ensemble classification
Online access: https://doaj.org/article/e63fc15ecaad417cb2998d9acf1430de
id oai:doaj.org-article:e63fc15ecaad417cb2998d9acf1430de
record_format dspace
spelling oai:doaj.org-article:e63fc15ecaad417cb2998d9acf1430de
2021-11-28T12:03:54Z
Development of glaucoma predictive model and risk factors assessment based on supervised models
10.1186/s13040-021-00281-8
1756-0381
https://doaj.org/article/e63fc15ecaad417cb2998d9acf1430de
2021-11-01T00:00:00Z
https://doi.org/10.1186/s13040-021-00281-8
https://doaj.org/toc/1756-0381
BMC
article
EN
BioData Mining, Vol 14, Iss 1, Pp 1-15 (2021)
institution DOAJ
collection DOAJ
language EN
topic Ophthalmology
Data Mining
Imbalanced Learning
Feature selection
Ensemble classification
Computer applications to medicine. Medical informatics
R858-859.7
Analysis
QA299.6-433
description Abstract
Objectives: To develop and propose a machine learning model for predicting glaucoma and identifying its risk factors.
Method: A data analysis pipeline is designed for this study based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The main steps of the pipeline are data sampling, preprocessing, classification, and evaluation and validation. The training dataset was produced by balanced sampling based on over-sampling and under-sampling methods. The preprocessing steps were missing-value imputation and normalization. In the classification step, several machine learning models were designed for predicting glaucoma, including Decision Trees (DTs), K-Nearest Neighbors (K-NN), Support Vector Machines (SVM), Random Forests (RFs), Extra Trees (ETs), and Bagging Ensemble methods. In addition, a novel stacking ensemble model is designed and proposed using the superior classifiers.
Results: The data were from the Shahroud Eye Cohort Study, which includes demographic and ophthalmology data for 5190 participants aged 40-64 living in Shahroud, northeast Iran. The main variables considered in this dataset were 67 demographic, ophthalmologic, optometric, perimetry, and biometry features for 4561 people, including 4474 non-glaucoma participants and 87 glaucoma patients. Experimental results show that DTs and RFs trained on an under-sampled training dataset outperform the compared single classifiers and bagging ensemble methods in predicting glaucoma, with an average accuracy of 87.61 and 88.87, sensitivity of 73.80 and 72.35, specificity of 87.88 and 89.10, and area under the curve (AUC) of 91.04 and 94.53, respectively. The proposed stacking ensemble has an average accuracy of 83.56, a sensitivity of 82.21, a specificity of 81.32, and an AUC of 88.54.
Conclusions: In this study, a machine learning model is proposed and developed to predict glaucoma among persons aged 40-64. The top predictors for discriminating non-glaucoma persons from glaucoma patients include the number of visual field defects on perimetry, vertical cup-to-disc ratio, white-to-white diameter, systolic blood pressure, pupil barycenter on the Y coordinate, age, and axial length.
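The abstract's under-sampling step and its accuracy, sensitivity, and specificity figures can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `undersample` and `binary_metrics` are hypothetical, the feature rows are placeholder integers, and only the class counts (4474 non-glaucoma, 87 glaucoma) come from the record. It assumes plain random under-sampling down to the minority class size, which is one common reading of "under-sampling"; the abstract does not say which variant was used.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of glaucoma cases that are detected
        "specificity": tn / (tn + fp),  # share of non-glaucoma cases correctly cleared
    }


# Class counts taken from the abstract; the rows are placeholder ids, not real features.
labels = [0] * 4474 + [1] * 87
rows = list(range(len(labels)))
balanced_rows, balanced_labels = undersample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

With roughly 2% positive cases, a classifier that always predicts "non-glaucoma" would score about 98% accuracy with zero sensitivity, which is why the abstract reports sensitivity and specificity alongside accuracy. Resampling of this kind is normally applied to the training split only, so that evaluation reflects the original class ratio.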
format article
author Mahyar Sharifi
Toktam Khatibi
Mohammad Hassan Emamian
Somayeh Sadat
Hassan Hashemi
Akbar Fotouhi
title Development of glaucoma predictive model and risk factors assessment based on supervised models
publisher BMC
publishDate 2021
url https://doaj.org/article/e63fc15ecaad417cb2998d9acf1430de