Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

Abstract Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to a...

Bibliographic Details
Main authors: Cindy Feng, George Kephart, Elizabeth Juarez-Colunga
Format: article
Language: EN
Published: BMC 2021
Subjects:
Online access: https://doaj.org/article/e95c431b7b8c4736b3daa26d7bd17fa1
id oai:doaj.org-article:e95c431b7b8c4736b3daa26d7bd17fa1
record_format dspace
spelling oai:doaj.org-article:e95c431b7b8c4736b3daa26d7bd17fa1
2021-11-28T12:38:47Z
Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
10.1186/s12874-021-01441-4
1471-2288
https://doaj.org/article/e95c431b7b8c4736b3daa26d7bd17fa1
2021-11-01T00:00:00Z
https://doi.org/10.1186/s12874-021-01441-4
https://doaj.org/toc/1471-2288
Abstract Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk. Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of the methods on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier’s score, calibration intercept and calibration slope. Results: We found that XGBoost is highly discriminative, with an AUC of 0.9669, and outperforms conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier’s scores. Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
Cindy Feng
George Kephart
Elizabeth Juarez-Colunga
BMC
article
COVID-19 mortality
Predictive model
Generalized additive model
Classification trees
Extreme gradient boosting
Medicine (General)
R5-920
EN
BMC Medical Research Methodology, Vol 21, Iss 1, Pp 1-14 (2021)
institution DOAJ
collection DOAJ
language EN
topic COVID-19 mortality
Predictive model
Generalized additive model
Classification trees
Extreme gradient boosting
Medicine (General)
R5-920
spellingShingle COVID-19 mortality
Predictive model
Generalized additive model
Classification trees
Extreme gradient boosting
Medicine (General)
R5-920
Cindy Feng
George Kephart
Elizabeth Juarez-Colunga
Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
description Abstract Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk. Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of the methods on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier’s score, calibration intercept and calibration slope. Results: We found that XGBoost is highly discriminative, with an AUC of 0.9669, and outperforms conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier’s scores. Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
format article
author Cindy Feng
George Kephart
Elizabeth Juarez-Colunga
author_facet Cindy Feng
George Kephart
Elizabeth Juarez-Colunga
author_sort Cindy Feng
title Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_short Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_full Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_fullStr Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_full_unstemmed Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_sort predicting covid-19 mortality risk in toronto, canada: a comparison of tree-based and regression-based machine learning methods
publisher BMC
publishDate 2021
url https://doaj.org/article/e95c431b7b8c4736b3daa26d7bd17fa1
work_keys_str_mv AT cindyfeng predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods
AT georgekephart predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods
AT elizabethjuarezcolunga predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods
_version_ 1718407865551552512
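
The abstract in this record names four accuracy measures: AUC, Brier's score, calibration intercept and calibration slope. The sketch below shows one common way to compute them from predicted probabilities, using only NumPy. It is an illustration, not the authors' code: the function names, the simulated data and the small Newton-Raphson fitter are assumptions introduced here, and the record does not say how the calibration intercept was defined. This sketch takes it as "calibration-in-the-large" (slope fixed at 1).

```python
import numpy as np

def auc(y, p):
    """AUC via the rank-sum (Mann-Whitney) identity, with average ranks for ties."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p, kind="mergesort")
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    for v in np.unique(p):               # replace tied ranks by their mean
        tied = p == v
        ranks[tied] = ranks[tied].mean()
    n1 = (y == 1).sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y)) ** 2))

def _fit_logistic(X, y, offset=0.0, iters=50):
    """Hypothetical helper: logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta + offset)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def calibration(y, p, eps=1e-12):
    """Return (intercept, slope) of outcomes regressed on logit(p).

    Intercept: fitted with logit(p) as an offset (slope fixed at 1).
    Slope: coefficient of logit(p) in a model with a free intercept.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    lp = np.log(p / (1 - p))
    ones = np.ones_like(lp)
    intercept = _fit_logistic(ones[:, None], y, offset=lp)[0]
    slope = _fit_logistic(np.column_stack([ones, lp]), y)[1]
    return float(intercept), float(slope)

# Simulated, perfectly calibrated predictions (not data from the study).
rng = np.random.default_rng(0)
true_lp = rng.normal(-2.0, 1.5, size=20000)
p_hat = 1.0 / (1.0 + np.exp(-true_lp))
y_obs = rng.binomial(1, p_hat)
print("AUC:", auc(y_obs, p_hat))
print("Brier:", brier(y_obs, p_hat))
print("Calibration (intercept, slope):", calibration(y_obs, p_hat))
```

For a well-calibrated model the intercept should be close to 0 and the slope close to 1; a slope below 1 usually indicates predictions that are too extreme. In a split-sample setup like the one the abstract describes, these functions would be applied to the testing-sample predictions of each fitted model.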