Ensemble machine learning of factors influencing COVID-19 across US counties


Bibliographic Details
Authors: David McCoy, Whitney Mgbara, Nir Horvitz, Wayne M. Getz, Alan Hubbard
Format: Article
Language: English
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/5627ff3f0a3d4df9973fe0c97238104f
Source: Scientific Reports, Vol 11, Iss 1, Pp 1-14 (2021)
Date: 2021-06-01
DOI: https://doi.org/10.1038/s41598-021-90827-x
ISSN: 2045-2322 (journal contents: https://doaj.org/toc/2045-2322)

Abstract
COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors affecting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level data on physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological measures to create a large feature set for analyzing per capita COVID-19 outcomes. Because the data are high-dimensional and multicollinear and contain unknown interactions, we use ensemble machine learning and marginal prediction methods to identify the factors most strongly associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors of both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations and for limited English, together with the proportion of Black and/or African-American individuals in a county, were the most important features for per capita COVID-19 cases both within a month of the pandemic's start in a county and at the latest date examined. For per capita all-cause mortality at day 100 and in total to date, public transportation use and the proportion of Black and/or African-American individuals in a county are the strongest predictors. The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests that ethnicity has twice the association of the next most important factors, which are location, disease prevalence, and transit factors. By using robust methods to identify the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality, our findings shed light on societal patterns that have been reported and experienced in the U.S. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including vaccine distribution, could have the greatest impact if the highest-risk communities are given priority.
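The headline numbers in the abstract come from a marginal (counterfactual) prediction: raise one feature by 10% in every county, keep all other factors fixed at their observed values, and sum the change in the predicted outcome. The sketch below illustrates that calculation under several stated assumptions. An unweighted average of already-fitted learners stands in for the paper's ensemble. "10% increase" is read as a relative (multiplicative) change. Predictions are treated as per capita rates and converted to counts using county population. `average_ensemble` and `shift_effect` are hypothetical names, not the authors' code. The sketch also does not compute the confidence intervals reported in the abstract.

```python
import numpy as np
from typing import Callable, Optional, Sequence

# A "predictor" maps an (n_counties, n_features) matrix to n_counties predictions.
Predictor = Callable[[np.ndarray], np.ndarray]


def average_ensemble(learners: Sequence[Predictor]) -> Predictor:
    """Combine already-fitted learners by an unweighted mean.

    Assumption: a simple average stands in for the paper's ensemble;
    no learner weights are estimated here.
    """
    def predict(X: np.ndarray) -> np.ndarray:
        return np.mean([f(X) for f in learners], axis=0)
    return predict


def shift_effect(predict: Predictor, X: np.ndarray, col: int,
                 rel_increase: float = 0.10,
                 population: Optional[np.ndarray] = None) -> float:
    """Summed change in predictions when column `col` is raised by
    `rel_increase` in every county, all other columns kept at observed values.

    Assumptions: the increase is relative (multiplicative), and predictions are
    per capita rates that `population` converts to counts when supplied.
    """
    X_shift = np.array(X, dtype=float)         # copy so the observed data stay untouched
    X_shift[:, col] *= 1.0 + rel_increase      # counterfactual: only one feature moves
    delta = predict(X_shift) - predict(X)      # per-county change in the prediction
    if population is not None:
        delta = delta * population             # per capita rate -> number of individuals
    return float(np.sum(delta))


# Toy usage with two hypothetical "fitted" learners (plain linear functions).
X = np.array([[0.10, 1.0], [0.20, 2.0], [0.30, 3.0]])  # col 0: e.g. transit-use share
pop = np.array([1000.0, 2000.0, 3000.0])
learners = [lambda A: 0.01 * A[:, 0] + 0.001 * A[:, 1],
            lambda A: 0.03 * A[:, 0] + 0.001 * A[:, 1]]
ensemble = average_ensemble(learners)
print(shift_effect(ensemble, X, col=0, rel_increase=0.10, population=pop))
```

Only the shifted column changes, so the other feature's contribution cancels in the difference. With nonlinear learners, the summed change also depends on the other features' observed values. For that reason, such estimates are evaluated at the observed data rather than at a single reference county.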