Neighborhood level chronic respiratory disease prevalence estimation using search query data.

Estimation of disease prevalence at sub-city neighborhood scale allows early and targeted interventions that can help save lives and reduce public health burdens. However, the cost-prohibitive nature of highly localized data collection and the sparsity of representative signals have made it challenging...

Bibliographic Details
Main authors: Nabeel Abdur Rehman, Scott Counts
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online access: https://doaj.org/article/132ab6e149c44490929f2c1500c8c224
id oai:doaj.org-article:132ab6e149c44490929f2c1500c8c224
record_format dspace
spelling oai:doaj.org-article:132ab6e149c44490929f2c1500c8c224; 2021-12-02T20:10:52Z; Neighborhood level chronic respiratory disease prevalence estimation using search query data.; 1932-6203; 10.1371/journal.pone.0252383; https://doaj.org/article/132ab6e149c44490929f2c1500c8c224; 2021-01-01T00:00:00Z; https://doi.org/10.1371/journal.pone.0252383; https://doaj.org/toc/1932-6203; Nabeel Abdur Rehman; Scott Counts; Public Library of Science (PLoS); article; Medicine; R; Science; Q; EN; PLoS ONE, Vol 16, Iss 6, p e0252383 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Estimation of disease prevalence at sub-city neighborhood scale allows early and targeted interventions that can help save lives and reduce public health burdens. However, the cost-prohibitive nature of highly localized data collection and the sparsity of representative signals have made it challenging to identify neighborhood-scale prevalence of disease. To overcome this challenge, we utilize alternative data sources, which are both less sparse and representative of localized disease prevalence: using query data from a large commercial search engine, we identify the prevalence of respiratory illness in the United States, localized to census tract geographic granularity. Focusing on asthma and Chronic Obstructive Pulmonary Disease (COPD), we construct a set of features based on searches for symptoms, medications, and disease-related information, and use these to identify illness rates in more than 23 thousand tracts in 500 cities across the United States. Out-of-sample model estimates from search data alone correlate with ground truth illness rate estimates from the CDC at 0.69 to 0.76, with simple additions to these models raising those correlations to as high as 0.84. We then show that, in practice, search query data can be added to other relevant data such as census or land cover data to boost results, with models that incorporate all data sources correlating with ground truth data at 0.91 for asthma and 0.88 for COPD.
format article
author Nabeel Abdur Rehman
Scott Counts
title Neighborhood level chronic respiratory disease prevalence estimation using search query data.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/132ab6e149c44490929f2c1500c8c224