Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods.

Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the knowledge level and not necessarily optimized for target conditions. We investigated whether mach...

Bibliographic Details
Main authors: Konan Hara, Yasuki Kobayashi, Jun Tomio, Yuki Ito, Thomas Svensson, Ryo Ikesu, Ung-Il Chung, Akiko Kishi Svensson
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online access: https://doaj.org/article/8b08fd73c1ee4e7aac16a7eba522ab9e
id oai:doaj.org-article:8b08fd73c1ee4e7aac16a7eba522ab9e
record_format dspace
spelling oai:doaj.org-article:8b08fd73c1ee4e7aac16a7eba522ab9e
2021-12-02T20:14:09Z
Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods.
1932-6203
10.1371/journal.pone.0254394
https://doaj.org/article/8b08fd73c1ee4e7aac16a7eba522ab9e
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0254394
https://doaj.org/toc/1932-6203
Konan Hara
Yasuki Kobayashi
Jun Tomio
Yuki Ito
Thomas Svensson
Ryo Ikesu
Ung-Il Chung
Akiko Kishi Svensson
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 16, Iss 9, p e0254394 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the researchers' level of knowledge and are not necessarily optimized for target conditions. We investigated whether machine learning methods can supplement researchers' knowledge of target conditions in building CBAs. This retrospective cohort study used a claims database combined with annual health check-up results of employees' health insurance programs for fiscal year 2016-17 in Japan (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based model, and neural network for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures using a completely held-out test set (25% of the study population). Among the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7%, respectively, had a diagnosis of the corresponding condition. The areas under the receiver operating characteristic curve (AUCs) of the logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of the logistic regression with subject-matter knowledge: .923-.931 for hypertension; .958-.966 for diabetes; .747-.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to the conventional knowledge-based method in building CBAs.
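The evaluation design summarized in the description (a 25% held-out test set, with models compared by AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `train_test_split`, the fixed seed, and the rank-based AUC computation are assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 outcomes; scores: predicted risk per record.
    Tied scores receive the average of the ranks they span.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pairs = sorted(zip(scores, labels))
    rank_sum = 0.0  # sum of (1-based) ranks held by positive records
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    u = rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and hold out a fraction of them as a test set.

    The seed is a hypothetical default, used only for reproducibility.
    Returns (train_rows, test_rows).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy records, not study data: (label, score_model_a, score_model_b).
    records = [(0, 0.1, 0.2), (0, 0.4, 0.3), (1, 0.35, 0.6), (1, 0.8, 0.9),
               (0, 0.2, 0.1), (1, 0.7, 0.4), (0, 0.3, 0.5), (1, 0.9, 0.8)]
    labels = [r[0] for r in records]
    print("model A AUC:", auc(labels, [r[1] for r in records]))
    print("model B AUC:", auc(labels, [r[2] for r in records]))
```

In an analysis of this kind, each candidate model would be fitted on the training rows only, and `auc` would be applied to its predictions on the held-out rows, so that the comparison between models is not influenced by the data used for fitting.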
format article
author Konan Hara
Yasuki Kobayashi
Jun Tomio
Yuki Ito
Thomas Svensson
Ryo Ikesu
Ung-Il Chung
Akiko Kishi Svensson
author_sort Konan Hara
title Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods.
title_sort claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/8b08fd73c1ee4e7aac16a7eba522ab9e