Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data

Bibliographic Details
Main Authors: Ji Hwan Park, Han Eol Cho, Jong Hun Kim, Melanie M. Wall, Yaakov Stern, Hyunsun Lim, Shinjae Yoo, Hyoung Seop Kim, Jiook Cha
Format: article
Language: EN
Published: Nature Portfolio 2020
Subjects: Computer applications to medicine. Medical informatics; R858-859.7
Online Access: https://doaj.org/article/97d73b0105684ab094538dfc73a2f603
id oai:doaj.org-article:97d73b0105684ab094538dfc73a2f603
record_format dspace
Record datestamp: 2021-12-02T14:02:55Z
DOI: 10.1038/s41746-020-0256-0 (https://doi.org/10.1038/s41746-020-0256-0)
ISSN: 2398-6352 (journal table of contents: https://doaj.org/toc/2398-6352)
Publication date: 2020-03-01
Source: npj Digital Medicine, Vol 3, Iss 1, Pp 1-7 (2020)
institution DOAJ
collection DOAJ
topic Computer applications to medicine. Medical informatics
R858-859.7
description Abstract: A nationwide population-based cohort provides a new opportunity to build automated risk prediction models from individuals’ histories of health and healthcare, going beyond existing risk prediction models. We tested whether machine learning models can predict the future incidence of Alzheimer’s disease (AD) from large-scale administrative health data. From the Korean National Health Insurance Service database for 2002 to 2010, we obtained de-identified health data on adults over 65 years of age (N = 40,736), comprising 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, personal and family medical history, and socio-demographics. To define incident AD, we considered two operational definitions: “definite AD”, requiring both diagnostic codes and dementia medication (n = 614), and “probable AD”, requiring a diagnosis only (n = 2,026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD 1, 2, 3, and 4 years ahead. On balanced (bootstrapped) samples, the models showed reasonable performance: AUCs for “definite AD” and “probable AD”, respectively, were 0.775 and 0.759 for 1-year prediction, 0.730 and 0.693 for 2-year prediction, 0.677 and 0.644 for 3-year prediction, and 0.725 and 0.683 for 4-year prediction. Results were similar when the entire (unbalanced) samples were used. Important clinical features selected by logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models built on large-scale administrative health data for AD risk prediction, which may enable better selection of at-risk individuals for clinical trials or earlier detection in clinical settings.
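The abstract reports AUCs on "balanced samples (bootstrapping)" but this record does not describe the procedure. The Python sketch below shows one plausible reading, which is an assumption and not the authors' documented method: each round keeps every incident-AD case, draws an equal number of non-cases with replacement, computes AUC with the Mann-Whitney pairwise formulation, and averages across rounds. The function names `auc` and `balanced_bootstrap_auc`, the parameters `n_rounds` and `seed`, and the toy data are all hypothetical; the scores could come from any classifier, such as the random forest, support vector machine, or logistic regression models named in the abstract.

```python
# Sketch only: one assumed reading of "balanced samples (bootstrapping)";
# this is not the authors' code or evaluation pipeline.
import random
from statistics import mean


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic: the fraction of
    (case, non-case) pairs in which the case gets the higher score,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_bootstrap_auc(labels, scores, n_rounds=100, seed=0):
    """Hypothetical balanced evaluation: each round keeps every case
    (label 1) and draws, with replacement, as many non-cases (label 0)
    as there are cases; returns the mean AUC over all rounds."""
    rng = random.Random(seed)  # fixed seed for reproducible draws
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    round_aucs = []
    for _ in range(n_rounds):
        sample = cases + rng.choices(controls, k=len(cases))
        round_aucs.append(
            auc([labels[i] for i in sample], [scores[i] for i in sample])
        )
    return mean(round_aucs)


if __name__ == "__main__":
    # Toy, synthetic data: 20 cases among 1,000 people, with risk scores
    # that are only loosely associated with the outcome.
    rng = random.Random(42)
    labels = [1] * 20 + [0] * 980
    scores = [rng.gauss(0.6 if y else 0.4, 0.2) for y in labels]
    print(f"full-sample AUC:     {auc(labels, scores):.3f}")
    print(f"balanced-sample AUC: {balanced_bootstrap_auc(labels, scores):.3f}")
```

Under this reading, each round removes the strong imbalance between cases (n = 614 or 2,026) and the rest of the cohort. Because the ROC curve does not depend on class prevalence, balanced and unbalanced AUCs are expected to be close, which may help explain why the abstract reports similar results on the entire (unbalanced) samples.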