Data-driven identification of ageing-related diseases from electronic health records

Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify...

Bibliographic Details
Main authors: Valerie Kuan, Helen C. Fraser, Melanie Hingorani, Spiros Denaxas, Arturo Gonzalez-Izquierdo, Kenan Direk, Dorothea Nitsch, Rohini Mathur, Constantinos A. Parisinos, R. Thomas Lumbers, Reecha Sofat, Ian C. K. Wong, Juan P. Casas, Janet M. Thornton, Harry Hemingway, Linda Partridge, Aroon D. Hingorani
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/a4cfa63aeff54b99ac644b906f56e8bb
id oai:doaj.org-article:a4cfa63aeff54b99ac644b906f56e8bb
record_format dspace
spelling oai:doaj.org-article:a4cfa63aeff54b99ac644b906f56e8bb
2021-12-02T14:06:55Z
Data-driven identification of ageing-related diseases from electronic health records
10.1038/s41598-021-82459-y
2045-2322
https://doaj.org/article/a4cfa63aeff54b99ac644b906f56e8bb
2021-02-01T00:00:00Z
https://doi.org/10.1038/s41598-021-82459-y
https://doaj.org/toc/2045-2322
Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify ARDs using two complementary methods consisting of unsupervised machine learning and actuarial techniques, which we applied to electronic health records (EHRs) from 3,009,048 individuals in England using primary care data from the Clinical Practice Research Datalink (CPRD) linked to the Hospital Episode Statistics admitted patient care dataset between 1 April 2010 and 31 March 2015 (mean age 49.7 years (s.d. 18.6), 51% female, 70% white ethnicity). We grouped 278 high-burden diseases into nine main clusters according to their patterns of disease onset, using a hierarchical agglomerative clustering algorithm. Four of these clusters, encompassing 207 diseases spanning diverse organ systems and clinical specialties, had rates of disease onset that clearly increased with chronological age. However, the ages of onset for these four clusters were strikingly different, with median age of onset 82 years (IQR 82–83) for Cluster 1, 77 years (IQR 75–77) for Cluster 2, 69 years (IQR 66–71) for Cluster 3 and 57 years (IQR 54–59) for Cluster 4. Fitting to ageing-related actuarial models confirmed that the vast majority of these 207 diseases had a high probability of being ageing-related. Cardiovascular diseases and cancers were highly represented, while benign neoplastic, skin and psychiatric conditions were largely absent from the four ageing-related clusters. Our framework identifies and clusters ARDs and can form the basis for fundamental and translational research into ageing pathways.
Valerie Kuan
Helen C. Fraser
Melanie Hingorani
Spiros Denaxas
Arturo Gonzalez-Izquierdo
Kenan Direk
Dorothea Nitsch
Rohini Mathur
Constantinos A. Parisinos
R. Thomas Lumbers
Reecha Sofat
Ian C. K. Wong
Juan P. Casas
Janet M. Thornton
Harry Hemingway
Linda Partridge
Aroon D. Hingorani
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-17 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify ARDs using two complementary methods consisting of unsupervised machine learning and actuarial techniques, which we applied to electronic health records (EHRs) from 3,009,048 individuals in England using primary care data from the Clinical Practice Research Datalink (CPRD) linked to the Hospital Episode Statistics admitted patient care dataset between 1 April 2010 and 31 March 2015 (mean age 49.7 years (s.d. 18.6), 51% female, 70% white ethnicity). We grouped 278 high-burden diseases into nine main clusters according to their patterns of disease onset, using a hierarchical agglomerative clustering algorithm. Four of these clusters, encompassing 207 diseases spanning diverse organ systems and clinical specialties, had rates of disease onset that clearly increased with chronological age. However, the ages of onset for these four clusters were strikingly different, with median age of onset 82 years (IQR 82–83) for Cluster 1, 77 years (IQR 75–77) for Cluster 2, 69 years (IQR 66–71) for Cluster 3 and 57 years (IQR 54–59) for Cluster 4. Fitting to ageing-related actuarial models confirmed that the vast majority of these 207 diseases had a high probability of being ageing-related. Cardiovascular diseases and cancers were highly represented, while benign neoplastic, skin and psychiatric conditions were largely absent from the four ageing-related clusters. Our framework identifies and clusters ARDs and can form the basis for fundamental and translational research into ageing pathways.
format article
author Valerie Kuan
Helen C. Fraser
Melanie Hingorani
Spiros Denaxas
Arturo Gonzalez-Izquierdo
Kenan Direk
Dorothea Nitsch
Rohini Mathur
Constantinos A. Parisinos
R. Thomas Lumbers
Reecha Sofat
Ian C. K. Wong
Juan P. Casas
Janet M. Thornton
Harry Hemingway
Linda Partridge
Aroon D. Hingorani
author_sort Valerie Kuan
title Data-driven identification of ageing-related diseases from electronic health records
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/a4cfa63aeff54b99ac644b906f56e8bb
_version_ 1718391981201162240