Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data

Abstract As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to define specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving dise...

Bibliographic Details
Main Authors: Tasha Nagamine, Brian Gillette, Alexey Pakhomov, John Kahoun, Hannah Mayer, Rolf Burghaus, Jörg Lippert, Mayur Saxena
Format: article
Language: EN
Published: Nature Portfolio 2020
Subjects:
R
Q
Online Access: https://doaj.org/article/2355c0f45efd42518c247a2a9b65c245
id oai:doaj.org-article:2355c0f45efd42518c247a2a9b65c245
record_format dspace
spelling oai:doaj.org-article:2355c0f45efd42518c247a2a9b65c245
2021-12-02T12:33:44Z
Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
10.1038/s41598-020-77286-6
2045-2322
https://doaj.org/article/2355c0f45efd42518c247a2a9b65c245
2020-12-01T00:00:00Z
https://doi.org/10.1038/s41598-020-77286-6
https://doaj.org/toc/2045-2322
Abstract As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to define specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving disease progression, and therefore have limited value for clinical decision-making and development of novel therapies. Here we present a novel and data-driven approach to understand and characterize the real-world manifestation of HF by clustering disease- and symptom-related clinical concepts (complaints) captured from unstructured electronic health record clinical notes. We used natural language processing to construct vectorized representations of patient complaints followed by clustering to group HF patients by similarity of complaint vectors. We then identified complaints that were significantly enriched within each cluster using statistical testing. Breaking the HF population into groups of similar patients revealed a clinically interpretable hierarchy of subgroups characterized by similar HF manifestation. Importantly, our methodology revealed well-known etiologies, risk factors, and comorbid conditions of HF (including ischemic heart disease, aortic valve disease, atrial fibrillation, congenital heart disease, various cardiomyopathies, obesity, hypertension, diabetes, and chronic kidney disease) and yielded additional insights into the details of each HF subgroup’s clinical manifestation of HF. Our approach is entirely hypothesis-free and can therefore be readily applied for discovery of novel insights in alternative diseases or patient populations.
Tasha Nagamine
Brian Gillette
Alexey Pakhomov
John Kahoun
Hannah Mayer
Rolf Burghaus
Jörg Lippert
Mayur Saxena
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 10, Iss 1, Pp 1-13 (2020)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Tasha Nagamine
Brian Gillette
Alexey Pakhomov
John Kahoun
Hannah Mayer
Rolf Burghaus
Jörg Lippert
Mayur Saxena
Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
description Abstract As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to define specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving disease progression, and therefore have limited value for clinical decision-making and development of novel therapies. Here we present a novel and data-driven approach to understand and characterize the real-world manifestation of HF by clustering disease- and symptom-related clinical concepts (complaints) captured from unstructured electronic health record clinical notes. We used natural language processing to construct vectorized representations of patient complaints followed by clustering to group HF patients by similarity of complaint vectors. We then identified complaints that were significantly enriched within each cluster using statistical testing. Breaking the HF population into groups of similar patients revealed a clinically interpretable hierarchy of subgroups characterized by similar HF manifestation. Importantly, our methodology revealed well-known etiologies, risk factors, and comorbid conditions of HF (including ischemic heart disease, aortic valve disease, atrial fibrillation, congenital heart disease, various cardiomyopathies, obesity, hypertension, diabetes, and chronic kidney disease) and yielded additional insights into the details of each HF subgroup’s clinical manifestation of HF. Our approach is entirely hypothesis-free and can therefore be readily applied for discovery of novel insights in alternative diseases or patient populations.
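The three steps named in the abstract (vectorize complaints, cluster patients by vector similarity, test each cluster for enriched complaints) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the toy patient data, the binary bag-of-complaints vectors, average-linkage agglomerative clustering on cosine similarity, and the one-sided hypergeometric test. The abstract does not state which vectorization, clustering algorithm, or statistical test the authors used, and the extraction of complaints from notes is assumed to have already happened.

```python
import math

# Hypothetical toy data: patient id -> complaints already extracted from notes.
PATIENTS = {
    "p1": {"chest pain", "ischemic heart disease", "hypertension"},
    "p2": {"chest pain", "ischemic heart disease", "diabetes"},
    "p3": {"chest pain", "ischemic heart disease", "hypertension"},
    "p4": {"atrial fibrillation", "palpitations", "obesity"},
    "p5": {"atrial fibrillation", "palpitations", "hypertension"},
    "p6": {"atrial fibrillation", "palpitations", "diabetes"},
}


def vectorize(patients):
    """Turn each complaint set into a binary vector over a shared vocabulary."""
    vocab = sorted(set().union(*patients.values()))
    vectors = {
        pid: [1.0 if term in complaints else 0.0 for term in vocab]
        for pid, complaints in patients.items()
    }
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine similarity."""
    clusters = [[pid] for pid in sorted(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the most similar pair
        del clusters[j]
    return clusters


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: cluster members with the complaint, n: cluster size,
    K: all patients with the complaint, N: all patients.
    """
    upper = min(n, K)
    hits = sum(math.comb(K, x) * math.comb(N - K, n - x)
               for x in range(k, upper + 1))
    return hits / math.comb(N, n)


def enriched_complaints(patients, cluster, alpha=0.1):
    """Complaints over-represented in a cluster (no multiple-testing correction)."""
    members = set(cluster)
    results = []
    for term in sorted(set().union(*patients.values())):
        with_term = {pid for pid, cs in patients.items() if term in cs}
        p = enrichment_p(len(with_term & members), len(members),
                         len(with_term), len(patients))
        if p <= alpha:
            results.append((term, p))
    return results


if __name__ == "__main__":
    _, vectors = vectorize(PATIENTS)
    for cluster in agglomerate(vectors, n_clusters=2):
        print(sorted(cluster), enriched_complaints(PATIENTS, cluster))
```

Stopping the merge loop at different values of n_clusters gives coarser or finer groupings of the same patients, which is one simple way to obtain a hierarchy of subgroups. A real analysis would need a correction for multiple testing across complaints, which this sketch omits.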
format article
author Tasha Nagamine
Brian Gillette
Alexey Pakhomov
John Kahoun
Hannah Mayer
Rolf Burghaus
Jörg Lippert
Mayur Saxena
author_facet Tasha Nagamine
Brian Gillette
Alexey Pakhomov
John Kahoun
Hannah Mayer
Rolf Burghaus
Jörg Lippert
Mayur Saxena
author_sort Tasha Nagamine
title Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
title_short Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
title_full Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
title_fullStr Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
title_full_unstemmed Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
title_sort multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data
publisher Nature Portfolio
publishDate 2020
url https://doaj.org/article/2355c0f45efd42518c247a2a9b65c245
work_keys_str_mv AT tashanagamine multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT briangillette multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT alexeypakhomov multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT johnkahoun multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT hannahmayer multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT rolfburghaus multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT jorglippert multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
AT mayursaxena multiscaleclassificationofheartfailurephenotypesbyunsupervisedclusteringofunstructuredelectronicmedicalrecorddata
_version_ 1718393847748231168