Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Bibliographic Details
Main authors:	Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, Degui Zhi
Format:	Article
Language:	English
Published:	Nature Portfolio, 2021
Subjects:	Computer applications to medicine. Medical informatics R858-859.7
Online access:	https://doaj.org/article/14d44497dee74dfdb722302b6ea95c47
Journal:	npj Digital Medicine, Vol 4, Iss 1, Pp 1-13 (2021)
DOI:	https://doi.org/10.1038/s41746-021-00455-y
ISSN:	2398-6352

Abstract

Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pretraining BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21–6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT achieves promising performance on tasks with small fine-tuning training sets and can boost the AUC by more than 20%, or obtain an AUC as high as that of a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial-intelligence-aided healthcare.
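The abstract does not describe how Med-BERT encodes structured records or which pretraining objectives it uses. As a rough illustration of what "adapting BERT to the structured EHR domain" can look like, the Python sketch below serializes a patient's visits into one sequence of diagnosis codes and applies the masked-token corruption from the original BERT recipe. The function names, the `[MASK]` token, the example ICD-10-CM codes, the 15% selection rate and the 80/10/10 replacement split are all assumptions borrowed from text BERT, not details taken from this record.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token


def flatten_visits(visits: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Serialize a patient's visits (each a list of diagnosis codes) into one
    code sequence plus a parallel list of 1-based visit indices."""
    codes: List[str] = []
    visit_ids: List[int] = []
    for visit_index, visit in enumerate(visits, start=1):
        for code in visit:
            codes.append(code)
            visit_ids.append(visit_index)
    return codes, visit_ids


def mask_codes(
    codes: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style corruption (assumed, not confirmed for Med-BERT): each position
    is selected as a prediction target with probability mask_prob; a selected
    code becomes MASK 80% of the time, a random vocabulary code 10% of the time,
    and stays unchanged 10% of the time. labels[i] holds the original code at
    selected positions and None elsewhere."""
    rng = rng or random.Random()
    inputs = list(codes)  # copy so the caller's sequence is not modified
    labels: List[Optional[str]] = [None] * len(codes)
    for pos, code in enumerate(codes):
        if rng.random() >= mask_prob:
            continue  # not selected: label stays None, excluded from the loss
        labels[pos] = code
        roll = rng.random()
        if roll < 0.8:
            inputs[pos] = MASK
        elif roll < 0.9:
            inputs[pos] = rng.choice(vocab)
        # otherwise keep the original code unchanged


    return inputs, labels


if __name__ == "__main__":
    # Illustrative ICD-10-CM codes for three visits (not data from the paper).
    visits = [["E11.9", "I10"], ["I10", "E78.5"], ["N18.9"]]
    codes, visit_ids = flatten_visits(visits)
    inputs, labels = mask_codes(codes, sorted(set(codes)), rng=random.Random(0))
    print(codes, visit_ids)
    print(inputs, labels)
```

Here `flatten_visits` turns three visits into five codes with visit indices `[1, 1, 2, 2, 3]`, and `mask_codes` produces the corrupted input a model would read together with the labels it would be trained to recover. Carrying the visit indices alongside the codes is one way to keep visit boundaries after flattening; a model would typically map codes and visit indices to learned embeddings and sum them, much as text BERT sums token, segment and position embeddings.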