Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models

Objective: A vast amount of medical data is still stored in unstructured text documents. We present an automated method for information extraction from unstructured German clinical routine data in the cardiology domain, enabling its use in state-of-the-art data-driven deep learning projects. (Full abstract in the description field below.)


Bibliographic Details
Main Authors: Phillip Richter-Pechanski, Nicolas A Geis, Christina Kiriakou, Dominic M Schwab, Christoph Dieterich
Format: article
Language: EN
Published: SAGE Publishing, 2021
Subjects:
Online Access: https://doaj.org/article/d41227b45cd844759a6de54b16972fe7
id oai:doaj.org-article:d41227b45cd844759a6de54b16972fe7
record_format dspace
spelling oai:doaj.org-article:d41227b45cd844759a6de54b16972fe7 (indexed 2021-12-01T00:04:02Z)
title: Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
ISSN: 2055-2076
DOI: 10.1177/20552076211057662
published: 2021-11-01
DOI URL: https://doi.org/10.1177/20552076211057662
article URL: https://doaj.org/article/d41227b45cd844759a6de54b16972fe7
journal TOC: https://doaj.org/toc/2055-2076
abstract: (full text in the description field below)
authors: Phillip Richter-Pechanski, Nicolas A Geis, Christina Kiriakou, Dominic M Schwab, Christoph Dieterich
publisher: SAGE Publishing
subject: Computer applications to medicine. Medical informatics (R858-859.7)
source: Digital Health, Vol 7 (2021)
institution DOAJ
collection DOAJ
language EN
topic Computer applications to medicine. Medical informatics
R858-859.7
description Objective A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. Methods We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. Results Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.
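The Results section above reports a token-wise micro-averaged F1-score for the concept extraction task. As a minimal, self-contained sketch (the BIO tag names and the example labels are invented for illustration; this is not the paper's code or data), concept extraction can be framed as token classification, with precision, recall, and F1 computed by pooling token counts over all classes:

```python
def micro_prf(gold, pred, outside="O"):
    """Token-wise micro-averaged precision/recall/F1.

    A token counts as a true positive when gold and predicted labels
    match and neither is the outside tag; micro-averaging pools the
    counts over all tokens and concept classes before taking ratios.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1                      # correctly labeled concept token
        elif p != outside:
            fp += 1                      # spurious or mislabeled prediction
            if g != outside:
                fn += 1                  # ...that also missed a gold concept
        elif g != outside:
            fn += 1                      # gold concept token predicted as outside
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented example: an ejection-fraction mention spanning three tokens,
# of which the model recovers only the first two.
gold = ["O", "B-EF", "I-EF", "I-EF", "O"]
pred = ["O", "B-EF", "I-EF", "O", "O"]
p, r, f = micro_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → precision=1.00 recall=0.67 f1=0.80
```

Micro-averaging (as opposed to macro-averaging per concept class) weights every token equally, so frequent concepts dominate the score; that matches the "token-wise micro-average" phrasing in the abstract.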
format article
author Phillip Richter-Pechanski
Nicolas A Geis
Christina Kiriakou
Dominic M Schwab
Christoph Dieterich
author_sort Phillip Richter-Pechanski
title Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
publisher SAGE Publishing
publishDate 2021
url https://doaj.org/article/d41227b45cd844759a6de54b16972fe7