Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models
Objective: A vast amount of medical data is still stored in unstructured text documents. We present an automated method for information extraction from unstructured German clinical routine data in the cardiology domain, enabling its use in state-of-the-art data-driven deep learning projects. Methods: We evaluated pre-trained language models for extracting a set of 12 cardiovascular concepts from German discharge letters. We compared three BERT (bidirectional encoder representations from transformers) models pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction, using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning baselines based on a long short-term memory network and a conditional random field. Results: Our best-performing model, based on a publicly available German pre-trained BERT model, achieved a token-wise micro-average F1-score of 86% and outperformed the baselines by at least 6 percentage points. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion: Our results show that state-of-the-art deep learning methods using pre-trained language models are applicable to cardiovascular concept extraction with limited training data. This minimizes annotation effort, which is currently the bottleneck for any application of data-driven deep learning in the clinical domain, for German and many other European languages.
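The paper reports a token-wise micro-averaged F1-score. A minimal sketch of how that metric is computed over per-token concept labels (the label names and the `micro_prf` helper are illustrative, not taken from the paper; tokens outside any concept are labelled "O" and excluded from the positive counts):

```python
def micro_prf(gold, pred, outside="O"):
    """Token-wise micro-averaged precision, recall and F1.

    gold, pred: flat lists of per-token labels, e.g. "O", "Diagnosis",
    "Medication". Counts are pooled over all tokens (micro-averaging),
    so frequent concepts weigh more than rare ones.
    """
    assert len(gold) == len(pred), "one label per token in both lists"
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1   # concept token labelled correctly
        if p != outside and p != g:
            fp += 1   # predicted a concept the gold standard disagrees with
        if g != outside and p != g:
            fn += 1   # gold concept token missed or mislabelled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one missed Diagnosis token, one spurious Medication token.
gold = ["O", "Diagnosis", "Diagnosis", "O", "Medication", "O"]
pred = ["O", "Diagnosis", "O",         "O", "Medication", "Medication"]
p, r, f = micro_prf(gold, pred)  # → 0.667, 0.667, 0.667
```

Micro-averaging makes the reported 86% directly comparable across models, since precision and recall are computed from the same pooled token counts rather than averaged per concept class.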
Main authors: | Phillip Richter-Pechanski, Nicolas A Geis, Christina Kiriakou, Dominic M Schwab, Christoph Dieterich |
Format: | article |
Language: | EN |
Published: | SAGE Publishing, 2021 |
Subjects: | Computer applications to medicine. Medical informatics (R858-859.7) |
Online access: | https://doaj.org/article/d41227b45cd844759a6de54b16972fe7 |
id | oai:doaj.org-article:d41227b45cd844759a6de54b16972fe7 |
record_format | dspace |
issn | 2055-2076 |
doi | https://doi.org/10.1177/20552076211057662 |
url | https://doaj.org/article/d41227b45cd844759a6de54b16972fe7 |
publisher | SAGE Publishing |
publishDate | 2021-11-01 |
authors | Phillip Richter-Pechanski, Nicolas A Geis, Christina Kiriakou, Dominic M Schwab, Christoph Dieterich |
topic | Computer applications to medicine. Medical informatics (R858-859.7) |
journal | Digital Health, Vol 7 (2021) |
language | EN |
record timestamp | 2021-12-01T00:04:02Z |
description |
Objective A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. Methods We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. Results Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages. |