Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

Abstract: Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used method, but the manual classification is an enor...

Bibliographic Details
Main authors: Arjan Sammani, Ayoub Bagheri, Peter G. M. van der Heijden, Anneline S. J. M. te Riele, Annette F. Baas, C. A. J. Oosters, Daniel Oberski, Folkert W. Asselbergs
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects: Computer applications to medicine. Medical informatics
Online access: https://doaj.org/article/0f7e2a6ca61e4c53a1d2909562865f85
id oai:doaj.org-article:0f7e2a6ca61e4c53a1d2909562865f85
record_format dspace
spelling oai:doaj.org-article:0f7e2a6ca61e4c53a1d2909562865f85
2021-12-02T15:54:14Z
10.1038/s41746-021-00404-9
2398-6352
https://doaj.org/article/0f7e2a6ca61e4c53a1d2909562865f85
2021-02-01T00:00:00Z
https://doi.org/10.1038/s41746-021-00404-9
https://doaj.org/toc/2398-6352
Nature Portfolio
article
Computer applications to medicine. Medical informatics
R858-859.7
EN
npj Digital Medicine, Vol 4, Iss 1, Pp 1-10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Computer applications to medicine. Medical informatics
R858-859.7
description Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used method, but manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and the poor reliability of the terminal parts of these codes have restricted clinical usability. We aimed to create a high-performing pipeline for automated classification of reliable ICD-10 codes in free medical text in cardiology. We focussed on frequently used and well-defined three- and four-character ICD-10 codes that still have enough granularity to be clinically relevant, such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. Because discharge letters in clinical practice may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding the variables age and sex did not affect results. For model interpretability, word coefficients were provided and a qualitative assessment of classification was performed manually. Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
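The abstract reports F1 scores per ICD-10 code in a multilabel setting, where one letter may carry several codes. As a minimal sketch of how such per-label scores can be computed, the example below treats each letter's codes as a set and scores each code separately. The function name `per_label_f1` and the toy data are hypothetical and for illustration only; they are not taken from the article and do not reproduce its results or its evaluation code.

```python
from typing import Dict, List, Set


def per_label_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Return the F1 score of each label, computed independently per label.

    gold[i] and pred[i] are the true and predicted code sets for letter i.
    """
    labels = sorted(set().union(*gold, *pred))
    scores: Dict[str, float] = {}
    for label in labels:
        # Count true positives, false positives and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy example (made-up labels per letter, not data from the study).
gold = [{"I48"}, {"I21", "I48"}, {"I42.0"}, {"I21"}]
pred = [{"I48"}, {"I21"}, {"I42.0", "I48"}, {"I21"}]
scores = per_label_f1(gold, pred)
print(scores)
```

Scoring each code on its own, rather than requiring the whole code set of a letter to match, is one common way to report multilabel performance; whether the article aggregates its scores in the same way is not stated in this record.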
format article
author Arjan Sammani
Ayoub Bagheri
Peter G. M. van der Heijden
Anneline S. J. M. te Riele
Annette F. Baas
C. A. J. Oosters
Daniel Oberski
Folkert W. Asselbergs
author_sort Arjan Sammani
title Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/0f7e2a6ca61e4c53a1d2909562865f85