Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble

Abstract. Background: Standardized coding plays an important role in the secondary use of radiology reports, such as data analytics, data-driven decision support, and personalized medicine. RadLex, a standard radiological lexicon, can reduce subjective variability and improve clarity in radiology reports...

Bibliographic Details
Main Authors:	Yani Chen, Shan Nan, Qi Tian, Hailing Cai, Huilong Duan, Xudong Lu
Format:	Article
Language:	English
Published:	BMC 2021
Subjects:	Automatic coding; Hybrid translation; Text similarity ensemble; Standardized radiology reports; Computer applications to medicine. Medical informatics; R858-859.7
Online Access:	https://doaj.org/article/28a3f7883cbc4a23bbeec743bcbee2bd

id	oai:doaj.org-article:28a3f7883cbc4a23bbeec743bcbee2bd
record_format	dspace
source	BMC Medical Informatics and Decision Making, Vol 21, Iss S9, Pp 1-11 (2021)
doi	10.1186/s12911-021-01604-9 (https://doi.org/10.1186/s12911-021-01604-9)
issn	1472-6947 (https://doaj.org/toc/1472-6947)
timestamps	2021-11-21T12:28:53Z; 2021-11-01T00:00:00Z
institution	DOAJ
collection	DOAJ
description	Abstract
Background: Standardized coding plays an important role in the secondary use of radiology reports, such as data analytics, data-driven decision support, and personalized medicine. RadLex, a standard radiological lexicon, can reduce subjective variability and improve clarity in radiology reports. RadLex coding of radiology reports is widely used in many countries, but the translation and localization of RadLex in China are far from established. Automatic RadLex coding is a common approach for non-standard radiology reports. However, high-accuracy cross-language RadLex coding is hard to achieve because of the limitations of current machine translation and text similarity algorithms, and it still requires further research.
Methods: We present an approach that combines hybrid translation with a Multilayer Perceptron (MLP)-weighted text similarity ensemble for automatic RadLex coding of Chinese structured radiology reports. First, a hybrid method integrates Google neural machine translation (GNMT) with dictionary translation to optimize the translation of Chinese radiology phrases into English. The dictionary consists of 21,863 Chinese–English radiological term pairs extracted from several free medical dictionaries. Second, four typical text similarity algorithms are introduced: Levenshtein distance, the Jaccard similarity coefficient, the Word2vec continuous bag-of-words (CBOW) model, and WordNet Wu-Palmer (Wup) similarity. Finally, an MLP model combines the contextual, lexical, character, and syntactic information captured by the four similarity algorithms to improve precision. The four similarity scores of a pair of terms form its input, and its output indicates whether the two terms are synonyms.
Results: The approach achieved an F1-score of 90.15%, a precision of 91.78%, and a recall of 88.59%. Hybrid translation had no negative effect on the final coding. The F1-score increased by 21.44% and 8.12% compared with GNMT alone and dictionary translation alone, respectively. The MLP-weighted similarity ensemble improved on the best single similarity algorithm, WordNet Wup, by 4.48%.
Conclusions: The paper proposes an innovative automatic cross-language RadLex coding approach for standardizing Chinese structured radiology reports. It can serve as a reference for other automatic cross-language coding tasks.
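The abstract outlines three stages: hybrid translation, four similarity scores, and an MLP that turns those scores into a synonym decision. The Python sketch below illustrates that pipeline under stated assumptions. It is not the authors' implementation, and every function name in it is hypothetical.

The assumed details are as follows:

- The dictionary is consulted first using greedy longest match. Only uncovered spans go to a machine-translation callable, which is a stand-in for GNMT rather than a call to Google's API.
- Levenshtein distance is normalized by the longer string.
- Jaccard is computed over lowercase word tokens.
- The Word2vec CBOW and WordNet Wu-Palmer scores are passed in precomputed, for example from a gensim model or NLTK's WordNet `wup_similarity`.
- The MLP weights are placeholders. In practice they would be learned from labelled synonym pairs.

```python
import math


# Stage 1 (assumed scheme): hybrid dictionary + machine translation.
def hybrid_translate(phrase, dictionary, machine_translate):
    """Prefer dictionary entries (greedy longest match); send unmatched spans to MT."""
    if phrase in dictionary:
        return dictionary[phrase]
    max_len = max(map(len, dictionary), default=0)
    parts, pending, i = [], "", 0
    while i < len(phrase):
        for size in range(min(max_len, len(phrase) - i), 0, -1):
            chunk = phrase[i:i + size]
            if chunk in dictionary:
                if pending:  # flush characters the dictionary could not cover
                    parts.append(machine_translate(pending))
                    pending = ""
                parts.append(dictionary[chunk])
                i += size
                break
        else:  # no dictionary entry starts here; keep the character for MT
            pending += phrase[i]
            i += 1
    if pending:
        parts.append(machine_translate(pending))
    return " ".join(parts)  # naive left-to-right concatenation; no reordering


# Stage 2: the two similarity features that need only the standard library.
def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """Map edit distance to [0, 1] by dividing by the longer string length."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def jaccard_similarity(a, b):
    """Jaccard coefficient over lowercase word tokens (tokenization is an assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity_features(a, b, w2v_score, wup_score):
    """Four-score input vector; Word2vec and WordNet Wup scores are computed elsewhere."""
    return [levenshtein_similarity(a, b), jaccard_similarity(a, b), w2v_score, wup_score]


# Stage 3: forward pass of a one-hidden-layer MLP (ReLU hidden units, sigmoid output).
def mlp_synonym_probability(features, w_hidden, b_hidden, w_out, b_out):
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical placeholder weights; a trained model would learn these from labelled pairs.
W_HIDDEN = [[2.0, 1.0, 1.5, 1.5], [-1.0, -1.0, -1.0, -1.0]]
B_HIDDEN = [-2.0, 2.0]
W_OUT = [2.0, -2.0]
B_OUT = 0.0

if __name__ == "__main__":
    toy_dict = {"左肺": "left lung", "结节": "nodule"}  # toy entries, not the paper's dictionary

    def fake_mt(text):  # stand-in for a GNMT call
        return f"<MT:{text}>"

    english = hybrid_translate("左肺结节", toy_dict, fake_mt)
    # The 0.8 and 0.9 semantic scores are made-up placeholders.
    feats = similarity_features(english, "lung nodule", w2v_score=0.8, wup_score=0.9)
    p = mlp_synonym_probability(feats, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
    print(english, [round(f, 3) for f in feats], round(p, 3), p >= 0.5)
```

Keeping the four scores as a plain feature vector lets the ensemble be retrained or extended without touching the similarity functions. The naive concatenation in `hybrid_translate` ignores word-order differences between Chinese and English, which a real system would need to handle.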
