Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. Th...
Saved in:
Main Authors: Hyeryun Park, Kyungmo Kim, Seongkeun Park, Jinwook Choi
Format: article
Language: EN
Published: IEEE, 2021
Subjects: Chest x-ray; deep learning; feature differences; medical image captioning
Online Access: https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5
id |
oai:doaj.org-article:68fe8108a9294323ba0ffd40f0ef10a5 |
record_format |
dspace |
spelling |
oai:doaj.org-article:68fe8108a9294323ba0ffd40f0ef10a5 (2021-11-18T00:06:36Z)
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3124564
https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5
Published online: 2021-01-01T00:00:00Z
https://ieeexplore.ieee.org/document/9597615/
https://doaj.org/toc/2169-3536
The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. The aim of this study was to develop a chest x-ray image captioning model that considers the differences between patient images and normal images, and uses hierarchical long short-term memory (LSTM) or a transformer as a decoder to generate reports. We investigated which feature representation method was the most appropriate for capturing the differences. The feature representations differed in terms of whether global average pooling was used for the visual feature vectors and how the feature difference vectors were generated. Experiments were conducted on two datasets using the proposed models and recent captioning models (X-LAN and X-Transformer). BLEU, METEOR, ROUGE-L, and CIDEr were used as evaluation metrics. The best model for most metric scores was the multi-difference non-average-pooling transformer model, which uses the transformer decoder, does not use global average pooling for the visual feature vectors, and applies the element-wise product. The transformer decoder was found to be more suitable than hierarchical LSTM. Furthermore, for models that do not condense features with global average pooling, the element-wise product was observed to be more effective than subtraction in expressing the feature differences.
Authors: Hyeryun Park; Kyungmo Kim; Seongkeun Park; Jinwook Choi
Publisher: IEEE
Format: article
Subjects: Chest x-ray; deep learning; feature differences; medical image captioning; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Language: EN
IEEE Access, Vol 9, Pp 150560-150568 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Chest x-ray; deep learning; feature differences; medical image captioning; Electrical engineering. Electronics. Nuclear engineering; TK1-9971 |
spellingShingle |
Chest x-ray; deep learning; feature differences; medical image captioning; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; Hyeryun Park; Kyungmo Kim; Seongkeun Park; Jinwook Choi; Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
description |
The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. The aim of this study was to develop a chest x-ray image captioning model that considers the differences between patient images and normal images, and uses hierarchical long short-term memory (LSTM) or a transformer as a decoder to generate reports. We investigated which feature representation method was the most appropriate for capturing the differences. The feature representations differed in terms of whether global average pooling was used for the visual feature vectors and how the feature difference vectors were generated. Experiments were conducted on two datasets using the proposed models and recent captioning models (X-LAN and X-Transformer). BLEU, METEOR, ROUGE-L, and CIDEr were used as evaluation metrics. The best model for most metric scores was the multi-difference non-average-pooling transformer model, which uses the transformer decoder, does not use global average pooling for the visual feature vectors, and applies the element-wise product. The transformer decoder was found to be more suitable than hierarchical LSTM. Furthermore, for models that do not condense features with global average pooling, the element-wise product was observed to be more effective than subtraction in expressing the feature differences. |
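The description compares feature-difference generation variants: element-wise product versus subtraction, with or without global average pooling of the visual feature vectors. The following is a minimal PyTorch sketch of those variants only; the tensor shapes, function name, and parameter names are illustrative assumptions and do not reproduce the authors' encoder/decoder implementation.

```python
# Minimal sketch (assumed shapes and names) of the feature-difference variants
# described in the abstract: subtraction vs. element-wise product, with or
# without global average pooling of the visual features.
import torch

def feature_difference(patient_feats: torch.Tensor,
                       normal_feats: torch.Tensor,
                       use_global_average_pooling: bool = False,
                       mode: str = "product") -> torch.Tensor:
    """Build a feature-difference representation from two sets of visual features.

    patient_feats, normal_feats: assumed shape (B, N, D), i.e. N spatial
    regions with D channels per image from a chest x-ray encoder.
    """
    if use_global_average_pooling:
        # Condense the N region vectors into one D-dim vector per image.
        patient_feats = patient_feats.mean(dim=1)   # (B, D)
        normal_feats = normal_feats.mean(dim=1)     # (B, D)

    if mode == "subtraction":
        # Difference expressed as element-wise subtraction.
        return patient_feats - normal_feats
    if mode == "product":
        # Difference expressed as an element-wise (Hadamard) product,
        # reported in the abstract as more effective when pooling is skipped.
        return patient_feats * normal_feats
    raise ValueError(f"unknown mode: {mode}")

if __name__ == "__main__":
    patient = torch.randn(2, 49, 2048)   # e.g. 7x7 regions, 2048 channels (assumed)
    normal = torch.randn(2, 49, 2048)
    diff = feature_difference(patient, normal,
                              use_global_average_pooling=False, mode="product")
    print(diff.shape)  # torch.Size([2, 49, 2048]); would then feed the report decoder
```

The resulting difference tensor would be passed to the caption decoder (hierarchical LSTM or transformer in the paper); that part is not sketched here.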
format |
article |
author |
Hyeryun Park; Kyungmo Kim; Seongkeun Park; Jinwook Choi |
author_facet |
Hyeryun Park; Kyungmo Kim; Seongkeun Park; Jinwook Choi |
author_sort |
Hyeryun Park |
title |
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
title_short |
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
title_full |
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
title_fullStr |
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
title_full_unstemmed |
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation |
title_sort |
medical image captioning model to convey more details: methodological comparison of feature difference generation |
publisher |
IEEE |
publishDate |
2021 |
url |
https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5 |
work_keys_str_mv |
AT hyeryunpark medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration AT kyungmokim medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration AT seongkeunpark medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration AT jinwookchoi medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration |
_version_ |
1718425228778930176 |