Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation

The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. Th...

Bibliographic Details
Main authors: Hyeryun Park, Kyungmo Kim, Seongkeun Park, Jinwook Choi
Format: article
Language: EN
Published: IEEE 2021
Subjects: Chest x-ray; deep learning; feature differences; medical image captioning
Online access: https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5
id oai:doaj.org-article:68fe8108a9294323ba0ffd40f0ef10a5
record_format dspace
spelling oai:doaj.org-article:68fe8108a9294323ba0ffd40f0ef10a5
2021-11-18T00:06:36Z
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
2169-3536
10.1109/ACCESS.2021.3124564
https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5
2021-01-01T00:00:00Z
https://ieeexplore.ieee.org/document/9597615/
https://doaj.org/toc/2169-3536
The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. The aim of this study was to develop a chest x-ray image captioning model that considers the differences between patient images and normal images, and uses hierarchical long short-term memory (LSTM) or a transformer as a decoder to generate reports. We investigated which feature representation method was the most appropriate for capturing the differences. The feature representations differed in terms of whether global average pooling was used for the visual feature vectors and how the feature difference vectors were generated. Experiments were conducted on two datasets using the proposed models and recent captioning models (X-LAN and X-Transformer). BLEU, METEOR, ROUGE-L, and CIDEr were used as evaluation metrics. The best model for most metric scores was the multi-difference non-average-pooling transformer model, which uses the transformer decoder, does not use global average pooling for the visual feature vectors, and applies the element-wise product. The transformer decoder was found to be more suitable than hierarchical LSTM. Furthermore, for models that do not condense features with global average pooling, the element-wise product was observed to be more effective than subtraction in expressing the feature differences.
Hyeryun Park
Kyungmo Kim
Seongkeun Park
Jinwook Choi
IEEE
article
Chest x-ray
deep learning
feature differences
medical image captioning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
EN
IEEE Access, Vol 9, Pp 150560-150568 (2021)
institution DOAJ
collection DOAJ
language EN
topic Chest x-ray
deep learning
feature differences
medical image captioning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
spellingShingle Chest x-ray
deep learning
feature differences
medical image captioning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
Hyeryun Park
Kyungmo Kim
Seongkeun Park
Jinwook Choi
Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
description The steadily increasing number of medical images places a tremendous burden on doctors, who need to read and write reports. If an image captioning model could generate drafts of the reports from the corresponding images, the workload of doctors would be reduced, thereby saving time and expenses. The aim of this study was to develop a chest x-ray image captioning model that considers the differences between patient images and normal images, and uses hierarchical long short-term memory (LSTM) or a transformer as a decoder to generate reports. We investigated which feature representation method was the most appropriate for capturing the differences. The feature representations differed in terms of whether global average pooling was used for the visual feature vectors and how the feature difference vectors were generated. Experiments were conducted on two datasets using the proposed models and recent captioning models (X-LAN and X-Transformer). BLEU, METEOR, ROUGE-L, and CIDEr were used as evaluation metrics. The best model for most metric scores was the multi-difference non-average-pooling transformer model, which uses the transformer decoder, does not use global average pooling for the visual feature vectors, and applies the element-wise product. The transformer decoder was found to be more suitable than hierarchical LSTM. Furthermore, for models that do not condense features with global average pooling, the element-wise product was observed to be more effective than subtraction in expressing the feature differences.
format article
author Hyeryun Park
Kyungmo Kim
Seongkeun Park
Jinwook Choi
author_facet Hyeryun Park
Kyungmo Kim
Seongkeun Park
Jinwook Choi
author_sort Hyeryun Park
title Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
title_short Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
title_full Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
title_fullStr Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
title_full_unstemmed Medical Image Captioning Model to Convey More Details: Methodological Comparison of Feature Difference Generation
title_sort medical image captioning model to convey more details: methodological comparison of feature difference generation
publisher IEEE
publishDate 2021
url https://doaj.org/article/68fe8108a9294323ba0ffd40f0ef10a5
work_keys_str_mv AT hyeryunpark medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration
AT kyungmokim medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration
AT seongkeunpark medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration
AT jinwookchoi medicalimagecaptioningmodeltoconveymoredetailsmethodologicalcomparisonoffeaturedifferencegeneration
_version_ 1718425228778930176
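
The description in this record contrasts feature representations along two axes: whether global average pooling is applied to the visual features, and whether the difference between patient and normal features is formed by subtraction or by an element-wise product. The following is a minimal sketch of those four combinations, not the authors' implementation. It assumes both images have already been encoded into feature maps of identical shape (height, width, channels); the record does not say how the normal-image features are obtained, and the function names `global_average_pool` and `feature_difference` are hypothetical.

```python
import numpy as np


def global_average_pool(features):
    """Collapse an (H, W, C) feature map into one C-dimensional vector."""
    return features.mean(axis=(0, 1))


def feature_difference(patient, normal, method="product", use_gap=False):
    """Combine patient and normal features (illustrative assumption only).

    method: "subtract" for patient - normal,
            "product" for the element-wise product.
    use_gap: apply global average pooling to both inputs first.
    """
    if patient.shape != normal.shape:
        raise ValueError("patient and normal features must share a shape")
    if use_gap:
        patient = global_average_pool(patient)
        normal = global_average_pool(normal)
    if method == "subtract":
        return patient - normal
    if method == "product":
        return patient * normal
    raise ValueError(f"unknown method: {method!r}")


# Toy feature maps standing in for CNN outputs (values chosen arbitrarily).
patient_map = np.full((7, 7, 4), 3.0)
normal_map = np.full((7, 7, 4), 2.0)

pooled_sub = feature_difference(patient_map, normal_map, "subtract", use_gap=True)
spatial_prod = feature_difference(patient_map, normal_map, "product", use_gap=False)
```

Without pooling, the result keeps its spatial grid (here 7 x 7 x 4), so a decoder could attend over regions; with pooling, it shrinks to a single vector of length 4. The description reports that the element-wise product worked better than subtraction in the non-pooled setting, but this sketch does not reproduce that comparison.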