MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain

Abstract: Medical images are difficult to comprehend for a person without expertise. Medical practitioners, who are scarce across the globe, often face physical and mental fatigue due to the high number of cases, which can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the decision maker's confidence. It is therefore crucial to have a reliable visual question answering (VQA) system that can provide a 'second opinion' on medical cases. However, most VQA systems in use today cater to general real-world problems and are not specifically tailored to medical images. Moreover, a VQA system for medical images must account for the limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model for VQA on medical images that takes these challenges into account. MedFuseNet aims to maximize learning with minimal complexity by breaking the problem into simpler tasks and then predicting the answer. We tackle two types of answer prediction: categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms state-of-the-art VQA methods, and that visualization of the captured attentions showcases the interpretability of our model's predicted results.
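The abstract describes the model only at a high level: attention-based fusion of image and question features, followed by answer categorization or generation. As a rough illustrative sketch only, and not the authors' architecture or released code, the PyTorch snippet below shows one common form of question-guided attention over pre-extracted image-region features with a classification head for the 'categorization' style of answer prediction; the class name, layer sizes, region count, and vocabulary size are all placeholder assumptions.

```python
# Illustrative sketch (NOT the MedFuseNet implementation): question-guided
# attention over CNN image-region features, simple concatenation fusion,
# and an answer-classification head. All dimensions are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusionVQA(nn.Module):
    def __init__(self, vocab_size=2000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=512, num_answers=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)   # project regions into joint space
        self.att_score = nn.Linear(hidden_dim, 1)             # scalar attention score per region
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, img_regions, question_tokens):
        # img_regions: (B, R, img_feat_dim) pre-extracted CNN region features
        # question_tokens: (B, T) integer token ids
        q_emb = self.embed(question_tokens)
        _, (q_hidden, _) = self.lstm(q_emb)                    # final hidden state: (1, B, hidden_dim)
        q_vec = q_hidden.squeeze(0)                            # (B, hidden_dim)

        img_h = torch.tanh(self.img_proj(img_regions))         # (B, R, hidden_dim)
        # question-guided attention over image regions
        scores = self.att_score(img_h * q_vec.unsqueeze(1))    # (B, R, 1)
        alpha = F.softmax(scores, dim=1)
        img_vec = (alpha * img_h).sum(dim=1)                   # attended image vector: (B, hidden_dim)

        fused = torch.cat([img_vec, q_vec], dim=-1)            # simple concatenation fusion
        return self.classifier(fused)                          # answer logits

# Smoke test with random stand-ins for region features and a tokenized question.
model = AttentionFusionVQA()
logits = model(torch.randn(2, 36, 512), torch.randint(0, 2000, (2, 12)))
print(logits.shape)  # torch.Size([2, 100])
```

The concatenation-based fusion and single attention layer here are deliberately minimal; the paper's actual fusion and attention mechanisms, and its generation-based answering branch, may differ.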

Bibliographic Details
Main Authors: Dhruv Sharma, Sanjay Purushotham, Chandan K. Reddy
Format: article
Language: EN
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online Access: https://doaj.org/article/52b07af925ff445990dba24717ca49fe
Record ID: oai:doaj.org-article:52b07af925ff445990dba24717ca49fe
Record Format: dspace
Institution: DOAJ
Collection: DOAJ
DOI: 10.1038/s41598-021-98390-1
ISSN: 2045-2322
Publication Date: 2021-10-01
Full Text: https://doi.org/10.1038/s41598-021-98390-1
Journal TOC: https://doaj.org/toc/2045-2322
Published in: Scientific Reports, Vol 11, Iss 1, Pp 1-18 (2021)