Adversarial Learning with Bidirectional Attention for Visual Question Answering

In this paper, we provide external image features and use the internal attention mechanism to solve the VQA problem given a dataset of textual questions and related images. Most previous models for VQA use a pair of images and questions as input. In addition, the model adopts a question-oriented att...

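Bibliographic Details
Main Authors: Qifeng Li, Xinyi Tang, Yi Jian
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: bidirectional attention; adversarial learning; visual question answering; attention visualization; feature fusion; feature selection; Chemical technology; TP1-1185
Online Access: https://doaj.org/article/0f304144b3574f6799625865e55ac883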
id oai:doaj.org-article:0f304144b3574f6799625865e55ac883
record_format dspace
spelling oai:doaj.org-article:0f304144b3574f6799625865e55ac883; 2021-11-11T19:09:27Z; Adversarial Learning with Bidirectional Attention for Visual Question Answering; 10.3390/s21217164; 1424-8220; https://doaj.org/article/0f304144b3574f6799625865e55ac883; 2021-10-01T00:00:00Z; https://www.mdpi.com/1424-8220/21/21/7164; https://doaj.org/toc/1424-8220; Qifeng Li; Xinyi Tang; Yi Jian; MDPI AG; article; bidirectional attention; adversarial learning; visual question answering; attention visualization; feature fusion; feature selection; Chemical technology; TP1-1185; EN; Sensors, Vol 21, Iss 7164, p 7164 (2021)
institution DOAJ
collection DOAJ
language EN
topic bidirectional attention
adversarial learning
visual question answering
attention visualization
feature fusion
feature selection
Chemical technology
TP1-1185
description In this paper, we provide external image features and use an internal attention mechanism to solve the VQA problem, given a dataset of textual questions and related images. Most previous VQA models take an image-question pair as input and apply a question-oriented attention mechanism to extract features from the entire image before performing feature fusion. The shortcoming of these models is that they cannot effectively eliminate irrelevant image features. In addition, the question-oriented attention mechanism mines image features poorly and therefore brings in redundant image features. We propose a VQA model based on adversarial learning and bidirectional attention. We exploit external image features that are unrelated to the question to form an adversarial mechanism that boosts the accuracy of the model. Target detection on the image serves as the image-oriented attention mechanism. The bidirectional attention mechanism helps focus the model's attention and eliminate interference. The model is evaluated on benchmark datasets and performs better than other attention-based models. In addition, qualitative results show the attention maps on the images and how they lead to correct answer predictions.
format article
author Qifeng Li
Xinyi Tang
Yi Jian
author_sort Qifeng Li
title Adversarial Learning with Bidirectional Attention for Visual Question Answering
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/0f304144b3574f6799625865e55ac883
_version_ 1718431568510320640