Object detection based on an adaptive attention mechanism

Abstract Object detection is an important component of computer vision. Most of the recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that the...

Bibliographic details
Main authors:	Wei Li, Kai Liu, Lizhe Zhang, Fei Cheng
Format:	article
Language:	EN
Published:	Nature Portfolio 2020
Subjects:	Medicine R Science Q
Online access:	https://doaj.org/article/d6996ace62984c7c83a6f556fb40dc2e

id	oai:doaj.org-article:d6996ace62984c7c83a6f556fb40dc2e
record_format	dspace
spelling	oai:doaj.org-article:d6996ace62984c7c83a6f556fb40dc2e 2021-12-02T15:39:58Z Object detection based on an adaptive attention mechanism 10.1038/s41598-020-67529-x 2045-2322 https://doaj.org/article/d6996ace62984c7c83a6f556fb40dc2e 2020-07-01T00:00:00Z https://doi.org/10.1038/s41598-020-67529-x https://doaj.org/toc/2045-2322 Abstract Object detection is an important component of computer vision. Most of the recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that the CNN performance benefits from carefully increasing the depth and width of their structures with respect to the spatial dimension. Some researchers have exploited the cardinality dimension. Others have found that skip and dense connections were also of benefit to performance. Recently, attention mechanisms on the channel dimension have gained popularity with researchers. Global average pooling is used in SENet to generate the input feature vector of the channel-wise attention unit. In this work, we argue that channel-wise attention can benefit from both global average pooling and global max pooling. We designed three novel attention units, namely, an adaptive channel-wise attention unit, an adaptive spatial-wise attention unit and an adaptive domain attention unit, to improve the performance of a CNN. Instead of concatenating the output of the two attention vectors generated by the two channel-wise attention sub-units, we weight the two attention vectors based on the output data of the two channel-wise attention sub-units. We integrated the proposed mechanism with the YOLOv3 and MobileNetv2 framework and tested the proposed network on the KITTI and Pascal VOC datasets. The experimental results show that YOLOv3 with the proposed attention mechanism outperforms the original YOLOv3 by mAP values of 2.9 and 1.2% on the KITTI and Pascal VOC datasets, respectively. MobileNetv2 with the proposed attention mechanism outperforms the original MobileNetv2 by a mAP value of 1.7% on the Pascal VOC dataset. Wei Li Kai Liu Lizhe Zhang Fei Cheng Nature Portfolio article Medicine R Science Q EN Scientific Reports, Vol 10, Iss 1, Pp 1-13 (2020)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
spellingShingle	Medicine R Science Q Wei Li Kai Liu Lizhe Zhang Fei Cheng Object detection based on an adaptive attention mechanism
description	Abstract Object detection is an important component of computer vision. Most of the recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that the CNN performance benefits from carefully increasing the depth and width of their structures with respect to the spatial dimension. Some researchers have exploited the cardinality dimension. Others have found that skip and dense connections were also of benefit to performance. Recently, attention mechanisms on the channel dimension have gained popularity with researchers. Global average pooling is used in SENet to generate the input feature vector of the channel-wise attention unit. In this work, we argue that channel-wise attention can benefit from both global average pooling and global max pooling. We designed three novel attention units, namely, an adaptive channel-wise attention unit, an adaptive spatial-wise attention unit and an adaptive domain attention unit, to improve the performance of a CNN. Instead of concatenating the output of the two attention vectors generated by the two channel-wise attention sub-units, we weight the two attention vectors based on the output data of the two channel-wise attention sub-units. We integrated the proposed mechanism with the YOLOv3 and MobileNetv2 framework and tested the proposed network on the KITTI and Pascal VOC datasets. The experimental results show that YOLOv3 with the proposed attention mechanism outperforms the original YOLOv3 by mAP values of 2.9 and 1.2% on the KITTI and Pascal VOC datasets, respectively. MobileNetv2 with the proposed attention mechanism outperforms the original MobileNetv2 by a mAP value of 1.7% on the Pascal VOC dataset.
format	article
author	Wei Li Kai Liu Lizhe Zhang Fei Cheng
author_facet	Wei Li Kai Liu Lizhe Zhang Fei Cheng
author_sort	Wei Li
title	Object detection based on an adaptive attention mechanism
title_short	Object detection based on an adaptive attention mechanism
title_full	Object detection based on an adaptive attention mechanism
title_fullStr	Object detection based on an adaptive attention mechanism
title_full_unstemmed	Object detection based on an adaptive attention mechanism
title_sort	object detection based on an adaptive attention mechanism
publisher	Nature Portfolio
publishDate	2020
url	https://doaj.org/article/d6996ace62984c7c83a6f556fb40dc2e
work_keys_str_mv	AT weili objectdetectionbasedonanadaptiveattentionmechanism AT kailiu objectdetectionbasedonanadaptiveattentionmechanism AT lizhezhang objectdetectionbasedonanadaptiveattentionmechanism AT feicheng objectdetectionbasedonanadaptiveattentionmechanism
_version_	1718385903008743424
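
The abstract in the description field says that channel-wise attention can draw on both global average pooling and global max pooling, and that the two resulting attention vectors are weighted rather than concatenated. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The record does not give the weighting rule, so the softmax over each sub-unit's mean activation is an assumption, as are all function names, the SENet-style two-layer excitation, and the toy weight matrices.

```python
import math

def global_avg_pool(x):
    """Per-channel mean of a feature map given as nested lists [C][H][W]."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def global_max_pool(x):
    """Per-channel maximum of a feature map given as nested lists [C][H][W]."""
    return [max(max(row) for row in ch) for ch in x]

def _matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def excite(v, weights):
    """SENet-style bottleneck: linear -> ReLU -> linear -> sigmoid."""
    w1, w2 = weights
    hidden = [max(0.0, h) for h in _matvec(w1, v)]
    return [1.0 / (1.0 + math.exp(-o)) for o in _matvec(w2, hidden)]

def adaptive_channel_attention(x, weights_avg, weights_max):
    """Return (rescaled feature map, per-channel attention vector)."""
    a_avg = excite(global_avg_pool(x), weights_avg)  # average-pooling sub-unit
    a_max = excite(global_max_pool(x), weights_max)  # max-pooling sub-unit

    # ASSUMPTION: the mixing coefficients come from a softmax over the mean
    # output of each sub-unit. The abstract only says the weighting is based
    # on the sub-units' output data; the exact rule is not given there.
    e_avg = math.exp(sum(a_avg) / len(a_avg))
    e_max = math.exp(sum(a_max) / len(a_max))
    k_avg = e_avg / (e_avg + e_max)
    k_max = 1.0 - k_avg

    # Weighted sum of the two attention vectors (instead of concatenation).
    attn = [k_avg * a + k_max * m for a, m in zip(a_avg, a_max)]

    # Rescale every channel of the input by its attention value.
    out = [[[attn[c] * val for val in row] for row in ch]
           for c, ch in enumerate(x)]
    return out, attn

# Toy input: 2 channels of size 2x2, with hypothetical bottleneck weights
# (2 channels -> 1 hidden unit -> 2 channels).
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
weights_avg = ([[0.5, -0.2]], [[1.0], [-1.0]])
weights_max = ([[0.3, 0.1]], [[0.5], [0.5]])
scaled, attention = adaptive_channel_attention(feature_map, weights_avg, weights_max)
```

Because the mixing coefficients sum to one and each sub-unit ends in a sigmoid, every entry of `attention` stays strictly between 0 and 1. In a trained network the weight matrices would be learned rather than fixed by hand.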
