Multi‐dimensional weighted cross‐attention network in crowded scenes

Abstract Human detection in crowded scenes is one of the research components of crowd safety problem analysis, such as emergency warning and security monitoring platforms. Although the existing anchor‐free methods have fast inference speed, they are not suitable for object detection in crowded scene...

Bibliographic Details
Main Authors:	Yefan Xie, Jiangbin Zheng, Xuan Hou, Irfan Raza Naqvi, Yue Xi, Nailiang Kuang
Format:	article
Language:	EN
Published:	Wiley 2021
Subjects:	Photography (TR1-1050); Computer software (QA76.75-76.765)
Online Access:	https://doaj.org/article/7dad8689e7c14b8c9b305656b35d5263

id	oai:doaj.org-article:7dad8689e7c14b8c9b305656b35d5263
record_format	dspace
spelling	oai:doaj.org-article:7dad8689e7c14b8c9b305656b35d5263; 2021-11-29T03:38:16Z; Multi‐dimensional weighted cross‐attention network in crowded scenes; ISSN 1751-9667, 1751-9659; DOI 10.1049/ipr2.12298; https://doaj.org/article/7dad8689e7c14b8c9b305656b35d5263; 2021-12-01T00:00:00Z; https://doi.org/10.1049/ipr2.12298; https://doaj.org/toc/1751-9659; https://doaj.org/toc/1751-9667; Yefan Xie; Jiangbin Zheng; Xuan Hou; Irfan Raza Naqvi; Yue Xi; Nailiang Kuang; Wiley; article; Photography; TR1-1050; Computer software; QA76.75-76.765; EN; IET Image Processing, Vol 15, Iss 14, Pp 3585-3598 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Photography TR1-1050 Computer software QA76.75-76.765
spellingShingle	Photography TR1-1050 Computer software QA76.75-76.765 Yefan Xie Jiangbin Zheng Xuan Hou Irfan Raza Naqvi Yue Xi Nailiang Kuang Multi‐dimensional weighted cross‐attention network in crowded scenes
description	Abstract: Human detection in crowded scenes is one research component of crowd safety analysis, with applications such as emergency warning and security monitoring platforms. Although existing anchor‐free methods offer fast inference, they are not well suited to object detection in crowded scenes because the models cannot predict well‐refined bounding boxes. This work proposes an end‐to‐end anchor‐free network, the Multi‐dimensional Weighted Cross‐Attention Network (MANet), which can perform real‐time human detection in crowded scenes. Specifically, the Double‐flow Weighted Feature Cascade Module (DW‐FCM) is used in the extractor to highlight the contribution of features at different levels. The Triplet Cross Attention Module (TCAM) is used in the detector head to strengthen the dependencies among multi‐dimensional features, further improving the discrimination of human boundary features at a fine‐grained level. Moreover, an Adaptively Opposite Thrust Mapping (AOTM) ground‐truth annotation strategy is proposed to correct the bias of erroneous mappings and reduce the number of wasted training iterations. These strategies alleviate the inability of existing anchor‐free networks to correctly distinguish and locate individual humans in crowded scenes. Compared with anchor‐based detection methods, there is no need to set anchor parameters manually, and the detection speed is sufficient for real‐time applications. Finally, extensive comparative experiments on the CrowdHuman and WIDER FACE datasets show that the improved strategy achieves state‐of‐the‐art results among anchor‐free methods.
format	article
author	Yefan Xie Jiangbin Zheng Xuan Hou Irfan Raza Naqvi Yue Xi Nailiang Kuang
author_facet	Yefan Xie Jiangbin Zheng Xuan Hou Irfan Raza Naqvi Yue Xi Nailiang Kuang
author_sort	Yefan Xie
title	Multi‐dimensional weighted cross‐attention network in crowded scenes
title_short	Multi‐dimensional weighted cross‐attention network in crowded scenes
title_full	Multi‐dimensional weighted cross‐attention network in crowded scenes
title_fullStr	Multi‐dimensional weighted cross‐attention network in crowded scenes
title_full_unstemmed	Multi‐dimensional weighted cross‐attention network in crowded scenes
title_sort	multi‐dimensional weighted cross‐attention network in crowded scenes
publisher	Wiley
publishDate	2021
url	https://doaj.org/article/7dad8689e7c14b8c9b305656b35d5263
work_keys_str_mv	AT yefanxie multidimensionalweightedcrossattentionnetworkincrowdedscenes AT jiangbinzheng multidimensionalweightedcrossattentionnetworkincrowdedscenes AT xuanhou multidimensionalweightedcrossattentionnetworkincrowdedscenes AT irfanrazanaqvi multidimensionalweightedcrossattentionnetworkincrowdedscenes AT yuexi multidimensionalweightedcrossattentionnetworkincrowdedscenes AT nailiangkuang multidimensionalweightedcrossattentionnetworkincrowdedscenes
_version_	1718407653892292608
