Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement

This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network model architecture of multimodal information based on motion energy is proposed. The network exploits both the rich information of appe...

Bibliographic details
Main authors:	Chengming Ma, Qian Liu, Yaqi Dang
Format:	article
Language:	EN
Published:	Frontiers Media S.A., 2021
Subjects:	intelligent augmentation; multimodality; human ART; pose recognition; interaction; Psychology; BF1-990
Online access:	https://doaj.org/article/c3dfa052b0a5410db73acd2687df9639

id	oai:doaj.org-article:c3dfa052b0a5410db73acd2687df9639
record_format	dspace
journal	Frontiers in Psychology, Vol 12 (2021)
issn	1664-1078
doi	10.3389/fpsyg.2021.769509
published	2021-11-01T00:00:00Z
record_updated	2021-11-08T04:48:51Z
fulltext_url	https://www.frontiersin.org/articles/10.3389/fpsyg.2021.769509/full
toc_url	https://doaj.org/toc/1664-1078
institution	DOAJ
collection	DOAJ
language	EN
topic	intelligent augmentation multimodality human ART pose recognition interaction Psychology BF1-990
description	This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary multimodal network architecture based on motion energy is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, as well as the robustness of depth data to illumination and viewing angle. Multimodal fusion is achieved by exploiting the complementary information of the two modalities. Moreover, to better model long-range temporal structure while accounting for action classes that share sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed: by connecting the feature maps of multiple convolutional layers, it lets the convolutional network share local features of the two modalities in the shallow layers and also fuse global features in the deep convolutional layers. First, a Kinect camera is used to acquire color images of the human body, depth images, and the 3D coordinates of the skeletal points, using the OpenPose open-source framework. Next, keyframes are automatically extracted from the action based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, while local occupancy pattern features and HSV color space features are extracted to describe the object. Finally, the features are fused to complete the complex action recognition task.
To address the consistency problem in virtual-real fusion, the mapping between hand joint coordinates and the virtual scene is determined within the augmented reality scene, and a coordinate consistency model between the natural hand and the virtual model is established. Finally, real-time interaction between hand gestures and the virtual model is achieved, with an average hand gesture recognition accuracy of 99.04%, which improves the robustness and real-time interactivity of hand gesture recognition.
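The keyframe and feature steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the joint indices HEAD and HAND, the local-minimum keyframe criterion, mean HSV as the object color descriptor, and plain concatenation as the fusion step are all hypothetical choices, and the local occupancy pattern features are not sketched.

```python
import colorsys
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Frame = Sequence[Point3D]  # one 3D coordinate per skeletal joint
RGB = Tuple[int, int, int]

# Hypothetical joint indices; the record does not give a skeleton layout.
HEAD = 0
HAND = 1


def select_keyframes(frames: List[Frame], min_gap: int = 1) -> List[int]:
    """Indices of frames where the hand-head distance is a strict local minimum.

    Assumed criterion (not taken from the paper): a keyframe is a frame in
    which the hand is closer to the head than in both neighbouring frames.
    A candidate fewer than `min_gap` frames after the previous keyframe is skipped.
    """
    d = [math.dist(f[HAND], f[HEAD]) for f in frames]
    keys: List[int] = []
    for i in range(1, len(d) - 1):
        if d[i] < d[i - 1] and d[i] < d[i + 1]:
            if not keys or i - keys[-1] >= min_gap:
                keys.append(i)
    return keys


def relative_distance_features(frame: Frame) -> List[float]:
    """Euclidean distances between every pair of joints in one frame."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


def mean_hsv(pixels: Sequence[RGB]) -> List[float]:
    """Mean hue, saturation and value (each in [0, 1]) of 8-bit RGB object pixels.

    Hue is averaged naively; a circular mean would be more robust near red.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return [sum(c[k] for c in hsv) / len(hsv) for k in range(3)]


def fuse(action_feats: List[float], object_feats: List[float]) -> List[float]:
    """Early fusion by concatenation (an assumption; the record gives no detail)."""
    return action_feats + object_feats
```

For example, for skeleton frames whose hand-head distances are 3, 2, 1, 2, 3, `select_keyframes` returns `[2]`, and the relative-distance vector of a three-joint frame has three entries, one per joint pair.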
format	article
author	Chengming Ma, Qian Liu, Yaqi Dang
title	Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement
publisher	Frontiers Media S.A.
publishDate	2021
url	https://doaj.org/article/c3dfa052b0a5410db73acd2687df9639
