Monocular Human Depth Estimation Via Pose Estimation


Bibliographic details
Main authors: Jinyoung Jun, Jae-Han Lee, Chul Lee, Chang-Su Kim
Format: article
Language: EN
Published: IEEE 2021
Subjects:
Online access: https://doaj.org/article/362374746b434f51bfc40a80c1a6f80d
Description
Abstract: We propose a novel monocular depth estimator, which improves the prediction accuracy on human regions by utilizing pose information. The proposed algorithm consists of two networks — PoseNet and DepthNet — to estimate keypoint heatmaps and a depth map, respectively. We incorporate the pose information from PoseNet to improve the depth estimation performance of DepthNet. Specifically, we develop the feature blending block, which fuses the features from PoseNet and DepthNet and feeds them into the next layer of DepthNet, so that the networks learn to predict the depths of human regions more accurately. Furthermore, we develop a novel joint training scheme using partially labeled datasets, which balances multiple loss functions effectively by adjusting their weights. Experimental results demonstrate that the proposed algorithm improves depth estimation performance significantly, especially around human regions. For example, it improves the depth estimation performance of ResNet-50 on human regions by 2.8% and 7.0% in terms of $\delta_1$ and RMSE, respectively, on the proposed HD+P dataset.
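
To make the feature blending idea described in the abstract concrete, the following is a minimal PyTorch-style sketch of a block that fuses PoseNet features into a DepthNet feature map. It assumes fusion by concatenation, a 1x1 convolution, and a residual connection; the class name, channel sizes, and exact operations are illustrative assumptions, not the authors' published design.

```python
# A minimal sketch of a pose-to-depth feature blending block.
# Assumption: fusion = concatenation + 1x1 convolution + residual connection;
# this is not the authors' exact architecture.
import torch
import torch.nn as nn

class FeatureBlendingBlock(nn.Module):
    """Fuses pose features (from PoseNet) into depth features (from DepthNet)."""

    def __init__(self, depth_channels: int, pose_channels: int):
        super().__init__()
        # Project the concatenated features back to the depth-feature width
        # so the result can be fed into the next DepthNet layer.
        self.fuse = nn.Sequential(
            nn.Conv2d(depth_channels + pose_channels, depth_channels, kernel_size=1),
            nn.BatchNorm2d(depth_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, depth_feat: torch.Tensor, pose_feat: torch.Tensor) -> torch.Tensor:
        # Resize the pose features to the spatial size of the depth features if needed.
        if pose_feat.shape[-2:] != depth_feat.shape[-2:]:
            pose_feat = nn.functional.interpolate(
                pose_feat, size=depth_feat.shape[-2:],
                mode="bilinear", align_corners=False,
            )
        blended = self.fuse(torch.cat([depth_feat, pose_feat], dim=1))
        # Residual connection keeps the original depth features available.
        return depth_feat + blended

# Usage with dummy feature maps (shapes are illustrative):
block = FeatureBlendingBlock(depth_channels=256, pose_channels=64)
d = torch.randn(1, 256, 60, 80)   # DepthNet feature map
p = torch.randn(1, 64, 60, 80)    # PoseNet keypoint-heatmap features
out = block(d, p)                 # same shape as d, fed to the next DepthNet layer
```

The abstract's joint training scheme for partially labeled data would, under this sketch, amount to summing the depth and pose losses with adjustable weights and skipping the loss terms whose labels are missing for a given sample; the weighting strategy itself is described only at a high level in the abstract.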