1Tsinghua University, 2Hisense Inc.
Graph convolutional network (GCN) has achieved great success in single hand reconstruction task, while interacting two-hand reconstruction by GCN remains unexplored. In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph convolution based network that reconstructs two interacting hands from a single RGB image. To solve occlusion and interaction challenges of two-hand reconstruction, we introduce two novel attention based modules in each upsampling step of original GCN. The first module is the pyramid image feature attention (PIFA) module, which utilizes multiresolution features to implicitly obtain vertex-to-image alignment. The second module is the cross hand attention (CHA) module that encodes the coherence of interacting hands by building dense cross-attention between two hand vertices. As a result, our model outperforms all existing two-hand reconstruction methods by a large margin on InterHand2.6M benchmark. Moreover, ablation studies verify the effectiveness of both PIFA and CHA modules for improving the reconstruction accuracy. Results on in-the-wild images further demonstrate the generalization ability of our network.
Fig 1. Our network structure. Given an RGB image as input, our network first distills a global feature vector, a sequence of pyramid image features along with other auxiliary predictions (2D pose, segmentation, dense mapping encoding). Then our model directly regresses the 3D coordinates of two hands surface vertices after three steps of IntagHand blocks and upsampling. Each IntagHand block contains a GCN module, a pyramid image feature attention (PIFA) module and a cross-hand attention (CHA) module.
Fig 2. Qualitative results of our method on InterHand2.6M test dataset. Our method works well under various kinds of interactions. Note that, our method could even produce correct finger level interactions without explicit collision detection.
Fig 3. Qualitative results of our method on in-the-wild images. Our method performs well on our real-life data captured by a common USB camera.
Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, Yebin Liu. "Interacting Attention Graph for Single Image Two-Hand Reconstruction". In Proceedings of IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2022.
@inproceedings{Li2022intaghand,
title={Interacting Attention Graph for Single Image Two-Hand Reconstruction},
author={Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu and Yebin Liu},
booktitle={IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)},
month=jun,
year={2022},
}