Describe: Graph-Based Hand-Object Meshes and Poses Reconstruction With Multi-Modal Input