HMD2: Environment-Aware Motion Generation from Single Egocentric Head-Mounted Device
/ Authors
/ Abstract
This paper investigates the generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM. To address the ambiguity of this setup, we present HMD2, a novel system that balances motion reconstruction and generation. From a reconstruction standpoint, it aims to maximally utilize the camera streams to produce both analytical and learned features, including head motion, SLAM point cloud, and image embeddings. On the generative front, HMD2 employs a multi-modal conditional motion diffusion model with a Transformer backbone to maintain temporal coherence of generated motions, and utilizes autoregressive inpainting to facilitate online motion inference with minimal latency (0.17 seconds). We show that our system provides an effective and robust solution that scales to a diverse dataset of over 200 hours of motion in complex indoor and outdoor environments.
Venue: 2025 International Conference on 3D Vision (3DV)