Real-time understanding of dynamic human presence is crucial for immersive Augmented Reality (AR), yet challenging on resource-constrained Head-Mounted Displays (HMDs). This paper introduces Rhino-AR, a pipeline for on-device 3D human pose estimation and dynamic scene integration on commercial AR headsets such as the Magic Leap 2. Our system processes RGB and sparse depth data, first detecting 2D keypoints and then robustly lifting them to 3D. Beyond pose estimation, we reconstruct a coarse anatomical model of the human body, tightly coupled with the estimated skeleton. This volumetric proxy for dynamic human geometry is then integrated with the HMD’s static environment mesh by actively removing human-generated artifacts. This integration enables physically plausible interactions between virtual entities and real users, supports real-time collision detection, and ensures correct occlusion handling so that virtual content respects real-world spatial dynamics. Implemented entirely on the Magic Leap 2, our method achieves low-latency pose updates (under 40 ms) and full 3D lifting (under 60 ms). A comparative evaluation against the RTMW3D-x baseline shows a Procrustes-Aligned Mean Per Joint Position Error below 140 mm, with absolute depth placement validated against an external Azure Kinect sensor. Rhino-AR demonstrates the feasibility of robust, real-time human-aware perception on mobile AR platforms, enabling new classes of interactive, spatially aware applications without external computation.
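
The reported accuracy metric, Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE), rigidly aligns the predicted skeleton to the ground truth (rotation, uniform scale, translation) before averaging per-joint Euclidean errors, so it measures pose quality independently of global placement. As a point of reference, a minimal NumPy sketch of this standard metric follows; the function name and the assumption of (J, 3) joint arrays in millimetres are illustrative, and this is the metric's textbook definition rather than the authors' evaluation code:

    import numpy as np

    def pa_mpjpe(pred, gt):
        """Procrustes-aligned MPJPE between two (J, 3) joint sets, in input units.

        Rigidly aligns `pred` to `gt` (rotation, uniform scale, translation)
        via orthogonal Procrustes / Kabsch, then averages per-joint errors.
        """
        mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
        P, G = pred - mu_p, gt - mu_g          # center both skeletons

        # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
        U, S, Vt = np.linalg.svd(P.T @ G)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:               # guard against reflections
            Vt[-1] *= -1
            S[-1] *= -1
            R = Vt.T @ U.T

        s = S.sum() / (P ** 2).sum()           # optimal uniform scale
        aligned = s * (P @ R.T) + mu_g         # similarity-aligned prediction

        return float(np.linalg.norm(aligned - gt, axis=1).mean())

Because the alignment factors out global pose and scale, a PA-MPJPE below 140 mm bounds only the per-joint articulation error; absolute depth placement is validated separately in the paper against the external Azure Kinect sensor.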

Citation:

Holland, L. V., Kaspers, N., Dengler, N., Stotko, P., Bennewitz, M., & Klein, R. (2025). Towards Rhino-AR: A System for Real-Time 3D Human Pose Estimation and Volumetric Scene Integration on Embedded AR Headsets. In 2025 11th International Conference on Virtual Reality (ICVR) (pp. 135–143). IEEE. https://doi.org/10.1109/icvr66534.2025.11172603