3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and becomes increasingly difficult as less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow to optimize. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence, which drive a gradient-based optimization procedure that extracts not only shape information but also physical parameters such as the stretching, shearing, or bending stiffness of the cloth. This makes it possible to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to ϕ-SfT, a state-of-the-art physics-based SfT approach.
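
To make the optimization loop concrete, below is a minimal PyTorch-style sketch of the procedure described above. The surrogate model, differentiable renderer, template mesh, and video frames are all trivial placeholders chosen so the sketch runs end to end; they are illustrative assumptions, not the paper's actual components. Only the structure of the loop (stepping the surrogate, rendering, comparing pixel-wise against target frames, and backpropagating into the physical parameters) mirrors the method.

```python
import torch

def surrogate(verts, params):
    # Placeholder for the pre-trained neural cloth surrogate: a real model
    # would predict the next cloth state from the current mesh and the
    # physical parameters (stretching, shearing, bending stiffness).
    stretch, shear, bend = params
    return verts * (1.0 + 0.01 * stretch) + 0.001 * shear + 0.001 * bend

def render(verts):
    # Placeholder for differentiable rendering of the simulated mesh
    # into an image tensor.
    return verts.mean(dim=0)

# Physical parameters to be recovered via gradient-based optimization.
params = torch.ones(3, requires_grad=True)
optimizer = torch.optim.Adam([params], lr=1e-2)

template = torch.rand(64, 3)                        # template mesh vertices
target_frames = [torch.rand(3) for _ in range(10)]  # stand-in video frames

for epoch in range(100):
    optimizer.zero_grad()
    verts, loss = template, 0.0
    for target in target_frames:
        verts = surrogate(verts, params)  # physics step (regularizing)
        loss = loss + ((render(verts) - target) ** 2).mean()  # pixel-wise loss
    loss.backward()   # gradients flow through simulation and rendering
    optimizer.step()  # update stretching/shearing/bending stiffness
```

Because every step is differentiable, a single scalar image loss suffices to recover both the deforming geometry and the cloth's material parameters.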

Citation:

D. Stotko, N. Wandel, and R. Klein, “Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11895-11904, 2024, doi: 10.1109/CVPR52733.2024.01130.

More Information:

DOI: https://doi.org/10.1109/CVPR52733.2024.01130