Maximilian Waidhas - WestAI

A Fully Zero-Shot Approach to Obtaining Specialized and Compact Audio Tagging Models

Zero-shot classifiers based on Contrastive Language-Audio Pretraining (CLAP) models can classify audio into classes that are defined at test time using text. However, these models are expensive to run in terms of both computation and memory. In this work, we...
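As a rough illustration of the general CLAP-style zero-shot idea (not the method of this paper), the audio clip and each class name are embedded into a shared space and the class whose text embedding has the highest cosine similarity wins. The sketch below assumes the embeddings have already been produced by some audio and text encoder; the toy vectors and the function names `cosine` and `zero_shot_classify` are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def zero_shot_classify(audio_emb, text_embs):
    """Pick the label whose text embedding is most similar to the audio embedding.

    audio_emb: embedding of the audio clip (assumed precomputed by an audio encoder).
    text_embs: dict mapping class label -> embedding of its text prompt
               (assumed precomputed by a text encoder). Classes can be changed
               at test time simply by passing a different dict.
    """
    scores = {label: cosine(audio_emb, emb) for label, emb in text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-made embeddings standing in for real encoder outputs.
audio = [0.9, 0.1, 0.0]
classes = {
    "dog barking": [1.0, 0.0, 0.0],
    "rain": [0.0, 1.0, 0.0],
    "siren": [0.0, 0.0, 1.0],
}
label, scores = zero_shot_classify(audio, classes)
print(label)  # dog barking
```

Because the class set is just a dictionary of text embeddings, adding or renaming a class requires no retraining, which is what makes the classifier "zero-shot"; the cost lies in running the large encoders themselves.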

Towards Rhino-AR: A System for Real-Time 3D Human Pose Estimation and Volumetric Scene Integration on Embedded AR Headsets

Real-time understanding of dynamic human presence is crucial for immersive Augmented Reality (AR), yet challenging on resource-constrained Head-Mounted Displays (HMDs). This paper introduces Rhino-AR, a pipeline for on-device 3D human pose estimation and dynamic scene...

SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video

The reconstruction of three-dimensional dynamic scenes is a well-established yet challenging task within the domain of computer vision. In this paper, we propose a novel approach that combines the domains of 3D geometry reconstruction and appearance estimation for...

Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers

Modern methods for fine-tuning a Vision Transformer (ViT), such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this...
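For context, the LoRA baseline mentioned above can be sketched as follows; this is the standard LoRA formulation, not the Canonical Rank Adaptation method proposed in the paper. A frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha/r, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. The pure-Python helpers and the example sizes below are illustrative assumptions.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and A, B trainable.

    W: d_out x d_in frozen pretrained weight.
    A: r x d_in down-projection.
    B: d_out x r up-projection (typically initialized to zeros, so training
       starts from the pretrained behaviour).
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Compare full fine-tuning vs. LoRA parameter counts for one matrix."""
    return {"full": d_in * d_out, "lora": r * (d_in + d_out)}


# Tiny example: 2x2 identity weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r = 1, d_in = 2
B = [[0.5], [0.0]]      # d_out = 2, r = 1
print(lora_forward(W, A, B, [1.0, 2.0], alpha=1.0))  # [2.5, 2.0]
print(trainable_params(768, 768, 8))  # {'full': 589824, 'lora': 12288}
```

Note that LoRA treats each projection as a flat 2D matrix; the abstract's point is that the MHA weights have additional structure (e.g. per-head dimensions) that such a matrix-level factorization does not exploit.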

Improving the Quality of Unstructured Cancer Data Using Large Language Models: A German Oncological Case Study

With cancer being a leading cause of death globally, epidemiological and clinical cancer registration is paramount for enhancing oncological care and facilitating scientific research. However, the heterogeneous landscape of medical data presents significant challenges...
