Obtaining strong, reproducible foundation language-audio models requires open datasets of sufficient scale and quality. To pre-train a contrastive language-audio model, we compose a large-scale sound effects dataset with detailed text descriptions for each sample. Generating...
In the era of big data and artificial intelligence, distributed machine learning has emerged as a promising solution to address privacy and security concerns while fostering collaboration between multiple parties. However, as the data grows in terms of...
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a...
Accurately perceiving and tracking instances over time is essential for the decision-making processes of autonomous agents interacting safely in dynamic environments. With this intention, we propose Mask4Former for the challenging task of 4D panoptic segmentation of...
In the domain of graph neural networks (GNNs), pooling operators are fundamental for reducing the size of the graph by simplifying graph structures and vertex features. Recent advances have shown that well-designed pooling operators, coupled with message-passing...
Models with similar performance can disagree significantly in their predictions on individual samples, a phenomenon referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating differences between models differing only in their...