The fast wavelet transform is an important workhorse in signal processing. Wavelets are local in the spatial- or temporal- and the frequency-domain. This property enables frequency domain analysis while preserving some spatiotemporal information. Until recently,...
The correct detection of article layout in historical newspaper pages remains challenging but is important for Natural Language Processing ( NLP) and machine learning applications in the field of digital history. Digital newspaper portals typically provide Optical...
Modeling long-term context in videos is crucial for many fine-grained tasks including temporal action segmentation. An interesting question that is still open is how much long-term temporal context is needed for optimal performance. While transformers can model the...
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction. Unlike short-term anticipation, predicting more actions into the future imposes a real challenge with the increasing uncertainty in...
In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we...
We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind’s Flamingo models. On seven vision-language datasets, OpenFlamingo...