Abstract: Abnormal event detection in video is a challenging vision problem. Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous ...
Abstract: Image retrieval using spoken language cues has emerged as a promising direction in multimodal perception, yet leveraging speech in multi-speaker scenarios remains challenging. We propose a ...
OpenStereo is a flexible and extensible project for stereo matching. @article{OpenStereo, title={OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline}, author={Guo, Xianda and ...
Marble is a modular, configuration-driven suite for training, evaluating, and performing inference on state-of-the-art pretrained music models. It leverages LightningCLI to provide easy extensibility ...
Deeksha Moodasarige Shama, Johns Hopkins University, USA. BRAIN SIGNALS TO ACTION: Monitoring and Explaining User Cognitive Load with Foundation Models. Margarita Geleta, UC Berkeley, USA. Spatial ...
Audio analytics is the broader field concerned with analyzing, processing, and extracting meaningful information from the sounds around us. Audio captioning is a recent addition to the domain ...