Abstract: Referring Multi-Object Tracking (RMOT) aims to dynamically track an arbitrary number of referred targets in a video sequence according to the language expression. Previous methods mainly ...
Abstract: The essence of audio-visual segmentation (AVS) lies in locating and delineating sound-emitting objects within a video stream. While Transformer-based methods have shown promise, their ...