Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during the training. However, ...
Abstract: The generalization of the end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge since object classes and placements vary in new test ...
Note: This model has been trained for approximately 2.7M steps (batch size = 1) and is still in the training process. I have attached a .ipynb file in the repository. You can refer to it to know how ...
MSVMamba is a visual state space model that introduces a hierarchy in hierarchy design to the VMamba model. This repository contains the code for training and evaluating MSVMamba models on the ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results