data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language 2022 Audio Continuous
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing 2021 ...
Large multimodal models (LMMs) are being released at an increasing pace, but finetuning them is not always straightforward. This codebase aims to provide a unified, minimal ...