Publication
INTERSPEECH 2023
Tutorial
Resource-Efficient and Cross-Modal Learning Toward Foundation Models
Abstract
In this tutorial, the first session introduces the theoretical advantages of large-scale pre-trained foundation models through universal approximation theory, and shows how to adapt large-scale speech and acoustic models efficiently using parameter-efficient learning. The second session covers effective cross-modal pre-training of representations across the visual, speech, and language modalities; such representations can be learned without necessarily requiring aligned data across modalities, and they benefit tasks in individual modalities as well. Finally, the third session explores multimedia-processing applications that benefit from pre-trained acoustic and language models, with benchmark performance.
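As an illustration of the parameter-efficient learning idea mentioned in the first session, here is a minimal sketch of low-rank adaptation (one common parameter-efficient technique, not necessarily the specific method covered in the tutorial): the large pre-trained weight matrix stays frozen, and only two small low-rank matrices are trained. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                            # model width and low-rank bottleneck (hypothetical sizes)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = rng.standard_normal((d, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d))                     # trainable up-projection, initialised to zero

def adapted_forward(x):
    """Adapted layer output: frozen path plus a low-rank trainable update."""
    return x @ W + x @ A @ B

x = rng.standard_normal((4, d))
# With B initialised to zero, the adapted layer starts identical to the frozen one.
assert np.allclose(adapted_forward(x), x @ W)

# Only A and B are updated during fine-tuning, a small fraction of the full weight.
full_params = W.size
adapter_params = A.size + B.size
print(f"trainable fraction: {adapter_params / full_params:.4f}")
```

For d=512 and r=8 the trainable fraction is 2dr / d² = 2r/d ≈ 3%, which is why such updates are cheap enough to apply to very large speech and acoustic models.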