AVLnet: Learning audio-visual language representations from instructional videos. Andrew Rouditchenko, Angie Boggust, et al. 2021. INTERSPEECH 2021.
Cascaded multilingual audio-visual learning from videos. Andrew Rouditchenko, Angie Boggust, et al. 2021. INTERSPEECH 2021.