Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding. Vishal Sunder, Samuel Thomas, et al. ICASSP 2023.
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition. Samuel Thomas, Hong-Kwang J. Kuo, et al. ICASSP 2023.
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. Andrew Rouditchenko, Yung-Sung Chuang, et al. ICASSP 2023.
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. Jiatong Shi, George Saon, et al. INTERSPEECH 2022.
Global RNN Transducer Models For Multi-dialect Speech Recognition. Takashi Fukuda, Samuel Thomas, et al. INTERSPEECH 2022.
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. Nina Shvetsova, Brian Chen, et al. CVPR 2022.
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. Samuel Thomas, Brian Kingsbury, et al. ICASSP 2022.
Decentralized Bilevel Optimization for Personalized Client Learning. Songtao Lu, Xiaodong Cui, et al. ICASSP 2022.
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. Zvi Kons, Aharon Satt, et al. ICASSP 2022.
Integrating dialog history into end-to-end spoken language understanding systems. Jatin Ganhotra, Samuel Thomas, et al. INTERSPEECH 2021.