Samuel Thomas

Title

Senior Research Scientist - Speech Recognition and Spoken Language Understanding

Bio

Samuel Thomas received his B.Tech degree in Computer Engineering from the Cochin University of Science and Technology, India, and his M.S. degree in Computer Science and Engineering from the Indian Institute of Technology Madras, India, before earning his Doctor of Philosophy degree from Johns Hopkins University, Baltimore. Since graduating, he has been with the Speech Technologies Group at the IBM T.J. Watson Research Center, New York. He has previously worked on several speech research projects and workshops with the Center for Language and Speech Processing (CLSP) at JHU, the Idiap Research Institute, Switzerland, and the TeNeT group, IIT Madras. His research interests include speech processing and machine learning for speech recognition, spoken language understanding, speech synthesis, and speaker recognition. Samuel is an IBM Master Inventor, a Senior Member of the IEEE, and an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is also an elected member of the IEEE Speech and Language Technical Committee (SLTC).

Publications

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
- Andrew Rouditchenko, Yung-Sung Chuang, et al.
- 2023
- ICASSP 2023

Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition
- Samuel Thomas, Hong-Kwang J. Kuo, et al.
- 2023
- ICASSP 2023

Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding
- Vishal Sunder, Samuel Thomas, et al.
- 2023
- ICASSP 2023

Global RNN Transducer Models For Multi-dialect Speech Recognition
- Takashi Fukuda, Samuel Thomas, et al.
- 2022
- INTERSPEECH 2022

Extending RNN-T-based speech recognition systems with emotion and language classification
- Zvi Kons, Hagai Aronowitz, et al.
- 2022
- INTERSPEECH 2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
- Vishal Sunder, Eric Fosler-Lussier, et al.
- 2022
- INTERSPEECH 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval
- Nina Shvetsova, Brian Chen, et al.
- 2022
- CVPR 2022

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
- Samuel Thomas, Jeff Kuo, et al.
- 2022
- ICASSP 2022

Towards End-to-end Integration of Dialog History For Improved Spoken Language Understanding
- Vishal Sunder, Samuel Thomas, et al.
- 2022
- ICASSP 2022

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models
- Samuel Thomas, Brian Kingsbury, et al.
- 2022
- ICASSP 2022

Patents

Textual Knowledge Transfer For Improved Speech Recognition And Understanding

Multi-speaker Data Augmentation For Improved End-to-end Automatic Speech Recognition

Multilingual Intent Recognition

Transliteration Based Data Augmentation For Training Multilingual ASR Acoustic Models In Low Resource Settings

Using Closed Captions As Parallel Training Data For Customization Of Closed Captioning Systems

Detecting And Recovering Out-of-vocabulary Words In Voice-to-text Transcription Systems

Integrating Dialog History Into End-to-end Spoken Language Understanding Systems

Multi-modal Lung Capacity Measurement For Respiratory Illness Prediction

Top collaborators

Rogerio Feris

Brian Kingsbury

Takashi Fukuda

Gakuto Kurata