Exploring the limits of decoder-only models trained on public speech recognition corpora
- Ankit Gupta
- George Saon
- et al.
- 2024
- INTERSPEECH 2024
George Saon received his M.Sc. and PhD degrees in Computer Science from Henri Poincaré University in Nancy, France, in 1994 and 1997, respectively. In 1995, Dr. Saon obtained his engineering diploma from the Polytechnic University of Bucharest, Romania. From 1994 to 1998, he worked on two-dimensional stochastic models for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). Since 1998, Dr. Saon has been with the IBM T.J. Watson Research Center, where he has worked on a variety of problems spanning several areas of large vocabulary continuous speech recognition, such as discriminative feature processing, acoustic modeling, speaker adaptation, and large vocabulary decoding algorithms. Several of the techniques he co-invented are well known to the speech community, including heteroscedastic discriminant analysis (HDA), lattice-MLLR, fast FSM-based Viterbi decoding, i-vector speaker adaptation for DNNs, and joint CNN/DNN training. Since 2001, Dr. Saon has been a key member of IBM's speech recognition team, which has participated in several U.S. government-sponsored evaluations for the EARS, SPINE, GALE, RATS, and BOLT programs. He has published over 150 conference and journal papers and holds several patents in the field of ASR. He has received three best paper awards (EARS RT'04, INTERSPEECH 2010, ASRU 2011) and has served as an elected member of the IEEE Speech and Language Technical Committee.