Gakuto Kurata

Title

Distinguished Engineer and Chief Scientist for Spoken Conversational Systems

Bio

Gakuto Kurata is a Distinguished Engineer and Chief Scientist for Spoken Conversational Systems at IBM Research. He currently leads global research and development activities to advance Watson Speech-to-Text. He is also the Senior Manager of AI Technologies at IBM Research - Tokyo, where he drives speech and language research. His teams collaborate closely with IBM's global research teams in the US, Israel, and India and transfer research outcomes into IBM's products and solutions.

He joined IBM in April 2004 after obtaining an M.S. in Information Science and Technology from the University of Tokyo, and he received a Ph.D. in Information Science and Technology from the same university in 2013. In 2014, he served as Technical Assistant to the Director of IBM Research - Tokyo. He is an IBM Master Inventor and a member of the IBM Academy of Technology. He is an elected member of the IEEE SLTC (Speech and Language Processing Technical Committee). He has more than 10 years of research and development experience in speech technology, natural language processing, and their combination.

Publications

Conference Papers

  • Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon, "Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems", in Proceedings of INTERSPEECH 2022, Incheon, Korea, September 2022.
  • Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, "Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing", in Proceedings of INTERSPEECH 2022, Incheon, Korea, September 2022.
  • Takashi Fukuda, Samuel Thomas, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, "Global RNN Transducer Models For Multi-dialect Speech Recognition", in Proceedings of INTERSPEECH 2022, Incheon, Korea, September 2022.
  • Sashi Novitasari, Takashi Fukuda, Gakuto Kurata, "Improving ASR Robustness in Noisy Condition Through VAD Integration", in Proceedings of INTERSPEECH 2022, Incheon, Korea, September 2022.
  • Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltan Tuske, 'Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio', in Proceedings of INTERSPEECH 2021, Brno, Czech Republic (Online), August 2021.
  • Samuel Thomas, Hong-Kwang Kuo, George Saon, Zoltan Tuske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 'RNN Transducer Models for Spoken Language Understanding', in Proceedings of ICASSP 2021, Toronto, Canada (Online), June 2021.
  • Takashi Fukuda, Gakuto Kurata, 'Generalized Knowledge Distillation from an Ensemble of Specialized Teachers Leveraging Unsupervised Neural Clustering', in Proceedings of ICASSP 2021, Toronto, Canada (Online), June 2021.
  • Gakuto Kurata, George Saon, 'Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition', in Proceedings of INTERSPEECH 2020, Shanghai, China (Online), October 2020.
  • Hong-Kwang Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras, 'End-to-End Spoken Language Understanding Without Full Transcripts', in Proceedings of INTERSPEECH 2020, Shanghai, China (Online), October 2020.
  • Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory, 'New advances in speaker diarization', in Proceedings of INTERSPEECH 2020, Shanghai, China (Online), October 2020.
  • Shintaro Ando, Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata, Nobuaki Minematsu, 'Converting written language to spoken language with neural machine translation for language modeling', in Proceedings of ICASSP 2020, Barcelona, Spain (Online), May 2020
  • Yosuke Higuchi, Masayuki Suzuki, Gakuto Kurata, 'Speaker embeddings incorporating acoustic conditions for diarization', in Proceedings of ICASSP 2020, Barcelona, Spain (Online), May 2020
  • Gakuto Kurata, Kartik Audhkhasi, 'Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation', in Proceedings of INTERSPEECH 2019, Graz, Austria, September 2019, Poster
  • Gakuto Kurata, Kartik Audhkhasi, 'Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition', in Proceedings of INTERSPEECH 2019, Graz, Austria, September 2019
  • Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, 'Direct Neuron-wise Fusion of Cognate Neural Networks', in Proceedings of INTERSPEECH 2019, Graz, Austria, September 2019
  • Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko, 'English Broadcast News Speech Recognition by Humans and Machines', in Proceedings of ICASSP 2019, Brighton, UK, May 2019
  • Masayuki Suzuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, Samuel Thomas, 'Improvements to N-gram Language Model Using Text Generated from Neural Language Model', in Proceedings of ICASSP 2019, Brighton, UK, May 2019
  • Gakuto Kurata, Kartik Audhkhasi, 'Improved Knowledge Distillation from Bi-directional to Uni-directional LSTM CTC for End-to-end Speech Recognition', in Proceedings of SLT 2018, Athens, Greece, December 2018
  • Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata, 'Data Augmentation Improves Recognition of Foreign Accented Speech', in Proceedings of INTERSPEECH 2018, September 2018
  • Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas, 'Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models', in Proceedings of INTERSPEECH 2018, September 2018
  • Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy, 'Language Modeling with Highway LSTM', in Proceedings of ASRU 2017, Okinawa, Japan, December 2017
  • Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon, 'Empirical Exploration of Novel Architectures and Objectives for Language Models', in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall, 'English Conversational Telephone Speech Recognition by Humans and Machines', in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran, “Efficient knowledge distillation from an ensemble of teachers”, in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura, “Ensemble of multi-scale VGG acoustic models”, in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Ken Church, Mark Drake, “Symbol sequence search from telephone conversation”, in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie, “Factorial modeling for effective suppression of directional noise”, in Proceedings of INTERSPEECH 2017, Stockholm, Sweden, August 2017
  • Takashi FUKUDA, Osamu ICHIKAWA, Gakuto KURATA, Ryuki TACHIBANA, Samuel Thomas, Bhuvana Ramabhadran, 'Effective Joint Training of Denoising Feature Space Transforms and Neural Network Based Acoustic Models', in Proceedings of ICASSP 2017, March 2017
  • Osamu ICHIKAWA, Takashi FUKUDA, Masayuki SUZUKI, Gakuto KURATA, Bhuvana Ramabhadran, 'Harmonic Feature Fusion for Robust Neural Network-based Acoustic Modeling', in Proceedings of ICASSP 2017, March 2017
  • Gakuto KURATA, Bing Xiang, Bowen Zhou, Mo Yu, 'Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling', in Proceedings of EMNLP 2016, Austin, U.S.A., November 2016
  • Gakuto KURATA, Brian Kingsbury, 'Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling', in Proceedings of INTERSPEECH 2016, San Francisco, U.S.A., September 2016
  • Gakuto KURATA, Bing Xiang, Bowen Zhou, 'Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling', in Proceedings of INTERSPEECH 2016, San Francisco, U.S.A., September 2016
  • Gakuto KURATA, Bing Xiang, Bowen Zhou, 'Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence', in Proceedings of NAACL/HLT 2016, San Diego, U.S.A., June 2016
  • Gakuto KURATA, Daniel Willett, 'Deep Neural Network Training Emphasizing Central Frames', in Proceedings of INTERSPEECH 2015, Dresden, Germany, September 2015
  • Masayuki SUZUKI, Gakuto KURATA, Tohru NAGANO, Ryuki TACHIBANA, 'Speech Recognition Robust Against Speech Overlapping in Monaural Recordings of Telephone Conversations', in Proceedings of ICASSP 2016, March 2016
  • Nobuyasu ITOH, Gakuto KURATA, Ryuki TACHIBANA, Masafumi NISHIMURA, 'A Metric for Evaluating Speech Recognizer Output Based on Human-perception Model', in Proceedings of INTERSPEECH 2015, September 2015
  • Masayuki SUZUKI, Gakuto KURATA, Masafumi NISHIMURA, Nobuaki MINEMATSU, 'Discriminative Reranking for LVCSR Leveraging Invariant Structure', in Proceedings of INTERSPEECH 2012, September 2012
  • Masayuki SUZUKI, Gakuto KURATA, Masafumi NISHIMURA, Nobuaki MINEMATSU, 'Continuous Digits Recognition Leveraging Invariant Structure', in Proceedings of INTERSPEECH 2011, pp.993-996, Florence, Italy, August 2011
  • Gakuto KURATA, Nobuyasu ITOH, Masafumi NISHIMURA, 'Acoustic Model Training with Detecting Transcription Errors in the Training Data', in Proceedings of INTERSPEECH 2011, pp.1689-1692, Florence, Italy, August 2011
  • Gakuto KURATA, Nobuyasu ITOH, Masafumi NISHIMURA, Abhinav Sethy, Bhuvana Ramabhadran, 'Named Entity Recognition from Conversational Telephone Speech Leveraging Word Confusion Networks for Training and Recognition', in Proceedings of ICASSP 2011, pp.5576-5579, Prague, Czech Republic, May 2011
  • Gakuto KURATA, Nobuyasu ITOH, Masafumi NISHIMURA, 'Training of Error-corrective Model for ASR without Using Audio Data', in Proceedings of ICASSP 2011, pp.5572-5575, Prague, Czech Republic, May 2011
  • Gakuto KURATA, Nobuyasu ITOH, Masafumi NISHIMURA, 'Acoustically Discriminative Training for Language Models', in Proceedings of ICASSP 2009, pp.4717-4720, Taipei, Taiwan, April 2009
  • Ryuki TACHIBANA, Tohru NAGANO, Gakuto KURATA, Masafumi NISHIMURA, Noboru BABAGUCHI, 'Preliminary Experiments toward Automatic Generation of New TTS Voices from Recorded Speech Alone', in Proceedings of INTERSPEECH 2007, Antwerp, Belgium, August 2007
  • Gakuto KURATA, Shinsuke MORI, Nobuyasu ITOH, Masafumi NISHIMURA, 'Unsupervised Lexicon Acquisition from Speech and Text', in Proceedings of ICASSP 2007, Vol.4, pp.421-424, Honolulu, U.S.A., April 2007
  • Shinsuke MORI, Daisuke TAKUMA, Gakuto KURATA, 'Phoneme-to-Text Transcription System with an Infinite Vocabulary', in Proceedings of COLING-ACL 2006, Sydney, Australia, July 2006
  • Gakuto KURATA, Shinsuke MORI, Masafumi NISHIMURA, 'Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus', in Proceedings of ICASSP 2006, Vol.1, pp.1037-1040, Toulouse, France, May 2006
  • Shinsuke MORI, Gakuto KURATA, 'Class-based Variable Memory Length Markov Model', in Proceedings of INTERSPEECH 2005, pp.13-16, Lisbon, Portugal, July 2005
  • Gakuto KURATA, Naoaki OKAZAKI, Mitsuru ISHIZUKA, 'GDQA: Graph Driven Question Answering System - NTCIR-4 QAC2 Experiments -', in Working Notes of NTCIR-4, Tokyo, Japan, June 2004
  • Nobuaki MINEMATSU, Gakuto KURATA, Keikichi HIROSE, 'Corpus-based analysis of production and perception of Japanese English in view of the entire phonemic system of English', in Proceedings of ICPhS, pp.1569-1572, August 2003
  • Nobuaki MINEMATSU, Gakuto KURATA, Keikichi HIROSE, 'Integration of MLLR Adaptation with Pronunciation Proficiency Adaptation for Non-Native Speech Recognition', in Proceedings of ICSLP 2002, Denver, U.S.A., September 2002
  • Nobuaki MINEMATSU, Gakuto KURATA, Keikichi HIROSE, 'Corpus-Based Analysis of English Spoken by Japanese Students in View of the Entire Phonemic System of English', in Proceedings of ICSLP 2002, Denver, U.S.A., September 2002

Journal Papers

  • Tohru NAGANO, Issei YOSHIDA, Yoshinori KABEYA, Isao OKAHARA, Gakuto KURATA, Ryuki TACHIBANA, 'Real-time Agent Supporting System using Speech Recognition in Contact Center', The IEICE transactions on information and systems, Vol. J102-D, No.9, pp.597-608, September 2019
  • Gakuto KURATA, Daniel Willett, 'Deep Neural Network Training Emphasizing Central Frames for Speech Recognition', IPSJ Journal, Vol.58, No.5, pp.1207-1217, May 2017
  • Masayuki SUZUKI, Gakuto KURATA, Masafumi NISHIMURA, Nobuaki MINEMATSU, 'Discriminative re-ranking for automatic speech recognition by leveraging invariant structures', Speech Communication, Vol.72, Issue 3, pp.208-217, September 2015
  • Tohru NAGANO, Gakuto KURATA, Masayuki SUZUKI, Ryuki TACHIBANA, Masafumi NISHIMURA, 'Improvement of Spoken Term Detection by Combining LVCSR and Syllable-based N-best Speech Recognition Results', IPSJ Journal, Vol.56, No.8, pp.1646-1656, August 2015
  • Gakuto KURATA, Nobuyasu ITOH, Masafumi NISHIMURA, Abhinav Sethy, Bhuvana Ramabhadran, 'Leveraging Word Confusion Networks for Named Entity Modeling and Detection from Conversational Telephone Speech', Speech Communication, Vol.54, Issue 3, pp.491-502, March 2012
  • Gakuto KURATA, Abhinav Sethy, Bhuvana Ramabhadran, Ariya Rastrow, Nobuyasu ITOH, Masafumi NISHIMURA, 'Acoustically Discriminative Language Model Training with Pseudo-hypothesis', Speech Communication, Vol.54, Issue 2, pp.219-228, February 2012
  • Gakuto KURATA, Osamu ICHIKAWA, Masafumi NISHIMURA, 'Speech Input Method in Automobiles Reflecting Analysis on How Users Speak', The IEICE transactions on information and systems, Vol. J93-D, No.10, pp.2107-2117, October 2010
  • Gakuto KURATA, Shinsuke MORI, Nobuyasu ITOH, Masafumi NISHIMURA, 'Unsupervised Construction of Speech Recognition Lexicon from Speech and Text', IPSJ Journal, Vol.49, No.8, pp.2900-2909, August 2008
  • Gakuto KURATA, Shinsuke MORI, Masafumi NISHIMURA, 'Unsupervised Adaptation of a Speech Recognition System Using a Lecture-Related Corpus', The IEICE transactions on information and systems, Vol.J90-D, No.9, pp.2530-2540, September 2007
  • Ryuki TACHIBANA, Tohru NAGANO, Gakuto KURATA, Masafumi NISHIMURA, Noboru BABAGUCHI, 'Automatic Prosody Labeling using Multiple Models for Japanese', The IEICE transactions on information and systems, Vol.E90-D, No.11, pp.1805-1812, November 2007
  • Shinsuke MORI, Daisuke TAKUMA, Gakuto KURATA, 'Word N-gram Probability Calculation from a Stochastically Segmented Corpus', IPSJ Journal, Vol.48, No.2, pp.892-899, February 2007
  • Nobuaki MINEMATSU, Gakuto KURATA, Keikichi HIROSE, 'Corpus-based Statistical Analysis of Production and Perception of Japanese English in View of Phonemic and Lexical Structure of American English', Journal of the Phonetic Society of Japan, Vol.7, No.3, pp.77-91, December 2003

Chapter in Book

  • Gakuto KURATA, 'Smoothing Techniques for n-gram Language Model', Encyclopedia of Natural Language Processing, pp.128-129, The Association for Natural Language Processing, December 2009
  • Masafumi NISHIMURA, Gakuto KURATA, 'Recent Advances and Possibilities of Innovation in Speech Interface Technology', IPSJ Magazine, Vol.51, No.11, pp.1434-1439, Information Processing Society of Japan, December 2010

Domestic Conference Papers

  • Gakuto KURATA, Osamu ICHIKAWA, Masafumi NISHIMURA, 'Speech Input Method in Automobiles Reflecting Analysis on How Users Speak', IPSJ SIG Technical Report, SLP-78-2, October 2009
  • Gakuto KURATA, Shinsuke MORI, Masafumi NISHIMURA, 'Large Vocabulary Continuous Speech Recognition with a Japanese Language Model from a Raw Corpus', IPSJ SIG Technical Report, SLP-57-19, July 2005
  • Gakuto KURATA, Naoaki OKAZAKI, Mitsuru ISHIZUKA, 'Question Answering System with Graph Structure from Dependency Analysis', IPSJ SIG Technical Report, NL-158-11, November 2003
  • Gakuto KURATA, Nobuaki MINEMATSU, Keikichi HIROSE, 'Improvement of Non-native Speech Recognition Using MLLR Adaptation with Respect to Pronunciation Proficiency', Technical Report of IEICE, SP2002-38, June 2002
  • Ryuki TACHIBANA, Tohru NAGANO, Gakuto KURATA, Masafumi NISHIMURA, Noboru BABAGUCHI, 'Automatic Accent Labeling for a Text-To-Speech System', IPSJ SIG Technical Report, SLP-65-18, February 2007
  • Nobuaki MINEMATSU, Gakuto KURATA, Keikichi HIROSE, 'Corpus-based Analysis of Japanese Pronunciation of English in View of a Phonemic System of English', Technical Report of IEICE, SP2002-37, June 2002

Top collaborators

Takashi Fukuda
Senior Technical Staff Member, Master Inventor - Audio, Speech, and Language Processing

Samuel Thomas
Senior Research Scientist - Speech Recognition and Spoken Language Understanding

Hiroshi Kanayama
Senior Technical Staff Member, Knowledge Infrastructure, AI Technologies, IBM Research - Tokyo