
Speech

As more of the world moves online, demand for systems that can understand users and speak to them in natural language is growing rapidly. At IBM Research, we're working on next-generation AI that learns to decipher and replicate the way humans speak.

Our work

Training a customer service bot to sound more human
Research
Kim Martineau
21 Sep 2022
Converting several audio streams into one voice makes it easier for AI to learn
Research
Kim Martineau
30 Aug 2022
The pandemic changed the way we understand speech
Research
Rachel Ostrand
11 Aug 2022
- Science
- Speech
Austin or Boston? Making artificial speech more expressive, natural, and controllable
Research
Slava Shechtman, Raul Fernandez, and David Haws
26 Apr 2021
8 minute read
- AI
- Speech
Speech-to-text AI could help doctors prescribe placebo to ease chronic pain
Research
Sara Berger
22 Mar 2021
6 minute read
A cognitive in-car companion to help us enjoy the journey
Research
09 Feb 2017
4 minute read
- AI
- Speech

Publications

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
- Andrew Rouditchenko, Saurabhchand Bhati, et al.
- 2025
- ASRU 2025
UniAVLM: Unified Large Audio-Visual Language Models for Comprehensive Video Understanding
- Lecheng Yan, Chenyang Lyu, et al.
- 2025
- PRICAI 2025
Automatically Calculated Context-Sensitive Features of Connected Speech Improve Prediction of Impairment in Alzheimer's Disease
- Graham Flick, Rachel Ostrand
- 2025
- J. Speech Lang. Hear. Res.
SKIP-SALSA: Skip Synchronous Fusion of ASR LLM Decoders
- Ashish Mittal, Darshan Prabhu, et al.
- 2025
- INTERSPEECH 2025
Voice Activity-based Text Segmentation for ASR Text Denormalization
- Sashi Novitasari, Takashi Fukuda, et al.
- 2025
- INTERSPEECH 2025
Improving End-to-end Mixed-case ASR with Knowledge Distillation and Integration of Voice Activity Cues
- Sashi Novitasari, Takashi Fukuda, et al.
- 2025
- INTERSPEECH 2025
