Shilei Zhang, Yong Qin
ICASSP 2012
Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM-based boundary model. Both models are derived from multi-speaker data using a speaker-adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus. © 2012 IEEE.
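The second phase, local refinement of an initial alignment, can be illustrated with a minimal sketch. This is not the paper's implementation: the function `refine_boundaries`, the `window` parameter, and the frame-level `score` callable are hypothetical names introduced here, with `score` standing in for whatever decision value a trained boundary model (such as an SVM) would assign to a candidate frame. The initial boundaries are assumed to come from a forced alignment that has already been run.

```python
from typing import Callable, List


def refine_boundaries(
    initial: List[int],
    score: Callable[[int], float],
    num_frames: int,
    window: int = 3,
) -> List[int]:
    """Move each boundary to the best-scoring frame within +/- window frames.

    initial    -- boundary frame indices from a first-pass alignment (ascending)
    score      -- hypothetical stand-in for a boundary model's decision value
    num_frames -- total number of frames in the utterance
    window     -- assumed search radius, in frames, around each initial boundary
    """
    refined: List[int] = []
    prev = 0  # lowest frame the next boundary may occupy, to keep order
    for b in initial:
        lo = max(prev, b - window)
        hi = min(num_frames - 1, b + window)
        if lo > hi:
            # No room left in the window; clamp instead of searching.
            best = min(lo, num_frames - 1)
        else:
            # Pick the candidate frame the boundary model scores highest.
            best = max(range(lo, hi + 1), key=score)
        refined.append(best)
        prev = best + 1
    return refined
```

Calling `refine_boundaries` with a toy `score` that peaks near the true boundaries moves each initial boundary toward the nearest peak inside its window, while the `prev` bookkeeping keeps the refined boundaries in ascending order. A real system would compute `score` from acoustic features around each candidate frame.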