Modeling speaking rate for voice fonts
Abstract
Voice fonts are created and stored for a speaker so that speech can be synthesized in the speaker's voice. The most important descriptors of a voice font are the spectral envelopes of acoustic units and prosodic features such as fundamental frequency and average speaking rate. In this paper, we present a new approach to modeling speaking rate so that it can be easily incorporated into voice fonts and used for personality transformation. We model speaking rate as the average duration of various acoustic units and categories for the speaker. With the proposed approach, the speaking rate can be extracted automatically from a speech corpus in the speaker's voice. We show how the approach can be implemented and evaluate its performance through subjective tests.
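As a rough illustration of representing speaking rate as average durations per acoustic unit and per category, the sketch below averages segment durations from a time-aligned corpus. It is not the paper's implementation. The segment format, the `CATEGORY` map, and the function name `average_durations` are all assumptions made for this example; the abstract does not specify the unit inventory or the categories used.

```python
from collections import defaultdict

# Hypothetical unit-to-category map; the actual categories are not
# given in the abstract.
CATEGORY = {"a": "vowel", "i": "vowel", "s": "fricative", "t": "stop"}


def average_durations(segments):
    """Return mean duration in seconds per unit and per category.

    segments: iterable of (unit_label, start_sec, end_sec) tuples,
    assumed to come from a time-aligned speech corpus.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, start, end in segments:
        duration = end - start
        category = CATEGORY.get(label, "other")
        # Accumulate the same duration under the unit and its category.
        for key in (("unit", label), ("category", category)):
            totals[key] += duration
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy alignment, invented purely for illustration.
segments = [
    ("a", 0.00, 0.10),
    ("s", 0.10, 0.25),
    ("a", 0.25, 0.45),
    ("t", 0.45, 0.50),
]
rates = average_durations(segments)
```

Under these assumptions, the resulting dictionary is the kind of compact per-speaker table that could be stored alongside the other voice font descriptors.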