Segment pre-selection in decision-tree based speech synthesis systems
Abstract
Corpus-based approaches to unit selection for concatenative speech synthesis have become popular in recent years because they are more sensitive to unit context than their simpler predecessors. These systems usually rely on large speech databases and employ sophisticated search algorithms to determine the optimal unit sequence for synthesising each sentence. For many applications it is not possible to make the entire database, which may be as large as several hundred megabytes, available to the synthesiser at runtime. What is required is some form of off-line pre-selection algorithm that determines which subset of the database yields the highest-quality synthesised speech for a given runtime system size. This paper describes a pre-selection algorithm developed at IBM for use with decision-tree-based concatenative speech synthesisers.