Thomas Erickson, Catalina M. Danis, et al.
ACM CSCW 2008
Most modern speech synthesis systems that use context-dependent decision trees in their acoustic synthesis modules are unit-selection concatenative systems, and they use the trees essentially as a form of pruning during segment search. The IBM Trainable Speech Synthesis System is one such system. This paper begins by discussing the advantages and disadvantages of the decision-tree and non-decision-tree approaches to unit-selection synthesis. It then presents the results of formal listening tests conducted on the IBM system to investigate several topics pertinent to decision-tree-based systems: the use of extended context features during clustering, the effect of using trees with different numbers of leaves and different numbers of segments per leaf, and the performance of several offline segment preselection algorithms.
Shumin Zhai, Per-Ola Kristensson
CHI 2003
Theodore Kim, Mark Carlson
SCA 2007
Sharon L. Greene, Tracy Lou, et al.
CHI EA 2005