Russell Bobbitt, Jonathan Connell, et al.
WACV 2011
In recent years, automatic recognition of spoken languages has become an important feature in a variety of speech-enabled multilingual applications that, besides accuracy, also demand efficient and "linguistically scalable" algorithms. This paper deals with a particularly successful approach based on phonotactic-acoustic features and presents systems for language identification as well as for unknown-language rejection. An architecture with multipath decoding, improved phonotactic models using binary-tree structures, and acoustic pronunciation models serve as a framework for experiments and discussion on these two tasks. In particular, language identification accuracy on a telephone-speech task (NIST'95 evaluation) in six and nine languages is presented together with results from a perceptual experiment carried out with human listeners. The performance of language rejection based on phonotactic modeling combined with a monolingual LVCSR system in the domain of broadcast news transcription is also reported. Besides yielding state-of-the-art performance, the described systems are computationally inexpensive and easily extensible (scalable) to new languages without the need for linguistic experts.
Rahul Garg, Rohit Khandekar
ICML 2009
Opher Etzion
DEBS 2007
Xiaodan Song, Ching-Yung Lin, et al.
CVPRW 2004