Domain adaptation for text dependent speaker verification
Abstract
Recently we investigated the use of state-of-the-art text-dependent speaker verification algorithms for user authentication and obtained satisfactory results, mainly by using a fair amount of text-dependent development data from the target domain. In this work we investigate the ability to build high-accuracy text-dependent systems using no data at all from the target domain. Instead, we use resources such as TIMIT, Switchboard, and NIST data. We introduce several techniques that address both lexical mismatch and channel mismatch. These techniques include synthesizing a universal background model according to lexical content, automatically filtering irrelevant phonetic content, exploiting information in residual supervectors (usually discarded in the i-vector framework), and inter-dataset variability modeling. These techniques reduce verification error significantly, and also improve accuracy when target domain data is available.