The present invention relates to the field of speech recognition. In particular, the present invention relates to a system and method for on-line unsupervised adaptation in speaker verification.
Natural language speaker verification systems are currently in use for conducting various forms of commerce via a telephone network. One example of such a system is utilized in conjunction with a stock brokerage. According to this system, once a caller's voice has been authenticated, the caller may obtain a quotation for the price of a particular stock issue, or purchase or sell a particular number of shares at market price or a predetermined target price, among other types of transactions. Natural language systems can also be used to respond to such things as requests for telephone directory assistance.
One of the most significant sources of performance degradation in a speaker verification system is the acoustic mismatch between the enrollment and subsequent verification sessions. Acoustic mismatches may occur as a result of differences in transducers, acoustic environment, and communication channel characteristics (e.g., varying channels associated with combinations of different subnetworks utilized in a telephone call). Of the factors contributing to acoustic mismatch in telephony applications, it has been shown that the mismatch in transducers of telephone handsets is the dominant source of performance degradation.
To address the acoustic mismatch problem, a variety of approaches for robust speaker recognition have been developed in the past several years. These approaches include robust feature, model, and score-based normalization techniques. These approaches use off-line development data to compensate for the effects of acoustic mismatch that will be present when the system is used on-line.
Another approach has been developed that uses on-line unsupervised adaptation to “learn” the unseen channel characteristics automatically while the system is being used in the field. Unsupervised systems do not require human intervention during the verification process. Compared to off-line adaptation approaches, on-line approaches provide significantly more data for parameter estimation than is typically available to the speaker verification system, facilitating more sophisticated modeling approaches and automated parameter tuning. Furthermore, rather than predicting the effects of acoustic mismatch with development data, the effects can be observed directly from this additional data.
Prior approaches to on-line unsupervised adaptation suffered from numerous limitations. For example, adaptation of the speaker model was vulnerable to impostor attacks, significantly increased the size of the speaker model, and degraded performance on the enrollment handset type when adapting to new handset types.
The present invention introduces a system and method for unsupervised, on-line adaptation in speaker verification. In one embodiment, a method for adapting a speaker model to improve the verification of a speaker's voice comprises detecting a channel of a verification utterance; learning vocal characteristics of the speaker on the detected channel; and transforming the learned vocal characteristics of the speaker from the detected channel to the speaker model of a second channel.
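By way of illustration only, the three steps of the foregoing embodiment (detecting, learning, and transforming) may be sketched as follows. The sketch is not the claimed implementation. It assumes, hypothetically, that each channel and each speaker model is summarized by a single mean feature vector, that the channel is detected by nearest channel mean, that learning is a relevance-weighted interpolation toward the utterance mean, and that the transformation is a simple shift between channel means. All function names, the `relevance` parameter, and the channel labels are assumptions introduced for this example.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(frames: List[Vector]) -> Vector:
    """Average the feature frames of one utterance, dimension by dimension."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def detect_channel(frames: List[Vector], channel_means: Dict[str, Vector]) -> str:
    """Step 1 (assumed form): pick the channel whose mean is closest
    to the utterance mean in squared Euclidean distance."""
    utterance_mean = mean_vector(frames)

    def squared_distance(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(utterance_mean, channel_means[name]))

    return min(channel_means, key=squared_distance)


def learn_on_channel(prior_mean: Vector, frames: List[Vector],
                     relevance: float = 16.0) -> Vector:
    """Step 2 (assumed form): interpolate between the prior speaker mean and
    the utterance mean; more frames give the new data more weight."""
    n = len(frames)
    alpha = n / (n + relevance)
    utterance_mean = mean_vector(frames)
    return [alpha * u + (1.0 - alpha) * p for u, p in zip(utterance_mean, prior_mean)]


def transform_to_channel(speaker_mean: Vector, source_channel_mean: Vector,
                         target_channel_mean: Vector) -> Vector:
    """Step 3 (assumed form): move the learned speaker mean from the detected
    channel to a second channel by removing one channel offset and adding the other."""
    return [s - src + tgt
            for s, src, tgt in zip(speaker_mean, source_channel_mean, target_channel_mean)]
```

In use, a verification utterance would first be passed to `detect_channel`, the speaker mean for that channel would be updated with `learn_on_channel`, and the result would be mapped with `transform_to_channel` to update the speaker model of the second channel. A practical system would likely use richer models than single mean vectors; the shift-based mapping is chosen here only because it is the simplest transformation that makes the data flow visible.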
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description that follows.