Typically, in speaker recognition systems, a sample of the voice properties of a target speaker is taken and a corresponding voice-print model is built. In order to improve system robustness against impostors in a “verification” mode, it is also typical for a large number of non-target speakers (i.e., “background speakers”) to be analyzed, pre-stored, and then used to normalize the voice-print likelihood score of the target speaker.
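By way of illustration only, the normalization described above may be sketched as a log-likelihood ratio between the target speaker's score and an average over the background speakers' scores. The function name `normalized_score`, the use of log-likelihoods, and the choice of averaging the background likelihoods (rather than, for example, taking their maximum) are assumptions made for this sketch and are not prescribed by the foregoing description.

```python
import math


def normalized_score(target_loglik, background_logliks):
    """Return the target log-likelihood minus the log of the mean
    background likelihood (a log-likelihood ratio).

    Assumption: all scores are log-likelihoods of the same test utterance,
    one under the target model and one under each background model.
    """
    if not background_logliks:
        raise ValueError("at least one background speaker score is required")
    # Log-sum-exp trick: subtract the maximum before exponentiating so that
    # very negative log-likelihoods do not underflow to zero.
    peak = max(background_logliks)
    mean_shifted = sum(math.exp(x - peak) for x in background_logliks) / len(
        background_logliks
    )
    log_mean_background = peak + math.log(mean_shifted)
    return target_loglik - log_mean_background
```

Under these assumptions, a positive result indicates that the utterance is better explained by the target model than by the background population on average, and a decision threshold would be applied to this normalized value rather than to the raw target score.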
The voice analysis can be conducted at various levels of phonetic detail, ranging from global (phoneme-independent) models to fine phonemic or subphonemic levels. With several such levels in a system, a problem arises as to how to combine scores from different levels. Combining scores from different levels may be important since it may not always be possible to obtain data at the phonemic level. In particular, while it is recognized that the voice patterns of a speaker vary with phonemes (or sounds), and are thus better distinguished by models that are created for individual phonemes, it is sometimes the case that the training data will be sparse. In this case, not all of the phoneme models can be created in a statistically robust way, and they therefore have to be combined with models created at a coarser level of granularity, such as models of broad classes of phonemes (vowels, plosives, fricatives, etc.) or phoneme-independent models, whose robustness is higher. Conventionally, this combination is achieved as a linear interpolation of the model scores from individual granularity levels in a method known as the “back-off” method. A discussion of the “back-off” method can be found in F. Jelinek, “Statistical Methods for Speech Recognition” (MIT Press 1998, ISBN 0262100665). However, this method, as well as other conventional methods, has often been found to be inadequate in providing effective speaker verification capabilities.
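By way of illustration only, a linear interpolation of scores from several granularity levels may be sketched as follows. The level names, the weight values, the function name `interpolate_scores`, and the renormalization of weights when a level has no model are all hypothetical choices made for this sketch; they are not taken from the cited reference or from any particular conventional system.

```python
def interpolate_scores(level_scores, weights):
    """Linearly interpolate per-level model scores.

    level_scores: dict mapping a granularity level name to a score, or to
        None when no robust model could be trained at that level.
    weights: dict mapping the same level names to non-negative weights.

    Assumption: when a level is unavailable, its weight is dropped and the
    remaining weights are renormalized to sum to one.
    """
    available = {
        level: score for level, score in level_scores.items() if score is not None
    }
    if not available:
        raise ValueError("no score available at any granularity level")
    total_weight = sum(weights[level] for level in available)
    if total_weight <= 0:
        raise ValueError("weights of available levels must sum to a positive value")
    weighted_sum = sum(weights[level] * score for level, score in available.items())
    return weighted_sum / total_weight


# Hypothetical example: three levels, from finest to coarsest.
example_weights = {"phoneme": 0.5, "broad_class": 0.3, "global": 0.2}
example_scores = {"phoneme": -4.0, "broad_class": -6.0, "global": -8.0}
combined = interpolate_scores(example_scores, example_weights)
```

In such a sketch, the weights are fixed in advance, so a sparsely trained phoneme model contributes to the combined score in the same proportion as a well trained one; this is one way of seeing the limitation attributed above to conventional interpolation.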
Accordingly, a need has been recognized in connection with providing a system that adequately and effectively combines scores from the individual levels while avoiding other shortcomings and disadvantages associated with conventional arrangements.