The present invention relates generally to speech recognition systems and speech training systems. More particularly, the invention relates to a speech recognition apparatus having an adaptation system that employs eigenvoice basis vectors to rapidly adapt the initial speech model to the user's speech. The system further employs a confidence measuring technique whereby it automatically bases its adaptation on utterances recognized with high confidence, while ignoring utterances recognized with low confidence. In this way, the system adapts to the user quite rapidly, improving the recognizer's chances of good recognition performance without adapting to incorrect pronunciations. The system is thus useful with difficult speakers, such as children or foreign speakers.
Those who have used present-day continuous speech recognition systems are acquainted with the time-consuming and rigorous process by which the recognizer's speech model is adapted to the individual user's speech. For most adult users who already speak the language fluently, the adaptation process simply requires the discipline of providing enough examples of the user's speech so that the initially supplied speaker-independent speech model can be adapted into a speaker-dependent model for that speaker. The adaptation process can be supervised, in which case the user speaks words, phrases or sentences that are known in advance by the recognition system. Alternatively, the adaptation process can be unsupervised, in which case the user speaks into the system without the system having a priori knowledge of the speech content.
Adapting the speech recognition system to speech provided by children or by foreign speakers who do not speak the language fluently is considerably more difficult. Speech recognition systems have great difficulty correctly processing and recognizing the speech of children and foreign speakers, in part because the speech models of present-day recognizers are trained on a corpus of speech from native-speaking adults. Very little speech data is available for children and foreign speakers.
In addition to poor recognition, speech recognizers have difficulty with children and foreign speakers because interaction with these types of users is very difficult. Children between the ages of four and seven generally have a hard time concentrating on the task of training the recognizer. They become distracted easily and cannot be relied upon to follow the adaptation procedures correctly. Indeed, this difficulty in obtaining speech data from children is one reason why the corpus of children's speech data is so small.
Foreign speakers present a similar problem. Although adult foreign speakers, unlike children, are able to concentrate on the adaptation task, they may, like children, be unable to read the training scripts used for adaptation, and they may mispronounce so many words that the adapted speech model will fail to recognize subsequent speech properly.
The present invention addresses the foregoing problems by providing a speech recognition apparatus that adapts the initial speech model using a highly effective and rapid adaptation system. The adaptation system automatically assesses the quality or accuracy of the user's pronunciation and uses only the high-confidence utterances for adaptation. It also uses a priori knowledge about the class of speakers for which the application is intended, which allows it to adapt to the user's voice with only a very limited amount of adaptation data.
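The text does not specify how the confidence measure is computed. As a minimal sketch, assuming each recognition result carries the log-likelihood scores of its N-best hypotheses, one common approximation treats a softmax over those scores as the posterior probability of the top hypothesis, and admits an utterance for adaptation only when that posterior clears a threshold. `Recognition`, `posterior_confidence`, `select_for_adaptation`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Recognition:
    utterance_id: str
    nbest_log_likelihoods: List[float]  # hypothetical N-best scores (assumption)


def posterior_confidence(log_likelihoods: List[float]) -> float:
    """Softmax posterior of the best-scoring hypothesis in the N-best list."""
    best = max(log_likelihoods)
    # Subtracting the best score keeps exp() well-behaved; the best
    # hypothesis then contributes exp(0) == 1 to the numerator.
    total = sum(math.exp(ll - best) for ll in log_likelihoods)
    return 1.0 / total


def select_for_adaptation(results: List[Recognition],
                          threshold: float = 0.8) -> List[Recognition]:
    """Keep only utterances recognized with high confidence; ignore the rest."""
    return [r for r in results
            if posterior_confidence(r.nbest_log_likelihoods) >= threshold]
```

An utterance whose top two hypotheses score -10.0 and -20.0 receives a confidence very close to 1 and is kept, while one scoring -10.0 and -10.1 receives roughly 0.52 and is ignored, so an ambiguous (possibly mispronounced) utterance never reaches the adaptation step.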
More specifically, the adaptation system is based on a speaker space representation of the class of speakers. A plurality of training speakers is used to generate speech models that are then dimensionally reduced to generate a set of basis vectors that define an eigenspace. During the adaptation process, speech units uttered by the user are used to train the adapted speech model, while the space spanned by the basis vectors is used to constrain the adapted speech model to lie within the eigenspace. As more fully described below, we have discovered that this eigenvoice technique of encoding a priori knowledge about the target user population achieves remarkably rapid adaptation, even when very little adaptation data is provided. The system is able to begin performing adaptation almost as quickly as the user begins speaking. Once the speaker has provided an utterance that the confidence measurement system admits as reliable, the speech model associated with that utterance may be placed or projected into the eigenspace, thereby establishing an adapted speech model that is constrained to the class of speakers for which the application is intended.
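A minimal sketch of the eigenspace idea, assuming each speaker's model is flattened into a "supervector" of concatenated model parameters (for example, Gaussian means). Here principal component analysis, computed with an SVD, stands in for the dimensionality reduction of the training speakers' models. Adaptation then fits eigenvoice weights by least squares over only the dimensions that the user's accepted utterances actually cover. The choice of PCA and least-squares projection is illustrative (a maximum-likelihood estimate of the weights is another option), and `build_eigenspace` and `adapt` are hypothetical names.

```python
import numpy as np


def build_eigenspace(supervectors: np.ndarray, k: int):
    """supervectors: (n_speakers, dim), one row per training speaker.

    Returns the mean supervector and a (dim, k) basis whose orthonormal
    columns are the top-k principal directions ("eigenvoices").
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k].T


def adapt(mean: np.ndarray, basis: np.ndarray,
          observed_values: np.ndarray, observed_mask: np.ndarray) -> np.ndarray:
    """Fit eigenvoice weights from the observed dimensions only (assumption:
    least-squares projection), then rebuild a full supervector in the eigenspace."""
    a = basis[observed_mask]                   # (n_observed, k)
    b = observed_values - mean[observed_mask]  # (n_observed,)
    weights, *_ = np.linalg.lstsq(a, b, rcond=None)
    # mean + basis @ weights always lies in the space spanned by the eigenvoices.
    return mean + basis @ weights
```

Because the result is `mean + basis @ weights`, parameters for speech units the user has not yet spoken are still filled in from the eigenspace. This is one way to see why only k weights, rather than every model parameter, need to be estimated from the limited adaptation data.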
The speech recognition apparatus has many uses and makes possible a number of interesting applications that have heretofore been difficult to achieve. One example is a computer-based teaching system that guides children or foreign speakers in the correct pronunciation of new words in the language. In a system suitable for children, a simple supervised adaptation session can begin by prompting the child to state his or her name. The system may have a priori knowledge of the child's name if the name has been spelled out through keyboard entry.
In a language teaching system, the confidence measure can also be used to query the user about words that are not confidently recognized. The teaching system may include a speech playback system containing speech data representing prerecorded speech. This data can supply the proper pronunciation of a word as part of the query, thereby seeking the user's verification of a potentially misunderstood word while at the same time pronouncing the word correctly for the user to hear.
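A minimal sketch of the query step, assuming word-level confidence scores in the range 0 to 1 and a hypothetical mapping from words to prerecorded pronunciation clips. `WordResult`, `Query`, `build_queries`, the 0.6 threshold, and the clip paths are illustrative only and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WordResult:
    word: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class Query:
    word: str
    playback_clip: Optional[str]  # prerecorded correct pronunciation, if any
    prompt: str


def build_queries(words: List[WordResult],
                  prerecorded: Dict[str, str],
                  threshold: float = 0.6) -> List[Query]:
    """Create a verification query for every word recognized below threshold."""
    queries = []
    for w in words:
        if w.confidence >= threshold:
            continue  # confidently recognized: no need to ask the user
        # Pair the question with the prerecorded pronunciation so the user
        # hears the word spoken correctly while confirming it.
        clip = prerecorded.get(w.word)
        queries.append(Query(w.word, clip, f'Did you say "{w.word}"?'))
    return queries
```

For example, given the words "cat" (0.95), "giraffe" (0.3) and "zebra" (0.5) with a clip stored only for "giraffe", the sketch asks about "giraffe" with playback and about "zebra" without playback, and leaves "cat" alone.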
While the speech recognition apparatus of the invention is highly useful in language teaching systems, the rapid adaptation system coupled with the confidence measure renders the recognizer quite useful in other applications where adaptation is difficult. These applications include telephone call routing and speech-enabled marketing systems where rapid and reliable speaker adaptation is needed almost from the instant the speaker begins speaking.
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.