Speech recognition is known in the art. Limited-vocabulary speech recognizers operate by matching the incoming speech to a collection of reference speech models and selecting the reference model(s) which best match(es) the incoming speech. Many speech recognition systems operate with a collection of reference words created from a large number of speakers. However, since the user may have his or her own way of pronouncing certain words, many speech recognition systems also have adaptation systems which adapt the reference models to more closely match the user's way of speaking.
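By way of non-limiting illustration only, the matching step described above might be sketched as follows. The sketch assumes, purely for simplicity, that each utterance and each reference model is reduced to a single fixed-length feature vector and that closeness is measured by Euclidean distance. The names `distance`, `recognize` and `reference_models` are hypothetical and do not appear in the art described; practical recognizers generally compare sequences of feature frames rather than single vectors.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, reference_models):
    """Return the vocabulary word whose reference model is closest to the input.

    reference_models maps each word to one feature vector (a simplifying
    assumption made for this sketch).
    """
    return min(reference_models,
               key=lambda word: distance(features, reference_models[word]))


# Toy reference models with invented two-dimensional features.
models = {
    "zero": [0.0, 1.0],
    "oh": [1.0, 0.0],
    "three": [1.0, 1.0],
}
best = recognize([0.9, 0.1], models)
```

In this toy example the input vector lies nearest the model for “oh”, so that word is selected.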
During an adaptation session, the system displays the words the user should say and records how the user says each word. This is known as a “supervised” adaptation process since the speech recognizer knows the word the user will say. The speech recognizer then adapts its reference models to incorporate the user's particular way of saying the words. Once the adaptation session has finished, the system is ready to recognize any word which the user may decide to say.
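As a minimal sketch of one supervised adaptation step, and not a description of any particular recognizer, the reference model for the prompted word might be moved part of the way toward the features of the user's utterance. The function `adapt`, the `weight` parameter and the single-vector model representation are all assumptions made for illustration.

```python
def adapt(reference_models, prompted_word, user_features, weight=0.5):
    """Blend the user's utterance into the model of the word that was prompted.

    Because the session is supervised, the word is known in advance and no
    recognition step is needed. weight (0..1) is a hypothetical setting that
    controls how far the model moves toward the user's pronunciation.
    """
    old = reference_models[prompted_word]
    reference_models[prompted_word] = [
        (1 - weight) * o + weight * u for o, u in zip(old, user_features)
    ]


# Toy model with invented features; the user is prompted to say "zero".
models = {"zero": [0.0, 2.0]}
adapt(models, "zero", [2.0, 0.0], weight=0.5)
```

With a weight of 0.5 the adapted model lies midway between the original reference and the user's utterance.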
Speech recognizers are typically limited to a particular vocabulary set. Limiting the vocabulary set allows the recognizer to achieve a high level of recognition accuracy. One common vocabulary set is the set of digits.
Unfortunately, some digits can be said in two or more ways. For example, in English, one can say “zero” or “oh” for the digit “0”. In German, the digit “2” is pronounced “zwei” or “zwo”, and in Chinese there are digits with up to four different pronunciations.
In order to properly recognize the digits, the speech recognition system has a model for each of the possible names of each digit and adapts all of these models. During adaptation, the word to be said is shown to the user and the user is asked to pronounce it. For digits, this may be done in a number of ways. Typically, the digits are presented as a string of numbers. If the digits are to be used for digit dialing, it may be desirable to present the numbers in phone number format. However, this is difficult for digits, since some of them are single-word digits, having only one name, and others are multi-word digits, having two or more names. For example, the phone number 03-642-7242 has a “0”, which is a multi-word digit in English, and several “2”s, each of which is a multi-word digit in German.
FIG. 1, to which reference is now made, shows one example of how the above phone number might be presented to a user for pronouncing during adaptation. For an English speaking user, the following might be displayed:
“zero 3-642-7242”
If the same number were to be used for a German speaker, it might be displayed as follows:
“03-64 zwei-7 zwei 4 zwei”
These presentations are uncomfortable for users, who are not used to seeing digits written out in full within a number. Because of the resulting confusion, the user might not pronounce the digit sufficiently close to the way s/he normally pronounces it, and thus the adaptation will be poor.
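For illustration only, the two displays quoted above can be reproduced by a short routine that writes out selected digits in full and leaves the remaining digits as numerals. The function name `render_prompt`, the `spelled` mapping and the spacing rule (a space on either side of a written-out word within a hyphen-separated group) are assumptions inferred from the two example strings and are not taken from any described system.

```python
def render_prompt(number, spelled):
    """Render a hyphenated phone number, writing out the digits listed in spelled.

    spelled maps a digit character to the name to display for it, for example
    {"0": "zero"}. Adjacent numerals stay joined; a written-out name is set
    off from its neighbours in the same group by a single space.
    """
    groups = []
    for group in number.split("-"):
        out = ""
        prev_was_word = False
        for ch in group:
            is_word = ch in spelled
            token = spelled[ch] if is_word else ch
            # Insert a space whenever a word meets a neighbouring token.
            if out and (is_word or prev_was_word):
                out += " "
            out += token
            prev_was_word = is_word
        groups.append(out)
    return "-".join(groups)


english = render_prompt("03-642-7242", {"0": "zero"})
german = render_prompt("03-642-7242", {"2": "zwei"})
```

Under these assumptions, `english` holds “zero 3-642-7242” and `german` holds “03-64 zwei-7 zwei 4 zwei”, matching the two displays and showing how the familiar phone number layout is broken up by the written-out names.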
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.