Present-day speech recognition systems typically comprise a computing device, including software and hardware, that recognizes spoken words. Such systems compare samples of one or more spoken words with samples stored in memory that correspond to the words of a known set, referred to as a vocabulary.
With current technology, the likelihood that a word will be accurately recognized decreases as the number of words in the vocabulary grows. Existing speaker independent recognition systems begin to produce unreliable recognition results once a vocabulary exceeds about twenty words. To address this problem, known systems may partition a large vocabulary into smaller vocabularies, only one of which is active at any one time. For example, a speaker independent system may be used to solicit answers to a series of questions. Each question typically has only a limited number of possible answers, which together form a smaller vocabulary.
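As an illustration only, the partitioning approach described above can be sketched in Python under several stated assumptions: each word is represented by a single hypothetical two-element "feature vector" standing in for a stored sample, a spoken sample is matched to the nearest stored sample by Euclidean distance, and the vocabulary names (`yes_no`, `colour`) are invented for the example. Real recognizers use far richer acoustic models; the point is only that the recognizer compares a sample against the words of the one active vocabulary.

```python
import math
from typing import Dict, Sequence

# Hypothetical stand-in for a stored sample: a short feature vector per word.
Template = Sequence[float]


def distance(a: Template, b: Template) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PartitionedRecognizer:
    """Holds several small vocabularies, only one of which is active at a time."""

    def __init__(self, vocabularies: Dict[str, Dict[str, Template]]):
        self.vocabularies = vocabularies
        self.active: Dict[str, Template] = {}

    def activate(self, name: str) -> None:
        # Switching the active vocabulary restricts which words can be recognized.
        self.active = self.vocabularies[name]

    def recognize(self, sample: Template) -> str:
        if not self.active:
            raise RuntimeError("no vocabulary is active")
        # Compare the spoken sample only with words in the active vocabulary
        # and return the word whose stored sample is closest.
        return min(self.active, key=lambda word: distance(sample, self.active[word]))


if __name__ == "__main__":
    vocabularies = {
        "yes_no": {"yes": [1.0, 0.0], "no": [0.0, 1.0]},
        "colour": {"red": [0.9, 0.1], "blue": [0.1, 0.9], "green": [0.5, 0.5]},
    }
    recognizer = PartitionedRecognizer(vocabularies)
    recognizer.activate("yes_no")   # e.g. the system asks a yes/no question
    print(recognizer.recognize([0.8, 0.2]))  # yes
    recognizer.activate("colour")   # e.g. the next question asks for a colour
    print(recognizer.recognize([0.8, 0.2]))  # red
```

The same input sample yields different words depending on which small vocabulary is active, which is how partitioning keeps each individual comparison set small.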
To compound recognition problems, vocabularies need to be "tuned" to handle regional dialects, accents and background noise. For example, English words spoken by a native of New York City typically sound different from those spoken by a native of London, England. These differences may further decrease the accuracy of recognition in a speaker independent system, as a single vocabulary can accommodate only a limited variance in the pronunciation of a spoken word.
Known systems address this problem by adapting the speech recognition vocabulary to include numerous variations of each word. This, however, reduces the number of words that a single vocabulary can contain while still providing reasonable speech recognition performance.
Other systems use completely separate vocabularies, each tuned to a specific dialect, accent or background noise. A suitable vocabulary is used depending on the location or application of a recognition system. For example, if a system is used in London, England, a first speaker independent recognition vocabulary is used. If a similar system is used in New York City, a different vocabulary is used. Other known systems, as for example disclosed in U.S. Pat. No. 5,524,169 naming Cohen et al. as inventors, dynamically choose a vocabulary using external indicators of the geographic origin of the speaker. For example, a signal derived using the global positioning system to determine the location of the speaker may be used. This approach may improve the recognition accuracy of the system, provided an optimal vocabulary is correctly predicted.
This approach, however, is not effective if a speaker's dialect or accent cannot be predicted with any accuracy. This is a particular problem in an area where many dialects of the same language are spoken; in a multi-cultural area where people speak the same language but with varying accents; or in a location, such as an airport or hotel, frequented by travellers. Similarly, people travelling to foreign countries where most of the populace speaks with accents or dialects different from their own may be unable to use local speech recognition systems because of poor recognition performance.
Other known speech recognition systems prompt users for additional information to properly determine the speaker's dialect, accent or language. Alternatively, separately configured and selectable versions of a system, each appropriate for a particular dialect, accent, language or environment (for example quiet/landline vs. noisy/cellular telephones), may be available simultaneously. They may, for example, be selected through the public switched telephone network using distinct telephone numbers.
Similar problems exist with speaker dependent speech recognition systems. Such systems are "trained" by one or more persons to recognize their particular speech patterns, and vocabularies are formed based on this training. Accordingly, a vocabulary formed by the system usually produces accurate recognition results only when used by the person who trained it. Again, existing systems use external indicia in choosing a suitable vocabulary. For example, a particular computer acting as a recognition system typically uses login information to select the vocabulary used by the recognition system.
The present invention attempts to overcome some of the disadvantages of known speech recognition systems.