There are two main approaches to voice recognition: speaker-dependent and speaker-independent. Speaker-dependent systems are common in personal electronic devices such as cellular telephones. Speaker-dependent systems use a training mode to capture phonetic waveforms of a single speaker. These phonetic waveforms are evaluated, processed, and matched to words in a speech recognition dictionary in the form of a sequence of waveform parameters. The result is a voice recognition system that is unique to the single speaker; a speaker-dependent voice recognition system will not work well for someone other than that single speaker. Speaker-dependent voice recognition systems are sensitive and, although they have very high accuracy rates under ideal conditions, they are adversely affected by background noise, coughing, a strained voice, etc. Another drawback to a speaker-dependent voice recognition system is that words that do not follow standard pronunciation rules, such as proper names, must be individually trained in addition to the standard training mode.
On the other hand, speaker-independent voice recognition systems are common in dictation systems, automated directory assistance, automated phone banking, and voice-command devices. Speaker-independent systems use dictionaries with transcriptions created by professional linguists to match a particular speech utterance to a word. Because recognition is based on transcriptions rather than waveforms, speaker-independent voice recognition systems have a slightly lower accuracy rate than speaker-dependent systems. Speaker-independent voice recognition systems, however, are generally more robust than speaker-dependent voice recognition systems, can recognize the same word even when spoken by different speakers, and can more accurately recognize speech utterances in the presence of background noise.
Each word in a speaker-independent voice recognition system has at least one transcription, and sophisticated speaker-independent voice recognition systems use multiple-pronunciation models to account for alternate pronunciations of words. For example, U.S. dictionaries acknowledge the two common pronunciations of the word “Caribbean” as “kăr′ə-bē′ən” or “kə-rĭb′ē-ən.” These two pronunciations can be mapped to two transcriptions in the dictionary in a speaker-independent voice recognition system. Not only can multiple-pronunciation models account for standard single-language pronunciation alternates, but some multiple-pronunciation models also account for non-native accents, regional dialects, and personalized vocabularies. For personalized vocabularies such as proper names, which do not follow standard pronunciation rules, a multiple-pronunciation generation model can automatically produce many alternate transcriptions. Thus, to increase coverage, there can be up to a dozen speaker-independent transcriptions for a single word in a multiple-pronunciation model environment.
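The mapping from one word to several transcriptions can be illustrated with a minimal sketch. It is not taken from the disclosure: the ARPAbet-style phoneme symbols, the `LEXICON` contents, and the helper names `build_index` and `lookup` are all assumptions chosen for illustration, and a practical recognizer would score transcriptions against acoustic models rather than require an exact phoneme match.

```python
from collections import defaultdict

# Hypothetical lexicon: each word maps to one or more phoneme transcriptions.
# The ARPAbet-style symbols are illustrative assumptions, not from the source.
LEXICON = {
    "caribbean": [
        ("K", "AE", "R", "AH", "B", "IY", "AH", "N"),  # stress on 1st and 3rd syllables
        ("K", "AH", "R", "IH", "B", "IY", "AH", "N"),  # stress on 2nd syllable
    ],
    "carbon": [
        ("K", "AA", "R", "B", "AH", "N"),
    ],
}


def build_index(lexicon):
    """Invert the lexicon so each transcription points back to its word(s)."""
    index = defaultdict(set)
    for word, transcriptions in lexicon.items():
        for transcription in transcriptions:
            index[tuple(transcription)].add(word)
    return index


def lookup(index, phones):
    """Return the words whose transcription exactly matches the phoneme sequence."""
    return sorted(index.get(tuple(phones), ()))


INDEX = build_index(LEXICON)
first = lookup(INDEX, ["K", "AE", "R", "AH", "B", "IY", "AH", "N"])
second = lookup(INDEX, ["K", "AH", "R", "IH", "B", "IY", "AH", "N"])
```

In this sketch both phoneme sequences resolve to the same dictionary entry, which is the effect a multiple-pronunciation model is meant to achieve.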
A drawback to speaker-independent voice recognition systems with multiple-pronunciation models is that more transcriptions require more memory and more processing power to recognize a particular speech utterance. In a portable electronic device, a speaker-independent voice recognition system with multiple-pronunciation models can use considerable processing power, which can translate into battery drain and/or a noticeable lag in recognition speed. Moreover, a larger number of transcriptions can also increase confusion between words in the speech recognition dictionary.
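How the work grows with the number of transcriptions can be shown with a rough sketch. It assumes, purely for illustration, that recognition compares a decoded phoneme sequence against every stored transcription using edit distance; the names `edit_distance`, `recognize`, `SMALL`, and `LARGE`, and the phoneme symbols, are hypothetical, and real recognizers use acoustic scoring and pruned search rather than this exhaustive loop.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def recognize(phones, lexicon):
    """Score every transcription; return (best word, distance, transcriptions scored)."""
    best_word, best_score, scored = None, None, 0
    for word, transcriptions in lexicon.items():
        for transcription in transcriptions:
            scored += 1  # one unit of scoring work per stored transcription
            distance = edit_distance(phones, transcription)
            if best_score is None or distance < best_score:
                best_word, best_score = word, distance
    return best_word, best_score, scored


# Hypothetical dictionaries: one transcription per name versus two per name.
SMALL = {
    "anna": [("AE", "N", "AH")],
    "hannah": [("HH", "AE", "N", "AH")],
}
LARGE = {
    "anna": [("AE", "N", "AH"), ("AA", "N", "AH")],
    "hannah": [("HH", "AE", "N", "AH"), ("HH", "AA", "N", "AH")],
}

utterance = ("AE", "N", "AH")
small_result = recognize(utterance, SMALL)
large_result = recognize(utterance, LARGE)
```

Under these assumptions the scoring work scales with the total number of stored transcriptions, not the number of words, and each added alternate is one more candidate that can lie close to an utterance of a different word.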
Thus, there is an opportunity to move speaker-independent voice-recognition systems from a centralized system, such as an automated directory assistance system, to an individualized system such as in a portable electronic device. There is also an opportunity to improve upon speaker-independent voice recognition systems with multiple-pronunciation models to increase the speed of recognition and reduce processing requirements, especially for proper names, while maintaining the benefits of robust voice recognition capabilities. The various aspects, features and advantages of the disclosure will become more fully apparent to those having ordinary skill in the art upon careful consideration of the following Drawings and accompanying Detailed Description.