The present invention relates generally to the recognition of human speech, and more specifically to a system and method for low cost word recognition.
Many techniques have been developed to recognize spoken words. These vary greatly in complexity and capability. Speaker dependent isolated word recognition rates approaching 100% have been reached by some sophisticated systems. These systems are usually implemented on mainframes or on large minicomputers or microcomputers, and require specialized hardware and complex software in order to achieve real-time recognition.
In many areas, extremely high recognition rates are not necessary. This is true of some consumer products, especially games and toys. In these systems, cost minimization is often more important than a small, incremental improvement in recognition rates. Low cost requires systems which use a minimum number of electronic components, which generally limits both available memory and processing power.
Also, in many low cost applications, speaker independent recognition is not required. Single word recognition may be sufficient. Ability to operate in a noisy environment is often needed, as is the ability to recognize single words embedded in a longer utterance.
Present low cost recognition techniques suitable for typical consumer applications usually utilize zero crossing rate techniques and compression/stretch time registration. These techniques generally do not perform adequately, even for small vocabularies under good conditions. Present low cost techniques also usually do not enroll the references properly, which further interferes with their ability to compare received speech with the reference templates defining the vocabulary.
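For illustration only, a zero crossing rate measure of the kind referred to above can be sketched as follows. This is a minimal sketch of the general technique and is not taken from any particular prior system; the function names `zero_crossing_rate`, `frame_zcr`, and `sine`, the frame length, and the sample rate are all assumptions chosen for the example.

```python
import math


def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(samples, frame_len):
    """Split samples into non-overlapping frames and compute one rate per frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    return [
        zero_crossing_rate(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def sine(freq_hz, sample_rate, n):
    """Generate n samples of a sine tone (hypothetical test signal)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    low = sine(200, 8000, 400)    # assumed 8 kHz sample rate
    high = sine(2000, 8000, 400)
    # A higher-frequency tone changes sign more often per sample.
    print(zero_crossing_rate(low) < zero_crossing_rate(high))
```

Such a measure needs only a sign comparison and a counter per sample, which is why it suits systems with little memory and processing power. It retains only a coarse indication of the frequency content of the signal.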