1. Field of the Invention
The present invention relates generally to speech-recognition devices.
2. Background Art
The use of speech-recognition (or voice-recognition) technology is becoming a viable means to control one's environment. As the sophistication of speech-recognition technology increases and the cost of the equipment decreases, the use of speech-activated devices will become commonplace. Applications for speech-recognition technology are numerous. Obvious examples include the control of appliances, consumer electronics, toys, and tools. Speech-recognition technology is most useful when the hands and/or eyes are occupied or cannot be used, e.g., while driving or in a dark room. Furthermore, speech-recognition technology can be of considerable assistance to people with physical impairments.
Speech-recognition technology has been under development for several decades. This development has resulted in a variety of hardware and software tools for personal computers. Speech-recognition systems used to require specialized circuit boards (e.g., those with digital signal processors (DSPs)) and software. With the development of more powerful and sophisticated computer hardware, the need for specialized circuit boards has disappeared. Currently, most speech-recognition software can run on generally available computer hardware.
Speech-recognition technology comes in two types: finite command recognition (trivial speech recognition) and true dictation recognition (nontrivial speech recognition). Trivial speech recognition simply matches the speech pattern of a spoken command against a stored set of known commands. This type of speech recognition is relatively straightforward and does not require costly and bulky equipment or software. In contrast, nontrivial speech recognition can analyze the speech to recognize parts of speech, grammar, word meaning, and context. This type of speech recognition requires relatively expensive hardware and software. The hardware for nontrivial speech recognition tends to be bulky and cannot be incorporated into small devices.
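By way of illustration only, the matching step of finite command recognition might be sketched as follows. The sketch assumes that each spoken command has already been reduced to a short numeric feature vector; the template values, the name `match_command`, and the `max_distance` rejection threshold are hypothetical and are not taken from any particular device.

```python
import math

# Hypothetical stored templates: one feature vector per known command.
# A real device would derive these from audio; the numbers are made up.
TEMPLATES = {
    "power on": [0.9, 0.1, 0.3],
    "power off": [0.8, 0.2, 0.9],
    "volume up": [0.1, 0.9, 0.4],
}


def match_command(features, templates=TEMPLATES, max_distance=0.5):
    """Return the closest stored command, or None if none is close enough."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        # Euclidean distance between the utterance and the stored pattern.
        dist = math.dist(features, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject utterances that do not resemble any known command.
    return best_name if best_dist <= max_distance else None


print(match_command([0.88, 0.12, 0.35]))
print(match_command([0.5, 0.5, 0.0]))
```

Because the recognizer only compares an utterance against a fixed list, its cost grows with the number of stored commands rather than with the complexity of the language, which is consistent with the modest hardware requirements noted above.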
Nontrivial speech-recognition technology can be further subdivided into two categories: discrete and continuous speech recognition. In discrete speech recognition, each spoken word must be separated by a brief pause (usually a few tenths of a second) so that the computer may distinguish the beginning and ending of words. In contrast, continuous speech recognition requires no pauses between the words and can process words spoken in normal speech. The degree of sophistication of a continuous speech-recognition system is often determined by the size of its vocabulary.
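As a minimal sketch of how pauses allow a discrete system to find word boundaries, the following assumes the audio has already been converted to one energy value per 10 ms frame. The function name `split_on_pauses` and the parameters `silence_threshold` and `min_pause_frames` are hypothetical; with 10 ms frames, 20 frames corresponds to a pause of two tenths of a second.

```python
def split_on_pauses(energies, silence_threshold=0.1, min_pause_frames=20):
    """Return (start, end) frame pairs, end exclusive, one per detected word."""
    words = []
    start = None        # first frame of the word in progress, if any
    last_voiced = None  # most recent frame above the silence threshold
    silent_run = 0      # consecutive silent frames inside the current word
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            # A long enough pause closes the current word.
            if silent_run >= min_pause_frames:
                words.append((start, last_voiced + 1))
                start = None
                silent_run = 0
    # Close a word that runs to the end of the recording.
    if start is not None:
        words.append((start, last_voiced + 1))
    return words


# Two bursts of speech separated by a 0.25 s pause (25 silent frames).
frames = [0.0] * 5 + [1.0] * 30 + [0.0] * 25 + [1.0] * 40 + [0.0] * 3
print(split_on_pauses(frames))
```

A silence shorter than `min_pause_frames` is treated as part of the same word, which is why a speaker using such a system must pause deliberately between words.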
Speech-recognition tools can also be classified into speaker-dependent and speaker-independent categories. Speaker-dependent tools require a user to participate in extensive training exercises so that the system learns to recognize the user's speech profile. The machine will then respond to that specific user. After such training, the accuracy of speech recognition is usually respectable. With a speaker-independent system, on the other hand, no prior training of the system is required; any user can begin to use the machine, which will then attempt to adapt (“train”) itself to the speech profile of the user. With a speaker-independent system, the initial accuracy of speech recognition is lower, but it increases with use.
The choice of which type of speech-recognition application to use is often dictated by the resources required. The cost of nontrivial speech-recognition tools has come down significantly in recent years. However, these tools are typically still too expensive to be deployed in remote controls for household appliances and consumer electronics (herein, “appliance” will be used as a general term to refer to all types of electrical appliances and consumer electronics used in households or vehicles). In addition, the hardware required for nontrivial applications tends to be too bulky to be incorporated into small consumer products. In contrast, remote controls using other technologies (e.g., push buttons plus infrared or radio frequency transmitters) are more affordable. Consequently, they are widely used for controlling consumer electronics and appliances.
U.S. Pat. No. 6,119,088, issued to Ciluffo, discloses a voice-activated remote control that uses trivial command-recognition technology and allows for only dozens of preprogrammed voice commands. U.S. Pat. No. 6,188,986 B1, issued to Matulich et al., discloses a voice-activated device that controls a household electrical switch or an AC circuit. The Matulich device also uses trivial speech-recognition technology. Thus, there exists a need for remote controls that can respond to more sophisticated voice commands such as “VCR, tape the program from 8 to 9 PM and from 10 to 11 PM tonight.” This type of sophisticated command requires nontrivial, continuous speech-recognition technology.