The present invention relates to speech recognition and techniques for configuring and controlling devices incorporating speech recognition. In particular, the embodiments of the present invention relate to methods and apparatuses for controlling the operation of a device by voice commands.
Speech recognition systems are electronic systems implemented in hardware, software or a combination of hardware and software that allow a machine to recognize speech inputs. Speech recognizers can be used to control the behavior of an electronic system in accordance with the particular speech inputs received. For example, a speech recognition system may recognize a certain number of utterances (i.e., words or phrases). The set of utterances that a recognizer can understand is often referred to as the “recognition set.” When a user speaks to a recognizer, the recognizer may produce different results (typically electronic signals or software states) corresponding to whether or not the input speech was an utterance in the recognition set, and additionally, but not necessarily, which of the utterances in the recognition set was received.
Typically, when a speech recognition system is powered on, the speech recognizer is always on and always listening for utterances in the recognition set. However, a speech recognizer that is always on and always listening for commands has two problems:
1. In battery operated products, the current drained by analyzing each sound can quickly wear down batteries.
2. In all products there is an issue of the recognizer incorrectly interpreting unintended sounds as commands (false accepts). This issue is exacerbated in products that are always on and always listening.
To address the first issue, battery operated speech recognition products typically require a button press or other switch to turn on the recognizer. These devices typically power down after some time if no command is recognized, thereby saving battery life. This approach, however, is self-defeating, because it requires the use of one's eyes, hands, and feet to locate the speech recognition device and turn it on. Examples of the use of such speech recognition in consumer electronic products include U.S. Pat. Nos. 6,188,986 and 6,324,514 for electrical switches, U.S. Pat. Nos. 6,101,338 and 5,980,124 for cameras, U.S. Pat. No. 4,771,390 for cars, and U.S. Pat. Nos. 6,526,381 and 5,199,080 for remote controls.
Improvements in speech recognition technology have decreased the false accept rate in continuously listening products. To further decrease this false accept rate, developers utilize “dual triggered” or “gated” approaches, in which the recognizer first listens for a trigger word, the occurrence of which activates a second recognition set whose output controls the device of interest. By this two step process, false accepts are less likely because wrong utterances must pass through two hurdles instead of one to activate the device. However, this introduces the problem of increasing the false reject rate, because the “right” words also must pass the double hurdle. Furthermore, this approach makes usage more cumbersome because a series of words must be recalled to activate the device.
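The trade-off of the gated approach can be illustrated with a short calculation. The following is a minimal sketch only: it assumes that the trigger stage and the command stage err independently, and the function name `gated_rates` and the per-stage rates are hypothetical values chosen for illustration, not measurements of any actual recognizer.

```python
def gated_rates(fa_trigger, fa_command, fr_trigger, fr_command):
    """Combine per-stage error rates for a two-stage (gated) recognizer.

    Assumption (illustrative): the two stages err independently.
    fa_* are per-stage false accept rates, fr_* are per-stage false reject rates.
    """
    # A wrong utterance activates the device only if it passes both hurdles.
    false_accept = fa_trigger * fa_command
    # A right utterance succeeds only if neither stage rejects it.
    false_reject = 1.0 - (1.0 - fr_trigger) * (1.0 - fr_command)
    return false_accept, false_reject


# Hypothetical example: each stage has 5% false accepts and 5% false rejects.
fa, fr = gated_rates(0.05, 0.05, 0.05, 0.05)
print(f"false accept rate: {fa:.4f}")
print(f"false reject rate: {fr:.4f}")
```

Under these assumed numbers, the combined false accept rate falls to roughly 0.25% while the combined false reject rate nearly doubles to roughly 9.75%, which is the double hurdle effect described above.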
To alleviate these problems, speech recognition has been used in combination with auxiliary sensing devices to improve recognition accuracy while decreasing false trigger rates. For example, U.S. Pat. Nos. 6,532,447 and 5,255,341 describe an auxiliary sensing device that is a proximity detector that turns on a speech recognizer in a cell phone and an elevator, respectively, when a potential user is nearby.
A proximity detector can also assist in saving battery life by keeping the device in a low power mode, but it will not necessarily reduce false triggers or improve recognition accuracy when people are in its vicinity. One such example is a speech recognizer that provides voice control of lights in a room: whenever people are in the room, the recognizer would automatically turn on, and their conversations could create false triggers. Other types of sensors could be more effective in preventing false triggers. For example, a voice activated lamp or nightlight could be enabled only when needed during darkness to prevent false triggers when it is not needed (during daylight). Such a situation is more complex because one auxiliary sensing device for controlling the speech recognizer, such as the light sensor, is not sufficient to control its full operation. The difficulty arises when the light sensor that activated the speech recognizer during darkness is deactivated by the light of the lamp. Once the room is illuminated by the lamp, the light detector would deactivate the recognizer, so the lamp would have to be turned off manually and the benefit of turning the light off with a voice command would be lost.
The current state of the art for controlling the operation of a speech recognizer with an auxiliary sensing device (e.g. proximity sensor) is described by the block diagram of FIG. 1, in which power is provided to speech recognizer 3 from power supply 5 through switch 7 whose operation is controlled by auxiliary sensing device 9. When switch 7 is closed by auxiliary sensing device 9, speech recognizer 3 is powered to receive and analyze audio signals coming from microphone 1. The output of speech recognizer 3 controls the operation of device under control 11 when appropriate speech commands are spoken into the microphone. For example, auxiliary sensing device 9 may be the proximity sensor of U.S. Pat. No. 5,255,341, which causes speech recognizer 3 to be powered on when a potential user is in the proximity of an elevator, which is device 11 of FIG. 1. Thus, when a person is near the elevator and only when a person is near the elevator, the recognizer is activated to receive audio signals from microphone 1, which controls the operation of the elevator. The function of auxiliary sensing device 9 in this example is to minimize false commands to the elevator at times when no one is near but when false triggers from background noise might otherwise activate its operation.
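The arrangement of FIG. 1 can be modeled in a few lines of code. The following is a simplified sketch rather than an implementation of any cited patent: the class name `SensorGatedRecognizer`, its methods, and the example commands "up" and "down" are hypothetical, and speech recognition itself is reduced to a lookup in the recognition set.

```python
class SensorGatedRecognizer:
    """Toy model of FIG. 1: a recognizer powered through a sensor-driven switch."""

    def __init__(self, recognition_set):
        self.recognition_set = set(recognition_set)
        self.switch_closed = False  # switch 7 starts open (recognizer unpowered)

    def update_sensor(self, sensor_active):
        # Auxiliary sensing device 9 closes switch 7 only while it is active.
        self.switch_closed = bool(sensor_active)

    def hear(self, utterance):
        """Return the command passed to device under control 11, or None."""
        if not self.switch_closed:
            return None  # recognizer 3 is unpowered, so all audio is ignored
        if utterance in self.recognition_set:
            return utterance
        return None  # powered, but the utterance is not in the recognition set


# Hypothetical elevator example with a proximity sensor.
elevator = SensorGatedRecognizer({"up", "down"})
print(elevator.hear("up"))    # nobody nearby: ignored
elevator.update_sensor(True)  # a person approaches
print(elevator.hear("up"))    # recognized and passed to the elevator
```

In this model, sounds arriving while the switch is open can never reach the device under control, which is how the auxiliary sensing device suppresses false triggers when no one is near.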
The device described by FIG. 1 is not adequate to control the operation of a speech recognizer in all circumstances. For example, consider the case of a device under control 11 being a lamp that is controlled by commands to speech recognizer 3. Without a mechanism for controlling the power fed to the recognizer, it would consume unnecessary power and would false trigger the lamp off and on in response to extraneous noises or conversations when people are near. Thus, auxiliary sensing device 9 might be a light sensor that causes switch 7 to close only when the room is dark, because there is no need to command the lamp when the room is light. In this case, when the room is dark, speech recognizer 3 is powered from power supply 5 through switch 7 to control the lamp operation via verbal commands received by it from microphone 1. Thus, a person can turn on a lamp in the middle of the night without having to find it and push a button.
A problem arises when this same person wishes to turn off the lamp to go back to sleep. In this case, the light coming from the lamp may cause auxiliary sensing device 9 to open switch 7, removing power from speech recognizer 3. The only way for the person to turn off the lamp is then to reach for it and push a button. This requirement greatly diminishes the utility of a lamp that is controlled by a speech recognizer.
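This lockout can be reproduced in a short simulation. The sketch below is illustrative only: the function name `simulate_lamp`, the command strings "lamp on" and "lamp off", and the assumption that the lamp's light alone is enough to make the light sensor read the room as lit are all hypothetical simplifications.

```python
def simulate_lamp(commands, ambient_dark=True):
    """Simulate a voice controlled lamp whose recognizer is gated by a light sensor.

    Returns a list of (command, recognizer_powered, lamp_on) tuples, one per command.
    Assumption (illustrative): the lamp's own light makes the sensor read "lit".
    """
    lamp_on = False
    log = []
    for command in commands:
        # The light sensor sees darkness only if the room is dark AND the lamp is off.
        sensor_sees_dark = ambient_dark and not lamp_on
        # Switch 7 is closed, powering the recognizer, only in darkness.
        recognizer_powered = sensor_sees_dark
        if recognizer_powered and command == "lamp on":
            lamp_on = True
        elif recognizer_powered and command == "lamp off":
            lamp_on = False
        log.append((command, recognizer_powered, lamp_on))
    return log


for entry in simulate_lamp(["lamp on", "lamp off"]):
    print(entry)
```

In this run the first command is heard and turns the lamp on, but the second command arrives while the recognizer is unpowered, so the lamp stays on until it is switched off by hand.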
Thus, there is a need for more sophisticated methods and apparatuses for controlling the operation of a device by voice commands.