1. Field of the Invention
The present invention relates to improvements in automated control systems (e.g., home automation systems, building control systems, computer-controlled systems, etc.). More particularly, it relates to improvements in speech-recognizing control devices that respond to oral commands from a system user to produce a desired effect or result.
2. The Prior Art
Homes, offices and industrial buildings are often equipped with automated mechanisms for controlling, for example, lighting fixtures, motorized window treatments, security and access control systems, audio/visual equipment, thermostats, appliances, medical equipment, machines, etc. These systems are commonly controlled by one or more system controller components that, in response to instructions received from a user, produce suitable control signals for the output devices. Typically, the user instructs a system controller by using his or her hands to operate or manipulate electric control switches, pushbuttons, keypads or touch panels that, in turn, send appropriate electrical control signals to the system controller component either via a hard-wire connection therewith, or via a wireless communication system operating in the infrared (IR) or radio-frequency (RF) band of the electromagnetic spectrum. See, for example, the RF-responsive home-automation system disclosed in U.S. Pat. No. 5,905,442 issued to Donald R. Mosebrook, et al. Alternatively, a user instruction can be initiated through the user interface of a computer or mobile device operating over the internet.
Addressing the concerns associated with locating and manipulating switches and buttons in a darkened room, and the general inconvenience of interrupting tasks in order to activate or deactivate lighting fixtures, appliances and the like, U.S. Pat. No. 6,397,186 to W. S. Bush et al. discloses a hands-free remote control device that responds to the spoken commands of the user to control the operation of a plurality of remote electrical fixtures and appliances. The disclosed device includes a speech recognition system having a microcontroller that normally operates in a low-power “sound activation mode” in which it monitors the output of a microphone component. In the event the microphone output exceeds a predetermined threshold, the microcontroller switches to a “speech-recognition mode” in which it is ready to receive speech commands. When the microcontroller recognizes a speech command, it produces a wireless (RF or IR) signal to control an automated, and suitably responsive, appliance, as commanded.
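The two-mode behavior described above can be sketched as a simple state machine. The following is an illustrative model only, not code from the Bush et al. patent; the class name, threshold value and `tx_wireless` placeholder are hypothetical:

```python
# Sketch of the two-mode controller described above: the device idles in a
# low-power "sound activation mode" and switches to "speech-recognition mode"
# only when the microphone level exceeds a predetermined threshold.

SOUND_ACTIVATION = "sound_activation"
SPEECH_RECOGNITION = "speech_recognition"

class TwoModeController:
    def __init__(self, threshold):
        self.threshold = threshold          # hypothetical mic-level threshold
        self.mode = SOUND_ACTIVATION

    def on_mic_sample(self, level):
        """Feed one microphone level reading; return the resulting mode."""
        if self.mode == SOUND_ACTIVATION and level > self.threshold:
            self.mode = SPEECH_RECOGNITION  # wake up, ready for commands
        return self.mode

    def on_command_recognized(self, command):
        """Handle a recognized command, then drop back to low-power mode."""
        self.mode = SOUND_ACTIVATION
        return f"tx_wireless({command})"    # stands in for the RF/IR signal
```

In practice the threshold test would run continuously on sampled audio, with the speech recognizer engaged only in the second mode, which is what keeps the idle power draw low.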
Recently, another hands-free automated home control system has been proposed that responds to speech commands from a system user to reduce the need for physical contact with the system's user interface. Such a system is described in a paper entitled “Developing a Voice Control System for ZigBee Home Automation Networks,” published in the Proceedings of IC-NIDC2010, by researchers Jieming Zhu et al. at Beijing University. Note, ZigBee is a registered trademark of ZigBee Alliance and refers to a low-cost, low-power wireless RF (radio frequency) mesh networking standard that is widely used in wireless building control and monitoring applications. Here, a voice-controlled home automation system combines one or more speech recognition modules with conventional ZigBee-based wireless sensor and actuator networks to enable a system user to wirelessly control the operation of automated mechanisms via speech commands. Two different modes of hands-free operation are discussed in this paper, i.e., a “speech password mode” and a “circle recognition mode”. In the speech password mode, a speech recognizer is constantly listening for one or more passwords, commonly referred to as “trigger phrases”. When a trigger phrase is detected, the system “wakes up” and prompts the user to say one or more commands from an expected command phrase vocabulary. In the circle recognition mode, the speech recognizer is constantly listening for a complete set of allowable command phrases. When speech commands are successfully recognized by either method, the module produces RF control signals through an on-board ZigBee radio to various devices, sensors and power outlets that, in turn, control the operation of automated mechanisms to achieve a desired effect. The architecture described in this paper is desirable, since all of the speech recognition functions and user interaction are performed locally by embedded technology within the module.
Hence, only control signals need to be transmitted between the speech recognition device and the targeted controller or automated mechanism; a high-data-rate network is therefore not required, which reduces system cost and complexity.
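The distinction between the paper's two hands-free modes can be illustrated as follows. This is a hypothetical sketch; the phrase sets and function names are illustrative and not taken from the paper:

```python
# "Speech password mode": the recognizer listens only for a trigger phrase,
# and accepts a command only after being triggered.
# "Circle recognition mode": the recognizer matches every utterance against
# the full set of allowable command phrases, with no trigger step.

TRIGGER_PHRASES = {"hello house"}                       # illustrative only
COMMAND_PHRASES = {"lights on", "lights off", "lower temperature"}

def speech_password_mode(utterances):
    """Wake on a trigger phrase, then accept the next known command."""
    armed = False
    for phrase in utterances:
        if not armed:
            armed = phrase in TRIGGER_PHRASES   # listen only for the password
        elif phrase in COMMAND_PHRASES:
            return phrase                       # recognized command
    return None

def circle_recognition_mode(utterances):
    """Continuously match utterances against the whole command vocabulary."""
    for phrase in utterances:
        if phrase in COMMAND_PHRASES:
            return phrase
    return None
```

The trade-off implied by the paper is that the password mode keeps the active vocabulary tiny while idle, whereas circle recognition must keep every command phrase active at all times.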
A significant problem emerges when attempting to use a speech-recognizing control device of the above type to provide totally hands-free speech recognition coverage throughout a room. Owing to ambient noise, room size, microphone quality, device location, room acoustics, furniture positions and a variety of other factors, a single control device may not be capable of reliably recognizing a speech command or trigger phrase from all locations throughout the given room. Indeed, the above-noted IEEE paper attempts to address this limitation by providing a third mode of operation, i.e., a “button mode”, that requires the user to walk up to the device and push a button to trigger speech recognition at close range. However, requiring a user to touch a button on the control device reduces the utility of the speech recognition system and undermines the fundamental design goal of providing totally hands-free control in the space.
In the above-noted IEEE paper, the authors state: “When speech recognition modules are deployed reasonably, users can give voice orders at any position of the house.” The authors overlook, however, the fact that to provide complete hands-free speech recognition coverage and comfortable user interaction from anywhere in a building using speech recognition modules of this type, it would often be necessary to install many modules in reasonable proximity to one another, sometimes several per room in larger rooms. But, in doing so, it is inevitable that a given hands-free speech command would frequently be recognized simultaneously by two or more neighboring speech recognition modules. If multiple modules were to be triggered and audibly respond simultaneously, user confusion could occur. Worse, if two or more modules were to simultaneously process the same speech command, the target automated mechanism could be instructed by each triggered device to execute the same instruction, with potentially undesirable or adverse effects. For example, “Lower Room Temperature Two Degrees” could be executed separately by several modules, causing a thermostat setpoint to drop by a multiple of the requested two degrees. One way to avoid this “duplicate response” problem would be to configure a unique speech trigger phrase for each module; but this would substantially reduce system usability, as the user would need to remember a large number of different trigger phrases—one for each speech recognition module.
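The thermostat example above can be modeled in a few lines. This is an illustrative sketch of the failure mode only; the class and function names are hypothetical:

```python
# Model of the "duplicate response" problem: several modules hear the same
# spoken command, and each independently instructs the thermostat, so the
# setpoint drops by a multiple of the requested amount.

class Thermostat:
    def __init__(self, setpoint):
        self.setpoint = setpoint

    def lower(self, degrees):
        self.setpoint -= degrees

def broadcast_command(modules_heard, thermostat, degrees):
    """Every module that 'heard' the command acts on it independently,
    with no arbitration among the modules."""
    for heard in modules_heard:
        if heard:
            thermostat.lower(degrees)

t = Thermostat(setpoint=72)
# Three modules are in range of the user; two recognize the spoken command.
broadcast_command([True, True, False], t, degrees=2)
# The user asked for a 2-degree drop but the setpoint fell by 4 degrees.
```

The sketch makes the underlying issue explicit: the command is not idempotent, so without some arbitration mechanism among the modules, each successful recognition compounds the effect.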
In an attempt to address the above-noted “duplicate response” problem, one might also consider installing additional external microphones around a room that are operatively coupled to a single speech recognition module, rather than locating several full speech recognition modules around the room. However, since speech recognition modules of this type preferably interact audibly with the user (e.g., “Are you sure you want to call the police?”), a user may not be able to comfortably hear an audio response from a relatively distant module after an external microphone successfully captures the user's speech command from a distant location in a room. Users may also have a natural tendency to turn toward the module that audibly responds to them (and potentially away from the external microphone that picked up their voice), which could negatively impact system reliability. Of course, adding operatively-coupled external audio speakers around a room could improve the audibility of audio responses. However, this overall approach would substantially increase the costs of such a system and reduce retrofit practicality, since external speakers and microphones would need to be connected using physical wires or via a high-data-rate wireless connection capable of carrying high-quality microphone and speaker audio signals. Finally, regardless of how many external microphones and speakers are installed in one room, should another interactive speech recognition module be installed in an adjacent room, the potential for the “duplicate response” problem still exists when a user's spoken voice can be heard clearly by speech recognition modules in both rooms at the same time.