Recently, access to relatively sophisticated remote processing systems has become available through data networks such as the Internet. Such so-called “cloud-based” processing services can provide the results of sophisticated and/or computationally complex processes to computerized devices which would otherwise not be able to implement such processes.
An interesting example of such a capability is voice recognition, which, by employing analytic models with high levels of computational complexity, can provide very good recognition rates for spoken commands and phrases. The SIRI voice assistant implemented by Apple and the ALEXA Voice Service provided by Amazon are two examples of voice recognition systems which employ cloud-based processing centers to achieve their voice recognition capabilities.
To use such a system, a user will say a predefined word or phrase, referred to herein as a “wake word”, followed by a spoken command in the presence of a voice command input device. In such systems, the voice command input device (an Amazon ECHO, etc. for ALEXA and an iPhone, etc. for SIRI) continually captures and monitors an audio stream picked up via one or more microphones on the device. The voice command input device listens for the predefined “wake word” to be spoken by a user within audio pick up range of the device, followed by a command. An example of a valid command to such a system could be, “ALEXA, what is the time?”, where “ALEXA” is the wake word.
The captured audio stream is processed by the voice command input device to detect whether and when the wake word has been spoken by a user. When such a positive determination is made, the voice command input device connects to the associated cloud-based processing service and streams the audio captured by the voice command input device to that processing service (e.g., in the case of the Echo, to the ALEXA Voice Service).
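The device-side behavior described above can be illustrated with a minimal Python sketch. It is not the implementation used by any actual product: the names `run_device`, `detects_wake_word`, `send_to_cloud` and `preroll_frames` are hypothetical, audio is represented as a sequence of byte chunks, and the detector and the uplink are passed in as plain callables. The sketch also assumes that a short window of recent frames is forwarded first, so that the processing service receives the audio containing the wake word.

```python
from collections import deque
from typing import Callable, Iterable, List

Frame = bytes  # one chunk of captured audio (assumed representation)


def run_device(
    frames: Iterable[Frame],
    detects_wake_word: Callable[[List[Frame]], bool],  # hypothetical local detector
    send_to_cloud: Callable[[Frame], None],            # hypothetical uplink
    preroll_frames: int = 3,
) -> int:
    """Monitor audio locally and start streaming once the wake word is detected.

    Returns the number of frames forwarded to the processing service.
    """
    recent = deque(maxlen=preroll_frames)  # rolling window of the latest frames
    streaming = False
    sent = 0
    for frame in frames:
        if streaming:
            # After a positive determination, every captured frame is forwarded.
            send_to_cloud(frame)
            sent += 1
            continue
        recent.append(frame)
        if detects_wake_word(list(recent)):
            # Assumption: forward the buffered frames first so the service
            # can re-check the wake word on the same audio.
            for buffered in recent:
                send_to_cloud(buffered)
                sent += 1
            streaming = True
    return sent
```

For brevity the sketch never stops streaming; a real device would also need to decide when the command has ended and return to local monitoring.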
The processing service analyzes the received audio stream to verify the presence of the wake word and to determine the command, if any, spoken by the user. The processing service then determines the appropriate response and sends that response (e.g., a voice message such as “It is now 3:04 PM”) back to the device or to another system or device, as appropriate. The range of possible responses is not limited and can include voice and/or music audio streams, data, commands recognized by other connected devices such as lighting controls, etc.
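The service-side steps (verify the wake word, determine the command, choose a response) can be sketched in the same way. This is only an illustration under strong simplifying assumptions: an actual service operates on audio and performs speech recognition, whereas here a text transcript stands in for the recognized speech, and `handle_utterance`, `WAKE_WORD` and the single supported command are hypothetical.

```python
from datetime import datetime
from typing import Optional

WAKE_WORD = "alexa"  # illustrative wake word, taken from the example in the text


def handle_utterance(transcript: str, now: datetime) -> Optional[str]:
    """Return a response for a transcribed utterance, or None to ignore it."""
    words = transcript.lower().replace(",", " ").replace("?", " ").split()
    # Step 1: verify the wake word, since the device may have triggered falsely.
    if not words or words[0] != WAKE_WORD:
        return None
    # Step 2: determine the command, if any, that follows the wake word.
    command = " ".join(words[1:])
    # Step 3: choose a response (only one command is handled in this sketch).
    if command == "what is the time":
        hour = now.hour % 12 or 12
        suffix = "AM" if now.hour < 12 else "PM"
        return f"It is now {hour}:{now.minute:02d} {suffix}"
    return None
```

Called with the transcript “ALEXA, what is the time?” and a clock reading of 15:04, the function returns the voice message used as the example above; a transcript that does not begin with the wake word is ignored.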
The use of a cloud-based processing service is preferred for such systems because the computational complexity of analyzing the received audio to determine content and meaning is very high, and such analysis is presently best implemented in special purpose hardware such as GPU or FPGA based processing engines. Such hardware is too expensive, physically too large and/or has power requirements that exceed the power available in many computerized devices, especially those powered by batteries, and thus cannot be included in devices such as smartphones, HVAC controllers, light switches, etc.
The ability to provide voice command capabilities to control computerized devices, especially computerized light switches, power outlets and other so-called Internet of Things (“IoT”) devices, is very desirable, as many such devices cannot reasonably or economically be provided with hardware such as keyboards, touchscreens or the like to otherwise allow control of the devices.
However, the computational requirements for a computerized device to reliably interact with a cloud-based processing service are not easily met by many computerized devices, hence the current need for voice command input devices such as the Echo device and/or Google's Home device. In particular, the voice recognition models which must be executed by the voice command input device to capture and recognize the wake word require processors with high computational capabilities to be employed in the voice command input device. Such high-powered processors generate significant amounts of waste heat when operating, which can be a problem in many other devices, such as IoT devices, consumer worn or carried devices, HVAC controllers and remote sensors. Further, such high-powered processors generally require a significant amount of electrical power, which can be a problem for battery powered or parasitically powered devices (i.e., computerized devices which obtain their operating power from the control signals of the devices they are controlling). Additionally, the time required to process the wake word, prior to relaying the voice command to the cloud service, adds to the voice system's overall latency, decreasing user satisfaction.
Unfortunately, the cost of special purpose voice command input devices such as the Echo and/or Home can slow the adoption and use of such services. It is desired to have a system and method of providing a computerized device with voice command input capability in a reliable and cost-effective manner.