Many voice-activated electronic devices use Automatic Speech Recognition (ASR) technology as a means for entering voice commands and control phrases. For example, users may operate the Internet browser on their mobile device by speaking commands aloud. Voice activation includes a Voice Trigger (VT) detection process and an ASR process.
The voice trigger is used to detect that a specific audio command may have been uttered, after which the ASR process analyzes the speech.
In order to respond to the user's command or request, the ASR engine would otherwise need to be always active, which is an issue because power consumption is a critical parameter in mobile devices.
Once the VT detects that the user has spoken a predefined phrase, the ASR engine is activated to analyze the user's command and then, if the command is recognized, to carry out the corresponding predefined action.
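The two-stage flow described above can be sketched as follows. This is only an illustrative sketch, not an implementation from this document: the function `run_voice_activation`, its parameters, and the use of plain strings as stand-ins for audio frames are all hypothetical names and simplifications chosen for the example.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def run_voice_activation(
    frames: Iterable[str],
    vt_detect: Callable[[str], bool],
    asr_recognize: Callable[[str], Optional[str]],
    actions: Dict[str, Callable[[], str]],
) -> Tuple[int, List[str]]:
    """Gate a (hypothetical) ASR engine behind a (hypothetical) voice trigger.

    Returns the number of times ASR was invoked and the results of the
    predefined actions that were carried out.
    """
    asr_calls = 0
    performed: List[str] = []
    for frame in frames:
        # Stage 1: the trigger runs on every frame.
        if not vt_detect(frame):
            continue  # ASR stays idle for this frame
        # Stage 2: ASR runs only after a positive trigger output.
        asr_calls += 1
        command = asr_recognize(frame)
        # Carry out the predefined action only if the command is recognized.
        if command in actions:
            performed.append(actions[command]())
    return asr_calls, performed
```

In this sketch the trigger is evaluated for every frame while the recognizer is called only after a positive trigger output, which is why the cost of the trigger stage itself and its false-positive rate dominate the idle power budget.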
A major issue with such a voice activation system is that it is continuously active, waiting for a user's vocal command. The ASR process may be executed by the ASR engine only upon a positive output from the VT process, but because the VT process is always active, it can consume significant processing power and thus negatively impact battery life. A second issue is that the system can generate a significant number of false detections by responding to a second talker or to ambient noise.
There are many techniques for VT, and most of them simply detect the presence of audio. This audio is not necessarily the user's speech; it can be, for example, ambient noise or a second talker who is nearby. These other audio signals can cause false detections, resulting in excessive power consumption and a poor user experience.
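As a minimal sketch of why presence-of-audio detection is prone to false triggers, consider a simple energy-threshold detector. The energy-threshold approach, the function names `frame_energy` and `energy_trigger`, and the threshold value are assumptions made for illustration; the text above does not specify any particular detection technique.

```python
import math
import random
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def energy_trigger(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Fire whenever the frame is loud enough, regardless of its source."""
    return frame_energy(samples) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # A 200 Hz tone at 16 kHz sampling stands in for the user's speech.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(160)]
    # Uniform random samples stand in for ambient noise.
    noise = [rng.uniform(-0.3, 0.3) for _ in range(160)]
    silence = [0.0] * 160
    for name, frame in (("tone", tone), ("noise", noise), ("silence", silence)):
        print(name, energy_trigger(frame))
```

A detector of this kind has no way to tell the intended talker from any other sufficiently loud source, so both the tone and the noise frame exceed the threshold and would wake the recognizer.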
It is common practice for the VT process to use the same input voice channel as the ASR process, for example the voice microphone and amplification circuitry. Keeping the voice channel active and running simply to serve the VT process consumes significant power, almost as much as is required for running the speech recognition itself. This is especially true for very low power systems in which the digital signal processor (DSP) processing power has been optimized. Power consumption is a very important aspect of mobile device design, and the present invention seeks to provide superior power savings while also maintaining and improving performance. For example, in a noisy environment, use of the microphone circuitry results in many spurious triggers of the speech recognition process and therefore in excessive power consumption.