Electronic devices, such as mobile phones, cameras, music players, notepads, etc., are becoming increasingly popular. For example, mobile telephones, in addition to providing a means for communicating with others, provide a number of other features, such as text messaging, email, camera functions, the ability to execute applications, etc.
A popular feature of electronic devices, such as mobile telephones, is their ability to recognize speech and perform actions based on the recognized speech. Preferably, such speech recognition functionality is always enabled, constantly listening for voice commands. A problem with such an approach, however, is that conventional methodologies for detecting and interpreting voice commands consume considerable power. This has led to solutions in which the speech recognition functionality is normally disabled and, when voice control is desired, is manually enabled by the user, for example by starting an application (e.g., Google Now) or pushing a button (e.g., Sony Smartband Talk).
For more intelligent systems using natural language understanding (NLU), which is computationally intensive and preferably performed as a cloud service (e.g., Google Now), it is common to have a first gating stage based on a “key word” that can be detected without enabling the full functionality. An example is “OK Google” in the Google Now service. However, even key word recognition consumes significant power and thus is not always enabled.
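The key word gating described above can be illustrated with a minimal sketch. It is not taken from any particular product: the class `GatedRecognizer`, its fields, and the text-based key word check are hypothetical stand-ins, and a real detector would operate on audio rather than on strings. The sketch only shows that the costly recognizer is invoked when, and only when, the key word has been detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GatedRecognizer:
    """Two-stage pipeline: a cheap key word gate in front of a costly recognizer."""

    key_word: str                          # hypothetical trigger phrase
    full_recognizer: Callable[[str], str]  # stand-in for a cloud NLU service
    full_calls: int = 0                    # how often the costly stage has run

    def detect_key_word(self, utterance: str) -> bool:
        # Stand-in for a low-power detector (assumption: text in, not audio).
        return utterance.lower().startswith(self.key_word.lower())

    def handle(self, utterance: str) -> Optional[str]:
        # First stage: drop everything that does not start with the key word.
        if not self.detect_key_word(utterance):
            return None
        # Second stage: pass only the remaining command to the full recognizer.
        command = utterance[len(self.key_word):].strip(" ,")
        self.full_calls += 1
        return self.full_recognizer(command)


if __name__ == "__main__":
    recognizer = GatedRecognizer("ok google", lambda command: "NLU:" + command)
    print(recognizer.handle("play some music"))
    print(recognizer.handle("OK Google, what time is it"))
```

In this sketch the gate itself still runs on every utterance, which mirrors the limitation noted above: the first stage saves the cost of the full recognizer but is not free.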