Modern devices such as smartphones, tablet computers, smart watches, smart speakers, smart televisions and the like are increasingly being provided with voice recognition capabilities, permitting a user to issue spoken commands that cause the device to perform particular actions. For example, a user may issue a spoken command to cause the device to commence playback of a chosen audio track, or to activate, deactivate or adjust the operation of an appliance such as a lamp or television that is connected to the device. A user may also issue a spoken command to cause the device to retrieve information such as news, traffic or weather information from the Internet and provide a spoken summary of such information.
Typically, such devices are provided with an always-on audio detection system, which is operative to monitor continually an audio input received at an audio input transducer, such as a microphone, of the device. The always-on audio detection system is configured to distinguish between normal conversation that is not directed at the device and spoken commands that are directed at the device. Audio input recognized as normal conversation is ignored by the device, whereas audio input that is recognized as a spoken command directed at the device causes the device to perform an appropriate action in response to the command.
In order to facilitate the distinction between normal conversation and spoken commands that are directed at the device, such devices typically require an audio trigger, such as a specific spoken word or phrase uttered by the user or a sound such as a handclap or whistle, to be provided by the user prior to the user issuing a spoken command. For example, if the user were simply to ask the question “What's the weather forecast for tomorrow?”, the device would interpret this as normal conversation rather than a command directed at the device. However, if the user were to speak a trigger word before asking the question, e.g. uttering the phrase “Device! What's the weather forecast for tomorrow?”, the device would recognize the audio trigger “Device!” and would then respond to the question “What's the weather forecast for tomorrow?” by retrieving relevant weather information from the Internet and providing a spoken summary of that information.
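The trigger-gated behaviour described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it operates on already-transcribed text rather than on an audio signal, and the function name `extract_command` and the constant `TRIGGER` are hypothetical names chosen for illustration, using the example trigger word "Device!" from the passage.

```python
from typing import Optional

# Hypothetical trigger phrase, taken from the example in the text.
TRIGGER = "device!"


def extract_command(utterance: str, trigger: str = TRIGGER) -> Optional[str]:
    """Return the command that follows the trigger, or None.

    None means the utterance is treated as normal conversation and ignored.
    Assumption: the utterance is already text; a real device would detect
    the trigger in the audio signal itself.
    """
    text = utterance.strip()
    if text.lower().startswith(trigger.lower()):
        command = text[len(trigger):].strip()
        # A trigger with nothing after it carries no command.
        return command or None
    return None
```

In this sketch the question asked without the trigger yields `None`, whereas the same question preceded by "Device!" yields the question itself as the command to be acted upon.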
The always-on audio detection system is typically provided as a part of a signal processor that is separate from a main processor or application processor of the device. This arrangement enables the power consumption of the device to be reduced when the device is not in active use, as the main or application processor can then be placed in an inactive or sleep state, whilst the signal processor incorporating the always-on audio detection system, which has a lower power consumption than the main or application processor, remains on, actively monitoring the input audio signal. When the always-on audio detection system detects the audio trigger, it can send an interrupt to the main or application processor to switch that processor into its active or awake state in order to process the trigger signal and respond to a subsequently detected user command.
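The division of labour between the two processors can be sketched as a small simulation. This is an illustrative model only, not firmware: the class names `MainProcessor` and `AlwaysOnDetector`, the method names, and the use of text strings in place of audio frames are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MainProcessor:
    """Hypothetical model of the main or application processor."""
    awake: bool = False
    received: List[str] = field(default_factory=list)

    def interrupt(self, trigger_audio: str) -> None:
        # An interrupt switches the processor to its awake state
        # and hands over the detected trigger for processing.
        self.awake = True
        self.received.append(trigger_audio)

    def sleep(self) -> None:
        self.awake = False


@dataclass
class AlwaysOnDetector:
    """Hypothetical model of the low-power always-on detection system."""
    main: MainProcessor
    trigger: str = "device!"  # assumed trigger, for illustration only

    def on_audio_frame(self, frame: str) -> bool:
        """Inspect one frame; wake the main processor only on a trigger."""
        if frame.strip().lower() == self.trigger:
            self.main.interrupt(frame)
            return True
        # Normal conversation: the main processor is left asleep.
        return False
```

In this model the main processor remains asleep for every frame that is not the trigger, which reflects the power saving described above, and is woken only when `on_audio_frame` sees the trigger.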
One problem that can arise in such arrangements is that once the always-on audio detection system has detected the audio trigger, it must send the audio trigger and any subsequent command to the main or application processor, which transmits the audio trigger and command to a remote server via the Internet for verification of the audio trigger, i.e. confirmation that the audio trigger is a valid audio trigger. Only after this confirmation has been received by the device does the device acknowledge to the user that the command has been accepted, typically by outputting a specific audio tone. This causes a delay between the user issuing a command and receiving confirmation that the command has been accepted. The delay is variable but can be between 2 and 6 seconds. As will be appreciated, in the absence of a rapid confirmation that a command has been accepted, a user may re-issue the command, perhaps multiple times, until confirmation is provided that the command has been accepted. This can lead to multiple instances of the command being accepted and responded to, which may confuse and frustrate the user.
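The effect of the confirmation delay on repeated commands can be illustrated with a short calculation. The sketch below assumes a user who re-issues the command at a fixed interval until the confirmation arrives; the function name and the retry interval are hypothetical and are not taken from any measured behaviour.

```python
import math


def commands_issued_before_confirmation(
    verification_delay_s: float, retry_interval_s: float
) -> int:
    """Count commands issued before the first confirmation is received.

    Assumptions: the first command is issued at time 0, the user repeats
    it every retry_interval_s seconds, and stops as soon as confirmation
    arrives verification_delay_s seconds after the first command.
    """
    if verification_delay_s < 0 or retry_interval_s <= 0:
        raise ValueError("delay must be >= 0 and retry interval must be > 0")
    # Commands are issued at 0, r, 2r, ... strictly before the delay elapses.
    return max(1, math.ceil(verification_delay_s / retry_interval_s))
```

Under these assumptions, a 6 second delay with a user who repeats the command every 2 seconds results in three instances of the command, each of which may be accepted and responded to, whereas a 2 second delay with a 3 second retry interval results in only one.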