This invention relates to locating acoustic triggers, and more particularly to selective transmission of audio data based on detected utterances of a trigger word or phrase.
One approach to providing a speech-based user interface for a speech-enabled system is to use a device that monitors an acoustic environment waiting for a user to speak a command that can be interpreted by the system. The system may determine when the speaker intends to interact via the interface by determining when the speaker utters a particular word or phrase designed to “wake” the system. Such a word or phrase may be referred to as a “wakeword” or a “trigger word.”
Speech recognition used to determine the words spoken and further to automatically understand the intent of the speaker may be computationally expensive, and may be beyond the computational capacity of devices that may be located in the acoustic environment being monitored. One approach to addressing the limited computational capacity of such devices is to perform some of the computation on a server coupled to the devices over a data network, for instance over the public Internet. In some such approaches, the devices send audio data to the server only after a speaker utters the wakeword, and the server performs much of the computation required to interpret the speaker's input. Furthermore, sending audio data to a server only when a user has very likely uttered the wakeword increases privacy for the user by avoiding disclosure of unrelated audio from the user's environment.
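The device-side gating described above may be illustrated by the following minimal sketch. It is not the method of any particular system. The function name `gate_audio`, the `wakeword_score` callback, the `threshold` value, and the fixed count `frames_after_trigger` are all hypothetical and are introduced only for illustration. The sketch assumes a detector that assigns each audio frame a score, and forwards a frame only when that score meets the threshold or when the frame falls within a short window after a triggering frame.

```python
from typing import Callable, Iterable, List


def gate_audio(
    frames: Iterable[bytes],
    wakeword_score: Callable[[bytes], float],  # hypothetical on-device detector
    threshold: float = 0.8,                    # assumed value, not from the text
    frames_after_trigger: int = 3,             # assumed fixed-length window
) -> List[bytes]:
    """Return only the frames that would be sent to the server."""
    sent: List[bytes] = []
    remaining = 0  # frames still to forward after the most recent trigger
    for frame in frames:
        if remaining > 0:
            # Inside the post-trigger window: forward without re-scoring.
            sent.append(frame)
            remaining -= 1
        elif wakeword_score(frame) >= threshold:
            # Likely wakeword: forward this frame and open the window.
            sent.append(frame)
            remaining = frames_after_trigger
        # Otherwise the frame stays on the device and is never transmitted.
    return sent
```

In this sketch, audio that precedes a trigger or follows the window is simply dropped on the device, which corresponds to the privacy benefit noted above. A practical device would more likely end the window on detected end of speech than on a fixed frame count.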