Although speech recognition has been around for decades, the quality of speech recognition software and hardware has only recently reached a high enough level to appeal to a large number of consumers. One area in which speech recognition has become very popular in recent years is the smartphone and tablet computer industry. Using a speech recognition-enabled device, a consumer can perform such tasks as making phone calls, writing emails, and navigating with GPS using only voice commands.
Speech recognition in such devices is far from perfect, however. When using a speech recognition-enabled device for the first time, the user may need to "train" the speech recognition software to recognize his or her voice. Even after training, the speech recognition functions may not work well in all sound environments. For example, the presence of background noise can decrease speech recognition accuracy.
In an always-on audio (AOA) system, a speech recognition-enabled device continuously listens for the occurrence of a trigger phrase. When detected, the trigger phrase alerts the device that the user is about to issue a voice command or a sequence of voice commands, which are then processed by a speech recognition engine in the device. Because the system listens continuously for the trigger phrase, the user does not have to indicate manually that the voice command mode is being entered, so no action such as pressing a physical button, or a virtual button or control on the device's touch screen, is needed. While this simplifies the user experience, the AOA system faces the challenge of accurately detecting when the trigger phrase occurs. If the device detects the trigger phrase when none has been uttered, an error called a False Accept (FA) occurs. It is important to keep the number of FA errors very low so that they do not degrade the user's experience of interacting with the AOA system.
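To make the False Accept definition concrete, here is a minimal Python sketch built on stated assumptions. It assumes a hypothetical detector that assigns a confidence score to each window of audio and declares the trigger phrase present when that score meets a threshold. The `Window` type, the scores, and the threshold values are illustrative only and do not come from any particular AOA implementation. A window counts as a False Accept when the detector fires but the ground-truth label says no trigger phrase was spoken.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    score: float       # hypothetical detector confidence for this audio window
    has_trigger: bool  # ground truth: was the trigger phrase actually uttered?


def count_false_accepts(windows: List[Window], threshold: float) -> int:
    """Count windows where the detector fires (score >= threshold)
    even though no trigger phrase was spoken."""
    return sum(1 for w in windows if w.score >= threshold and not w.has_trigger)


def false_accepts_per_hour(windows: List[Window], threshold: float, hours: float) -> float:
    """Normalize the FA count by the duration of audio that was monitored."""
    return count_false_accepts(windows, threshold) / hours


# Illustrative data: one genuine trigger and three windows without one.
windows = [
    Window(score=0.95, has_trigger=True),   # correct detection, not an FA
    Window(score=0.72, has_trigger=False),
    Window(score=0.40, has_trigger=False),
    Window(score=0.88, has_trigger=False),
]

if __name__ == "__main__":
    for t in (0.7, 0.9):
        print(t, count_false_accepts(windows, t))
```

With this toy data, raising the threshold from 0.7 to 0.9 removes both false accepts. A higher threshold also makes it more likely that a genuinely spoken trigger phrase is missed, so the threshold trades FA errors against missed triggers rather than being tuned in isolation.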