Keyphrase detection (such as Wake-on-Voice) or hot word detection systems may be used to detect a word, phrase, or the like that may initiate an activity by a device. For example, the device may wake by transitioning from a low power or sleep mode to an active mode, and/or may wake a particular computer program such as a personal assistant (PA) application. In this case, the detection of a waking keyphrase may activate an automatic speech recognition application to understand a command incoming from a user. For example, a user may state “Alexa, what is the weather?” where the word “Alexa” is the waking keyphrase.
Current keyphrase detection systems may model context-dependent phones of keyphrases and may use Gaussian mixture models (GMMs) to model the acoustics of the variations. Such models, however, are often too complex for implementation in low resource (for example, compute resource, memory resource, and power resource) environments. For example, keyphrase decoding is often performed by digital signal processors (DSPs) that have relatively high power consumption, which increases even further when such dense keyphrase decoding computations are performed. Such decoding is also typically based on whole-word scores, which requires training a new word-specific or phrase-specific model whenever a new keyphrase is added to the system. In addition, speech/non-speech detection is usually performed by a separate module on a DSP that is difficult to integrate with the other keyphrase detection operations, thereby further increasing computational complexity and power consumption.
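To illustrate why GMM-based acoustic scoring may be considered computationally dense, the following is a minimal sketch of scoring one feature frame against a diagonal-covariance GMM. It is not taken from any particular system: the function names `diag_gmm_log_likelihood` and `per_frame_operations` are hypothetical, and the diagonal-covariance form is an assumption made for simplicity.

```python
import math


def diag_gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    Hypothetical helper for illustration; the diagonal covariance is an
    assumption, not something stated in the text above.
    """
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of a diagonal Gaussian: one term per feature dimension.
        log_density = 0.0
        for x, m, v in zip(frame, mu, var):
            log_density += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(math.log(w) + log_density)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def per_frame_operations(num_states, num_components, num_dims):
    """Rough count of per-dimension terms evaluated for a single frame."""
    return num_states * num_components * num_dims
```

Under this sketch, the work per audio frame grows with the product of the number of modeled states, the number of mixture components per state, and the feature dimension, and it is repeated for every frame while the device listens.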