In smart homes and speech interaction systems, speech wake-up technology is widely used. However, the accuracy and the computational cost of speech wake-up can greatly degrade the practical user experience and raise the requirements on device hardware. For example, if the false wake-up rate of the speech wake-up technology exceeds a threshold in practice, such as one false wake-up every three hours, users may become annoyed. On the other hand, if the computational cost exceeds the computing power of some low-end chips, use of the speech wake-up technology in many products may be restricted.
In the related art, speech wake-up technology uses a keyword-spotting method. Speech wake-up functions are achieved by designing a small Deep Neural Network (hereinafter DNN) model, constructing a compact decoding network, and applying a few keyword retrieval tricks.
However, the above speech wake-up technology based on the keyword-spotting method has a large number of model parameters. In addition, the filler design must be changed for each different wake-up word (also called a wakeword), and the corresponding decoding parameters and retrieval tricks must be adjusted. It is therefore difficult to have a unified algorithm that keeps the performance for every wake-up word at a stable level. Moreover, once the filler has been set, the level of misrecognition of the wake-up word is fixed, and misrecognized wake-up words cannot be adjusted or learned flexibly and easily.
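To illustrate why such a scheme needs per-word tuning, the following is a minimal sketch of a generic keyword/filler decision rule. It is not the decoder of the related art. The function name `keyword_filler_detect`, the per-frame log-likelihood inputs, the window length and the threshold are all assumptions introduced here for illustration. The sketch compares a keyword path score against a filler path score over a sliding window and reports a wake-up when the mean log-likelihood ratio exceeds a fixed threshold.

```python
from typing import List, Sequence


def keyword_filler_detect(
    keyword_logp: Sequence[float],
    filler_logp: Sequence[float],
    window: int,
    threshold: float,
) -> List[int]:
    """Return the frame indices at which a wake-up would be reported.

    All inputs are hypothetical: keyword_logp and filler_logp stand for
    per-frame log-likelihoods of a keyword path and a filler path.
    """
    if len(keyword_logp) != len(filler_logp):
        raise ValueError("score sequences must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")

    # Per-frame log-likelihood ratio between the keyword and filler paths.
    ratios = [k - f for k, f in zip(keyword_logp, filler_logp)]

    hits: List[int] = []
    running = 0.0
    for i, ratio in enumerate(ratios):
        running += ratio
        if i >= window:
            # Drop the frame that has just left the sliding window.
            running -= ratios[i - window]
        # Decide only once a full window of frames is available.
        if i >= window - 1 and running / window > threshold:
            hits.append(i)
    return hits


# Illustrative scores only; they are not measured data.
keyword_scores = [-5.0, -5.0, -1.0, -1.0, -1.0, -5.0]
filler_scores = [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0]
wake_frames = keyword_filler_detect(keyword_scores, filler_scores, window=3, threshold=0.5)
```

In this sketch the decision depends on the filler scores and on the fixed `window` and `threshold` values, so changing the wake-up word would generally require re-tuning all of them, which is the limitation described above.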