Speech recognition systems, or automatic speech recognizers, have become increasingly important as more computer-based devices use speech recognition to receive commands from a user and perform some action, to convert speech into text for dictation applications, or even to hold conversations with a user in which information is exchanged in one or both directions. Such systems may be speaker-independent, such as home or smartphone speech recognition systems that recognize words no matter the speaker, or speaker-dependent, where the system is trained by having the user repeat words. Some systems may be configured to understand only a fixed set of single-word commands or short phrases, such as a mobile phone that understands the terms “call” or “answer”. Systems on smartphones, smart speakers, tablets, and other devices may instead have an extensive vocabulary, such as a virtual assistant that provides voice-activated search engines and performs other audio-activated tasks.
Thus, automatic speech recognition (ASR) is desirable for wearables, smartphones, and other small devices. Many of these small devices, however, have limited memory, computational capacity, and battery capacity. Acoustic front-end feature extraction may impose a large computational load, and consequently high power consumption, mainly because conventional, generic digital signal processors (DSPs) perform the fast Fourier transform (FFT) and other DSP tasks for feature extraction. Feature extraction becomes even more important for always-on ASR systems, in which feature extraction, voice activation (VA), and simple keyword detection (KWD) are performed constantly before any of the more complicated ASR back-end tasks are executed. Because the feature extraction, VA, and KWD are performed much more frequently than those back-end tasks, they affect energy consumption more directly. This can drain a significant amount of battery power and undesirably dominate processing time that could otherwise be used for other ASR tasks or for non-ASR tasks on small devices.
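By way of illustration only, the imbalance between front-end and back-end workloads in an always-on system may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular system: the function names, the 10 ms frame length at 16 kHz, and the threshold value are hypothetical, frame energy stands in for real feature extraction, a fixed energy threshold stands in for voice activation, and keyword detection is omitted for brevity.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (a stand-in for feature extraction)."""
    return sum(s * s for s in frame) / len(frame)


def run_always_on(frames, va_threshold=0.01):
    """Count how often each stage runs in a gated always-on pipeline.

    Assumption: a simple energy threshold plays the role of voice
    activation; a real system would use a trained detector.
    """
    counts = {"feature_extraction": 0, "voice_activation": 0, "back_end": 0}
    for frame in frames:
        # Front-end stages run on every frame, whether or not anyone speaks.
        counts["feature_extraction"] += 1
        energy = frame_energy(frame)
        counts["voice_activation"] += 1
        # The costly back end is reached only when the gate opens.
        if energy > va_threshold:
            counts["back_end"] += 1
    return counts


# 100 frames of 160 samples each: silence, except five frames carrying a tone.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
frames = [tone if 40 <= i < 45 else silence for i in range(100)]

counts = run_always_on(frames)
print(counts)
```

In this sketch the front-end stages execute on all 100 frames while the back end is reached on only five, which is why the per-frame cost of the front end tends to dominate the energy budget of an always-on device.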