Products that use hands-free, voice-activated technology to implement a spoken user interface serve a variety of functions. Examples include switching lights on and off, dimming lights, playing music, finding songs in a playlist, searching the internet, finding names in a phone list, and dialing phones. In some applications hands-free operation is very important, such as dimmable lights in an operating room, or a phone that lets an operator communicate while keeping both hands on the controls when driving a car or flying an airplane. Some products draw power from an AC supply, while many, most notably smartphones, are battery powered. In battery-powered applications, keeping power usage low extends battery life. Automotive and other vehicle-based systems also require low power.
Speech recognition (SR) is used to translate the user's spoken input into the commands available for a device. SR can be implemented in software, hardware, or a combination of both, and is power intensive in any case. To conserve power, voice activity detection (VAD), sometimes called “speech activity detection,” is used. In one approach, VAD keeps the SR function in a “sleep mode” when there are no voice commands. When voice activity is detected, the system performs a VAD trigger to wake up the SR process. In an SR application using VAD, power is conserved because the power-intensive computations needed for SR are not performed on noise or when no signal is present at the input. VAD is a useful technology for conserving power in SR systems; however, the effectiveness of a VAD system depends on how well it detects real voice activity at the input, how quickly it responds to that activity, and how much power it consumes in doing so.
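The sleep/wake arrangement described above can be illustrated with a minimal sketch. It is not taken from any particular product: the function name `gate_recognizer`, the `is_voice` and `recognize` callables, and the `hangover_frames` parameter (how many quiet frames are tolerated before SR returns to sleep) are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]  # one short block of audio samples


def gate_recognizer(
    frames: Iterable[Frame],
    is_voice: Callable[[Frame], bool],   # hypothetical VAD decision function
    recognize: Callable[[Frame], str],   # hypothetical (expensive) SR function
    hangover_frames: int = 2,            # assumed: quiet frames allowed before sleeping
) -> List[str]:
    """Run the expensive recognizer only while the VAD holds it awake."""
    results: List[str] = []
    awake = False   # SR starts in sleep mode
    quiet = 0       # consecutive non-voice frames seen while awake
    for frame in frames:
        if is_voice(frame):
            awake = True    # VAD trigger: wake the SR process
            quiet = 0
        elif awake:
            quiet += 1
            if quiet > hangover_frames:
                awake = False   # no voice for a while: back to sleep
        if awake:
            results.append(recognize(frame))
    return results
```

In this sketch the cost of `recognize` is paid only for frames at or shortly after detected voice activity, so the power saved depends directly on how accurately and how cheaply `is_voice` makes its decision.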
Some existing VAD solutions include simple energy detection systems, which consume less power but are susceptible to reacting to noise, resulting in higher false alarm rates. More computationally intensive VAD systems based on the hidden Markov model (HMM) offer improved voice detection and noise rejection, but use elaborate computation that is power intensive. Other VAD systems use neural nets or classifiers, which improve upon the false detection rates of energy detectors but require an extensive calibration process and use more power. A low-power, low-cost VAD system with a low false detection rate is needed.
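As a rough illustration of why a simple energy detector is cheap yet prone to false alarms, the following sketch compares the mean-square energy of a frame against a fixed threshold. The function names, the threshold of 0.01, the 160-sample frame length, and the synthetic test signals are assumptions chosen for the example, not values from any existing system.

```python
import math
import random
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def energy_vad(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Declare 'voice' whenever frame energy exceeds the (assumed) threshold."""
    return frame_energy(frame) > threshold


rng = random.Random(0)  # fixed seed so the example is repeatable
n = 160                 # assumed frame length in samples

# Stand-in for voiced speech: a tone of moderate amplitude.
tone = [0.3 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
# Loud non-speech noise, such as a door slam.
noise_burst = [rng.uniform(-0.5, 0.5) for _ in range(n)]
# Low-level background noise.
quiet = [rng.uniform(-0.01, 0.01) for _ in range(n)]

voice_detected = energy_vad(tone)          # detector fires on the tone
false_alarm = energy_vad(noise_burst)      # detector also fires on loud noise
background_detected = energy_vad(quiet)    # detector stays idle on quiet input
```

The detector needs only one multiply-accumulate per sample, but because it looks at energy alone it cannot tell the loud noise burst from the speech-like tone, which is the false-alarm behavior described above.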