1. Technical Field
The present disclosure relates to the field of processing audio signals and, in particular, to a system and method for processing an audio signal captured from a microphone.
2. Related Art
Consumer speech recognition systems are commonly utilized to control mobile phones, automobile functions, game machines and personal computers. The first practical consumer speech recognition implementations were commonly initiated using “push to talk”, where a user pushes a button to start the speech recognition system. Starting the speech recognition system includes capturing and processing the audio from a microphone. Audio is not captured from the microphone when the speech recognition system is off. Newer speech recognition systems may never be off or inactive, as the audio may be captured and processed continuously. In many cases, the newer speech recognition systems listen for a small set of activation keywords in order to initiate the full functionality, in which more than the small set of activation keywords is recognized. The small set of activation keywords operates in a similar fashion to the “push to talk” initiation in order to minimize the occurrences of false positive recognition results.
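By way of illustration only, the activation keyword behavior described above may be sketched as a simple gate. The sketch below is a minimal example under stated assumptions and is not the implementation of any particular speech recognition system: the class name KeywordGate, the method feed and the keyword "hello car" are hypothetical, text strings stand in for captured audio, and the gate is assumed to return to the idle state after a single utterance.

```python
from dataclasses import dataclass


@dataclass
class KeywordGate:
    """Hypothetical gate that ignores input until an activation keyword is heard."""
    keywords: frozenset
    active: bool = False

    def feed(self, utterance: str):
        """Return the utterance for full recognition, or None if it is ignored."""
        text = utterance.lower().strip()
        if not self.active:
            # Idle: only the small set of activation keywords is checked.
            if text in self.keywords:
                self.active = True
            return None
        # Active: pass one utterance to full recognition, then go idle again
        # (an assumption made to keep the sketch short).
        self.active = False
        return text


gate = KeywordGate(keywords=frozenset({"hello car"}))
utterances = ["play music", "hello car", "Play music", "call home"]
results = [gate.feed(u) for u in utterances]
```

In this sketch, utterances received while the gate is idle are discarded unless they match an activation keyword, which mirrors the role of the button in the “push to talk” initiation.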
In order for speech recognition systems to achieve reasonable recognition rates, the audio captured from the microphone may be processed to reduce noise and/or echo. For example, a speech recognition system operating on a mobile phone may utilize the mobile phone's built-in echo canceller/noise suppressor to process the audio captured by the microphone. In some configurations, the speech recognition system does not operate on the same device as the microphone. For example, a wireless headset may capture the audio, process the audio and then transmit the audio to a mobile phone that handles the speech recognition. In an alternative example, an automobile head unit may capture and process the audio and send the resulting audio to a mobile phone or a cloud-based server for speech recognition. The audio captured from inside an automobile may be problematic for the speech recognition system because there may be many sources of audio that can confuse the speech recognition system. An automobile may have many different audio sources, including navigation prompts, music, chimes/gongs and text-to-speech output. Each of these audio sources may be captured in the microphone signal that is sent to the speech recognition system. There is a need for improved processing of audio captured in an automobile or other similar environments for use in voice recognition systems.