There are many electronic devices that are capable of processing and outputting audio (i.e., audio devices). Such devices include smartphones, tablets, audio players, and the like. These electronic devices can have transducers such as speakers and microphones. A microphone is usually configured to pick up an audio input for the device, and a speaker is usually configured to reproduce an audio output of the device. An audio output may be representative of a song or other types of audio recordings, while an audio input may be representative of ambient sounds and/or spoken utterances, such as words spoken by an operator of the electronic device, that occur in proximity to the microphone.
Conventional audio devices routinely employ computer-implemented techniques for identifying words spoken by the operator based on various features of a received audio input. These techniques, commonly referred to as speech recognition or automatic speech recognition (ASR) techniques, are combined with natural language processing techniques to allow the operator to control the audio device to perform tasks based on the operator's spoken commands.
In some instances, the operator may be located in a noisy environment when she/he submits spoken commands to the audio device for performing various tasks. In such cases, the microphone may pick up not only the spoken utterances of the operator but also the ambient sounds of the noisy environment. As a result, the audio device may be unable to recognize the operator's spoken commands and, therefore, unable to perform the tasks that the operator intends it to perform.
Thus, there is a need for devices that are able to recognize an operator's spoken commands more reliably, particularly in noisy environments.