Homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with them, such as through mechanical devices (e.g., keyboards and mice), touch screens, motion, gesture, and speech.
When speech is used as an input, the device is commonly equipped with microphones to receive voice input and a speech recognition component that attempts to recognize the voice input. This voice input often competes with other audible sounds that might be received by the microphones, such as background voices, ambient noise, acoustic echoes, and double talk. Double talk refers to a situation where sound from the near-end talker reaches the microphones simultaneously with sound from the far-end talker that is played out through the device loudspeakers. That is, sound played out of the loudspeakers (e.g., sound corresponding to signals received from the far-end talker) echoes and reaches the microphones along with sound from the near-end talker.
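By way of illustration only, the double-talk condition described above can be sketched as a simple signal model in which each microphone sample is the sum of the near-end talker's speech and an echo of the loudspeaker output. The function names, the short echo-path impulse response, and the activity threshold below are hypothetical and are chosen only for the example; the sketch labels double talk from known simulated signals and is not a detector that a device would run on microphone input alone.

```python
def echo_of(far_end, echo_path):
    """Echo at the microphone: loudspeaker samples convolved with a
    (hypothetical) echo-path impulse response, truncated to the input length."""
    out = [0.0] * len(far_end)
    for n in range(len(far_end)):
        for k, h in enumerate(echo_path):
            if n - k >= 0:
                out[n] += h * far_end[n - k]
    return out


def microphone_signal(near_end, far_end, echo_path):
    """Microphone sample = near-end speech + echo of the far-end playback."""
    echo = echo_of(far_end, echo_path)
    return [s + e for s, e in zip(near_end, echo)]


def is_double_talk(near_end, far_end, threshold=1e-3):
    """Label a simulated frame as double talk when both talkers are active.
    Uses the known source signals, so it is a labeling aid, not a detector."""
    near_active = any(abs(s) > threshold for s in near_end)
    far_active = any(abs(s) > threshold for s in far_end)
    return near_active and far_active


# Assumed toy frame: four samples from each talker and a two-tap echo path.
near = [1.0, 0.0, 0.0, 0.0]
far = [0.0, 2.0, 0.0, 0.0]
path = [0.5, 0.25]

mic = microphone_signal(near, far, path)
print(mic)
print(is_double_talk(near, far))
```

In this toy frame the microphone signal contains both the near-end sample and the delayed, attenuated copy of the far-end playback, which is the mixture a speech recognition component would have to contend with.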
These devices are often used for multiple purposes. In addition to outputting vocal conversation from a far-end talker, for example, the device loudspeakers may be used to output music, movie soundtracks, and the like. Often these devices have small form factors. One of the challenges with a small form factor device is generating and outputting high-quality sound. This challenge is exacerbated by the additional requirement that the device still clearly receive voice input from a near-end talker even during high-fidelity sound output. Accordingly, there is an ongoing need for improved architectures of voice-enabled devices that have small form factors, output high-quality audio, and yet remain responsive to voice input from the user.