A growing number of systems are configured for voice-based control, or voice command, of one or more aspects of such systems. These systems may be referred to as voice-controllable systems. A voice-controllable system may allow a user to easily control aspects of the system's operation in a hands-free manner. Some example voice-controllable systems include home appliances, mobile phones (e.g., for voice-based dialing, texting, web browsing, etc.), media systems (e.g., TVs, stereos, etc.), computer operating systems, commercial software for computers, internet search engines, vehicles, and call centers. Voice control has improved in recent years due to substantial advancements in voice recognition, e.g., based on the advancement of deep-learning-based algorithms and the development of graphics processing units (GPUs) that allow accelerated processing of voice recognition algorithms.
However, for voice-controllable systems that also generate sound, such as certain TVs and other entertainment systems, mobile phones, computers, and Bluetooth speakers, referred to herein as voice-controllable sound generating systems (SGS), the effectiveness of the voice-recognition system may be lessened because audio output by the voice-controllable SGS mixes with the voice command audio, and may thus mask the voice audio or make it difficult to identify.