In the modern era, there is a strong emphasis on providing automated technology to reduce human labor costs, improve productivity, and improve accessibility for a variety of individuals, including those with physical and/or mental disabilities and limitations. One technological field that may help to achieve these benefits is that of machines that can listen to and respond to human voice commands. Currently, voice-activatable machines are capable of performing a multitude of tasks. However, in some situations, such as in a noisy environment, these machines have difficulty detecting the location of the voice or source of sound in order to properly process the commands being given.
Determining the location of a source of sound is generally a fairly simple process for a human with normal hearing and sound processing capabilities, even in an environment filled with ambient noise. That is, in an environment in which a mixture of similar and distinct sounds is being created by multiple sources, the average human is able to locate the source of a target sound by mentally filtering out distinct and unimportant noises using auditory and visual cues, and then orienting his or her body toward the direction from which the sound is emanating.
In contrast, in a noise-filled environment, a machine with a single microphone has difficulty detecting the location of a target sound source (e.g., a human voice giving commands) for many reasons. For example, unlike a human being with binaural hearing, a machine using a single microphone cannot determine the incident angle and distance of a sound source. In addition, a stationary machine, even one with a fixed directional microphone, cannot reorient itself for better sound pickup. Further, in an environment such as a busy subway station, a train station, an airport, a casino, an event stadium, a metropolitan street, etc., even if a soundwave is intentionally directed at the machine, there is a strong likelihood that the machine also receives multiple soundwaves that are directed at it unintentionally. For example, in a subway station, an individual may be standing near the machine and giving commands while, simultaneously, passersby or bystanders are present and talking while also facing the machine. In addition, there may be other ambient noise being reflected or directed to the machine, such as the mechanical sounds of arriving subway cars, music being played live or over station speakers, informational announcements, sounds of people moving on the floor, etc. All of these combined sounds in an environment may interfere with and obscure the speech of the individual giving commands intended for the machine. As such, the machine may have difficulty deciding on which sound to focus, and may subsequently terminate the listening procedure. In summary, a machine with a fixed microphone lacks the human binaural hearing capability, mental filtering mechanism, and re-orientation mobility needed to locate a speech source.
Thus, improved machine sound source location capabilities are desired.