1. Field of the Invention
The present invention relates to systems that automatically determine the location of one or more desired audio sources based on audio input received via an array of microphones.
2. Background
As used herein, the term audio source localization refers to a technique for automatically determining the location of at least one desired audio source, such as a talker, in a room or other area. FIG. 1 is a block diagram of an example system 100 that performs audio source localization. System 100 may represent, for example and without limitation, a speakerphone, a teleconferencing system, a video gaming system, or other system capable of both capturing and playing back audio signals.
As shown in FIG. 1, system 100 includes an output audio processing module 102 that processes at least one audio signal for playback via loudspeakers 104. The audio signal processed by output audio processing module 102 may be received from a remote audio source such as a far-end talker in a speakerphone or teleconferencing scenario. Additionally or alternatively, the audio signal processed by output audio processing module 102 may be generated by system 100 itself or some other source connected locally thereto. For example, in a video gaming scenario, the audio signal processed by output audio processing module 102 may represent music and/or sound effects associated with a video game being executed by system 100.
As further shown in FIG. 1, system 100 includes an array of microphones 106 that converts sound waves produced by local audio sources into audio signals. These audio signals are then processed by an audio source localization module 108. Depending upon the implementation, the audio signals generated by microphone array 106 may first be processed by other logic (e.g., acoustic echo cancellers (AECs)) prior to being received by audio source localization module 108.
Audio source localization module 108 periodically processes the audio signals generated by microphone array 106 to estimate a current location of a desired audio source 114. Desired audio source 114 may represent, for example, a near-end talker in a speakerphone or teleconferencing scenario or a video game player in a video gaming scenario. The estimated current location of desired audio source 114 as determined by audio source localization module 108 may be defined, for example, in terms of an estimated current direction of arrival of sound waves emanating from desired audio source 114.
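By way of illustration only, one common way to express a location as a direction of arrival is to measure the time difference with which a sound reaches two microphones and convert that difference into an angle. The following Python sketch assumes a two-microphone array, a far-field source, and a brute-force cross-correlation search; the function name, parameters, and test signal are hypothetical and are not taken from system 100.

```python
import math
import random


def estimate_doa_degrees(sig_a, sig_b, mic_spacing_m, sample_rate_hz,
                         speed_of_sound=343.0):
    """Estimate a direction of arrival from two microphone signals.

    Returns an angle in degrees relative to broadside (0 = straight ahead),
    positive when the sound reaches microphone A before microphone B.
    """
    # Largest physically possible lag, in samples, for this spacing.
    max_lag = int(math.ceil(mic_spacing_m / speed_of_sound * sample_rate_hz))
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        # Correlate sig_a[i] with sig_b[i + lag]; lag > 0 means B is delayed.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    delay_s = best_lag / sample_rate_hz
    # Clamp to the valid domain of asin to guard against rounding.
    ratio = max(-1.0, min(1.0, delay_s * speed_of_sound / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: white noise that reaches microphone B two
# samples after it reaches microphone A.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(512)]
mic_a = source
mic_b = [0.0, 0.0] + source[:-2]
angle = estimate_doa_degrees(mic_a, mic_b, mic_spacing_m=0.1,
                             sample_rate_hz=16000)
```

With the assumed 0.1 m spacing and 16 kHz sampling rate, a two-sample delay corresponds to an angle of about 25 degrees off broadside. A practical localization module would typically use more microphones and a more robust correlation measure.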
System 100 also includes a steerable beamformer 110 that is configured to process the audio signals generated by microphone array 106 to produce a single audio signal. In producing the audio signal, steerable beamformer 110 performs spatial filtering based on the estimated current location of desired audio source 114 such that signal components attributable to sound waves emanating from locations other than the estimated current location of desired audio source 114 are attenuated relative to signal components attributable to sound waves emanating from the estimated current location of desired audio source 114. This tends to have the beneficial effect of attenuating undesired audio sources relative to desired audio source 114, thereby improving the overall quality and intelligibility of the output audio signal. In a speakerphone or teleconferencing scenario, the audio signal produced by steerable beamformer 110 is transmitted to a far-end listener.
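By way of illustration only, the spatial filtering described above can be sketched as a delay-and-sum beamformer, which is one simple form of steerable beamformer and is not necessarily the form used by steerable beamformer 110. The sketch assumes integer-sample steering delays that have already been derived from the estimated location; all names and signal values are hypothetical.

```python
def delay_and_sum(channels, steering_delays):
    """Align each channel by its integer steering delay and average.

    channels[m][n] is sample n of microphone m. steering_delays[m] is the
    number of samples by which a wavefront from the estimated source
    location arrives late at microphone m.
    """
    num_mics = len(channels)
    length = len(channels[0])
    output = [0.0] * length
    for samples, delay in zip(channels, steering_delays):
        for n in range(length):
            k = n + delay  # advance the late channel so wavefronts line up
            if 0 <= k < length:
                output[n] += samples[k]
    return [value / num_mics for value in output]


# Hypothetical two-microphone capture, 32 samples long.
mic0 = [0.0] * 32
mic1 = [0.0] * 32
mic0[10] = 1.0  # desired source: reaches microphone 0 at sample 10 ...
mic1[12] = 1.0  # ... and microphone 1 two samples later
mic0[20] = 1.0  # interferer: reaches microphone 0 at sample 20 ...
mic1[17] = 1.0  # ... but microphone 1 three samples earlier

beam = delay_and_sum([mic0, mic1], steering_delays=[0, 2])
```

In this example the desired pulse adds coherently and keeps its full amplitude, while the interferer's two copies land on different output samples and each appears at half amplitude.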
The information produced by audio source localization module 108 may also be useful for applications other than steering a beamformer used for acoustic transmission. For example, the information produced by audio source localization module 108 may be used in a video gaming system to integrate the estimated current location of a player within a room into the context of a game (e.g., by controlling the placement of an avatar that represents the player within a scene rendered by a video game based on the estimated current location of the player) or to perform proper sound localization in surround sound gaming applications. Various other beneficial applications of audio source localization also exist. These applications are generally represented in system 100 by the element labeled “other applications” and marked with reference numeral 112.
One problem for system 100 and certain other systems that perform audio source localization is the presence of acoustic echo 116. Acoustic echo 116 is generated when system 100 plays back audio signals via loudspeakers 104, an echo of which is picked up by microphone array 106. In a speakerphone or teleconferencing system, such echo may be attributable to speech signals representing the voices of one or more far-end talkers that are played back by the system. Such echo is typically intermittent. In a video gaming system, the echo may be attributable to music, sound effects, and/or other audio content produced by a game. This type of echo is typically more continuous in nature.
The presence of acoustic echo can cause audio source localization module 108 to perform poorly, since the module may not be able to adequately distinguish between desired audio source 114 whose location is to be determined and the echo. This may cause audio source localization module 108 to incorrectly estimate the location of desired audio source 114.
There are some known techniques that may be used to deal with this issue. For example, acoustic echo cancellation may be performed on each of the microphone input signals using transversal filters. However, there are problems with this approach. For example, transversal filters require time to converge to an accurate acoustic impulse response and during this convergence time, echo cancellation performance may be poor. Furthermore, it is likely that the acoustic echo can never be canceled completely because of factors such as background noise/interference 118 and/or non-linearities associated with system loudspeakers or with other audio processing logic that is located outside of system 100. For example, where system 100 is a video gaming system that is part of a home theater installation, audio output produced by the system may be processed by audio processing logic located in a receiver and/or in external speakers.
These problems may render the acoustic echo cancellation insufficiently robust. As a result, residual echo may be delivered to audio source localization module 108, impairing its performance.
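By way of illustration only, the convergence behavior of a transversal-filter echo canceller can be sketched as follows. The sketch assumes a normalized least-mean-squares (NLMS) update, which is one common adaptation rule and is not specified by the foregoing description, along with a short, noise-free, linear echo path. All names and values are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Cancel echo of far_end from mic using an adaptive transversal filter.

    Returns the residual (echo-cancelled) signal and the final tap weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # NLMS update: step size normalized by the input energy.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Hypothetical playback signal and echo path (assumed linear, noise-free).
rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_canceller(far_end, mic)
early_energy = sum(e * e for e in residual[:50])
late_energy = sum(e * e for e in residual[-50:])
```

In this idealized setting the residual energy over the first 50 samples is far larger than over the last 50, which reflects the poor cancellation during the convergence period. Adding background noise or a non-linear echo path to the microphone signal would leave a residual that the filter cannot remove, consistent with the limitations noted above.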
Another approach known in the art is to “freeze” the operation of audio source localization module 108 whenever audio content is being played back by system 100. This ensures that the estimated location of desired audio source 114 will not be changed based on acoustic echo. However, this approach negatively impacts the responsiveness of audio source localization module 108, since that module cannot track the location of desired audio source 114 during periods when audio content is being played back by system 100. Such lack of responsiveness is especially damaging in a video gaming application where the audio played back by the video gaming system may be virtually continuous.
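By way of illustration only, the "freeze" approach and its effect on responsiveness can be sketched as a gate that holds the last location estimate whenever playback is active. The function name, the per-frame playback flag, and the angle values are hypothetical.

```python
def track_with_freeze(raw_estimates, playback_active):
    """Hold the last estimate while audio is being played back.

    raw_estimates[i] is the location estimate computed for frame i;
    playback_active[i] is True when the system is playing audio in frame i.
    """
    held = None  # no estimate is available until a playback-free frame occurs
    tracked = []
    for estimate, playing in zip(raw_estimates, playback_active):
        if not playing:
            held = estimate
        tracked.append(held)
    return tracked


# Hypothetical per-frame angles in degrees; the talker moves during playback.
raw = [10, 12, 40, 45, 20]
playing = [False, False, True, True, False]
tracked = track_with_freeze(raw, playing)
```

Here the tracked location remains at 12 degrees for both playback frames and is updated only when playback stops. If playback were virtually continuous, the estimate would rarely, if ever, be updated.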
What is needed, then, is a system for performing audio source localization in the presence of acoustic echo that addresses one or more of the aforementioned shortcomings associated with prior art solutions.