Over the past twenty years, considerable progress has been made in the field of virtual acoustics and binaural audio. Researchers in the field have advanced the understanding of psychoacoustics by developing sound systems that can generate virtual sound sources--perceived sound sources that appear to the listener to originate in areas of space that are distinct from the actual physical location of the speakers.
It is well understood in the field of virtual acoustics that a listener's localization of a sound source is largely a function of the difference of the sound wave fronts at each of the ears of the listener. Interaural time difference (ITD) refers to the delay in time, and interaural intensity difference (IID) refers to the attenuation in intensity, between "sound" perceived at the left and right ear drums of the listener. The brain uses these differences in the timing and magnitude of sounds between the ears to localize and identify the position in space from which the sound originates.
At frequencies below about 1.5 kHz (i.e., frequencies where the wavelength is larger than the listener's head), a listener determines the position in space from which a sound originates based primarily on the difference in the time at which the sound reaches the left and right ears of the listener (i.e., the ITD). However, at frequencies higher than about 1.5 kHz, the spatial cue provided by the ITD is generally not sufficient for a listener to determine the location of the sound source.
Instead, at frequencies greater than approximately 500 Hz and less than 10 kHz, a listener may depend primarily on intensity differences in the sound received by the left and right ears of the listener (i.e., the IID). Variations in intensity levels between the left and right eardrums are interpreted by the human auditory system as changes in the spatial position of the perceived sound source relative to the listener. Thus, a virtual sound system can create a virtual or "3-D" sound effect by providing a listener with appropriate spatial cues (ITD, IID) for the desired location of the virtual sound image.
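The two cues described above can be illustrated with a minimal sketch. It is not taken from any particular system. It assumes a spherical head (the well-known Woodworth approximation, valid for azimuths between 0 and 90 degrees), a nominal head radius and speed of sound, and a single broadband gain standing in for the IID, which in practice varies with frequency. The names woodworth_itd and apply_cues are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed nominal head radius


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source at the given azimuth.

    Spherical-head (Woodworth) approximation; 0 degrees is straight ahead,
    90 degrees is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def apply_cues(mono, sample_rate, itd_s, iid_db):
    """Return (near_ear, far_ear) signals derived from a mono signal.

    The far-ear copy is delayed by the ITD (rounded to whole samples) and
    attenuated by the IID; the near-ear copy is zero-padded to equal length.
    """
    delay = round(itd_s * sample_rate)
    gain = 10.0 ** (-iid_db / 20.0)
    far = [0.0] * delay + [gain * x for x in mono]
    near = list(mono) + [0.0] * delay
    return near, far
```

Under these assumptions, a source directly to one side yields an ITD of roughly 0.66 ms, which is consistent with the ITD being most useful at frequencies whose wavelength exceeds the size of the head.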
However, in order to provide a realistic and accurate virtual sound image, the sound system must also take into account the shape of the listener's head and the pinna (or outer ear) of each ear of the listener. The pinna of each ear imposes unique frequency-dependent amplitude and time differences on an incoming signal for a given source position. The term Head-Related Transfer Function (HRTF) is used to describe the frequency-dependent amplitude and time-delay differences in perceived sound originating from a particular sound source that result from the complex shaping of the pinnae at the left and right ear drums of the listener. Thus, an effective virtual sound system provides ITD and IID spatial cues that have been modified to compensate for the spectral alterations of the HRTF of the listener.
Several technical barriers exist to providing realistic virtual audio over conventional speakers. The sound heard at each ear of the listener is a mixture of signals from all of the speakers providing sound to the listener. This mixture of signals or "crosstalk" makes it very difficult to create a stable virtual sound image because of the enormous complexity involved in calculating how the different signals will mix at a listener's ear. For example, in a two-speaker system, sound signals from each of the two speakers will be heard by both ears and mix in an unpredictable manner to alter the spectral balance, ITD, and IID of the sound signals perceived by the listener.
A theoretical solution for this dilemma, known as crosstalk cancellation, was originally proposed over 20 years ago. Crosstalk cancellation presupposes that a sound system can add a binaural signal at each speaker that is the inverse (i.e., 180 degrees out of phase) of the crosstalk coming from a competing speaker, delayed by the difference in the time it takes the competing speaker's sound to reach the opposite ear, to cancel the sound of the undesired speaker at a given ear. Thus, using crosstalk cancellation, a sound system can, in theory, assure that a listener's left ear hears the output of the left speaker and a listener's right ear hears the output of the right speaker.
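The principle can be sketched at a single frequency as follows. This is a minimal illustration under stated assumptions, not the method of any specific prior system: the listener is assumed to sit symmetrically between two speakers, each same-side path is normalized to 1, and each crosstalk path is modeled as one attenuation (gain) and one extra delay (delay_s). All function names are hypothetical.

```python
import cmath


def acoustic_matrix(freq_hz, gain, delay_s):
    """2x2 speaker-to-ear transfer matrix at one frequency.

    Rows are ears (left, right) and columns are speakers (left, right).
    The off-diagonal terms are the crosstalk: attenuated and delayed.
    """
    x = gain * cmath.exp(-2j * cmath.pi * freq_hz * delay_s)
    return ((1.0, x), (x, 1.0))


def invert_2x2(m):
    """Inverse of a 2x2 complex matrix; this is the cancellation filter."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matmul_2x2(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


# Example: crosstalk 3 dB down (gain about 0.7) and 0.25 ms late at 1 kHz.
paths = acoustic_matrix(1000.0, 0.7, 0.00025)
canceller = invert_2x2(paths)
at_ears = matmul_2x2(paths, canceller)  # ideally the identity matrix
```

In this model the off-diagonal entries of canceller are proportional to the negated crosstalk term, which corresponds to the inverted and delayed signal described above. The common divisor accounts for the fact that each cancellation signal itself leaks to the opposite ear and must be cancelled in turn.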
While systems have been implemented using crosstalk cancellation, several limitations have been encountered in conventional systems. In particular, the virtual effect may be restricted to a relatively small area at a specific distance and angle from the speakers. Outside this "sweet spot," the quality of the virtual sound effect may be greatly diminished. As a result, the number of listeners that may experience the virtual image at a time is limited. In addition, the virtual effect may be restricted to a narrow range of head positions within the "sweet spot," so a listener may lose the virtual sound effect entirely by turning his head. Such systems require the listener to remain in a fixed position relative to the speakers and, consequently, are impractical for many commercial applications.
Such limitations make conventional crosstalk cancellation difficult to implement in practice. Effective crosstalk cancellation typically requires precise knowledge of the location of the speakers, location of each listener and the head position of each listener. Deviations by the listeners from the expected physical location and head position relative to the speakers may result in a large and sudden attenuation of the virtual effect.
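The sensitivity to position can be illustrated with a small sketch. It assumes a first-order canceller whose cancellation signal matches the crosstalk in level but whose delay is off by delay_error_s, for example because the listener has moved. The function name and parameters are hypothetical, and the model ignores level mismatch and room reflections.

```python
import cmath
import math


def residual_crosstalk_db(freq_hz, delay_error_s):
    """Leftover crosstalk level relative to uncancelled crosstalk, in dB.

    The crosstalk and its inverted copy differ only by a phase shift of
    2*pi*f*delay_error, so the residual magnitude is |exp(-j*phase) - 1|.
    """
    phase = 2.0 * math.pi * freq_hz * delay_error_s
    residual = abs(cmath.exp(-1j * phase) - 1.0)
    if residual == 0.0:
        return float("-inf")  # perfect cancellation
    return 20.0 * math.log10(residual)
```

In this simplified model, a delay mismatch of 20 microseconds leaves the residual crosstalk roughly 18 dB below the uncancelled level at 1 kHz, but less than 1 dB below it at 8 kHz, which suggests why small head movements can degrade the effect so abruptly at higher frequencies.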
Some systems have attempted to compensate for the above limitations by limiting crosstalk cancellation to a particular band of frequencies. For example, crosstalk cancellation may be limited to signals having frequencies between approximately 600 Hz and 10 kHz, an approximation of the frequency range over which the human auditory system can localize a sound source based primarily on the IID. This limitation of frequencies at which crosstalk is canceled increases the range of head movement that can occur within the predetermined sweet spot.
What is needed is an improved system and method for localizing sound in a virtual system. Preferably such a system and method would provide a larger sweet spot and be less sensitive to head movement of listeners in the sweet spot. In addition, such a system and method would preferably enhance the listeners' ability to perceive and differentiate the location of virtual sources.