Human beings are capable of recognizing the source location, i.e., distance and orientation, of the sounds they hear through a variety of auditory cues related to head and ear geometry, as well as the way sounds are processed in the brain. Surround sound systems attempt to enrich the audio experience for listeners by outputting sounds from various locations which surround the listener.
Typical surround sound systems utilize an audio signal having multiple discrete channels that are routed to a plurality of speakers, which may be arranged in a variety of known formats. For example, 5.1 surround sound utilizes five full range channels and one low frequency effects (LFE) channel (indicated by the numerals before and after the decimal point, respectively). For 5.1 surround sound, the five full range channels would then typically be arranged in a room with three of the full range channels arranged in front of the listener (in left, center, and right positions) and with the remaining two full range channels arranged behind the listener (in left and right positions). The LFE channel is typically output to one or more subwoofers (or sometimes routed to one or more of the other loudspeakers capable of handling the low frequency signal instead of dedicated subwoofers). A variety of other surround sound formats exist, such as 6.1, 7.1, 10.2, all of which generally rely on the output of multiple discrete audio channels to a plurality of speakers arranged in a spread out configuration. The multiple discrete audio channels may be coded into the source signal with one-to-one mapping to output channels (e.g. speakers). Alternatively, the channels may be extracted from a source signal having fewer channels, such as a stereo signal with two discrete channels, using other techniques like matrix decoding to extract the channels of the signal to be played.
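By way of illustration, and not by way of limitation, the extraction of additional channels from a two-channel stereo signal may be sketched as a simple passive sum/difference matrix. The function name `passive_matrix_decode`, the 1/√2 gain, and the four-channel output layout are assumptions chosen for this example and do not represent any particular commercial decoder.

```python
import math


def passive_matrix_decode(left, right):
    """Derive center and surround channels from a stereo pair.

    Illustrative passive matrix (an assumption, not a specific product):
      center   = (L + R) / sqrt(2)   content common to both channels
      surround = (L - R) / sqrt(2)   content that differs between channels
    """
    g = 1.0 / math.sqrt(2.0)
    center = [g * (l + r) for l, r in zip(left, right)]
    surround = [g * (l - r) for l, r in zip(left, right)]
    return {
        "left": list(left),
        "right": list(right),
        "center": center,
        "surround": surround,
    }


# In-phase content steers to the center; out-of-phase content to the surround.
in_phase = passive_matrix_decode([1.0, 0.5], [1.0, 0.5])
out_of_phase = passive_matrix_decode([1.0, 0.5], [-1.0, -0.5])
```

In this sketch, a signal that is identical in both stereo channels yields a silent surround channel, while a signal that is inverted between the two channels yields a silent center channel.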
Surround sound systems have become popular over the years in movie theaters, home theaters, and other system setups, as many movies, television shows, video games, music, and other forms of entertainment take advantage of the sound field created by a surround sound system to provide an enhanced audio experience. However, there are several drawbacks with traditional surround sound systems, particularly in home theater applications. For example, creating an ideal surround sound field typically depends on optimizing the physical setup of the speakers. Unfortunately, physical constraints and other limitations may prevent optimal speaker setup. Furthermore, there is generally no standard for speaker height in many surround sound formats. Moreover, a surround sound system may not be able to simulate the three-dimensional nature of a sound field with the same degree of accuracy as a headphone based system.
Systems have been proposed that manipulate an underlying sound source signal so that it sounds as if it originated from a desired location when played over headphones. This technique is often referred to in audio signal processing as “sound localization.” Many known audio signal processing techniques attempt to implement sound localization using a time-domain Head Related Impulse Response (HRIR) function or its Fourier transform, known as a Head Related Transfer Function (HRTF). An HRTF characterizes how sound from a particular location is modified by the anatomy of the human head before it enters a listener's ear canal. Sound localization typically involves convolving the source signal with an HRTF for each ear for the desired source location. The HRTF is often derived from a binaural recording of an acoustic impulse in an anechoic chamber. The impulse source is positioned at a desired location relative to an actual or dummy human head having microphones placed inside each ear canal, to record how the head affects an impulse originating from that location before reaching the transducing components of the ear canal.
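By way of example, and not by way of limitation, the per-ear convolution described above may be sketched in the time domain as follows. The impulse responses `HRIR_LEFT` and `HRIR_RIGHT` are hypothetical toy values invented for this illustration (a source to the listener's left, so the right ear receives a delayed and attenuated copy); measured responses would be far longer and more detailed.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def localize(source, hrir_left, hrir_right):
    """Convolve a mono source with the HRIR for each ear, giving a binaural pair."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Hypothetical toy HRIRs (assumed values, not measured data).
HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]   # near ear: immediate, full level
HRIR_RIGHT = [0.0, 0.0, 0.6, 0.0]  # far ear: two samples later, attenuated

source = [1.0, 0.5, -0.25]
left_ear, right_ear = localize(source, HRIR_LEFT, HRIR_RIGHT)
```

The direct-form loop is used here only for clarity. For impulse responses of realistic length, the same operation is commonly carried out as a multiplication in the frequency domain.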
The HRTF may be represented by a set of attenuation values for corresponding frequency bins. The HRTF for a given location may be determined by recording a known broadband sound signal at the location without the dummy head and then recording the same signal at the location with the dummy head in place. Both recorded signals may then be converted to frequency domain spectra (e.g., by a fast Fourier transform). Dividing the attenuation value for each frequency bin in the spectrum obtained with the head by the corresponding attenuation value in the spectrum obtained without the head yields the HRTF for that location.
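By way of illustration, the bin-by-bin division described above may be sketched as follows. The sketch uses a naive discrete Fourier transform for self-containment where a fast Fourier transform would ordinarily be used, and the names `dft` and `estimate_hrtf`, the `eps` guard against division by zero, and the toy recordings are all assumptions made for this example.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_hrtf(with_head, without_head, eps=1e-12):
    """Per-bin magnitude ratio: spectrum with the head over spectrum without it."""
    spec_head = dft(with_head)
    spec_free = dft(without_head)
    return [abs(h) / max(abs(f), eps) for h, f in zip(spec_head, spec_free)]


# Toy recordings (assumed values): a unit impulse is broadband, with a flat spectrum.
without_head = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
with_head = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

hrtf = estimate_hrtf(with_head, without_head)
```

This sketch keeps only magnitudes, consistent with a representation as attenuation values per frequency bin. Retaining the complex ratio instead would also preserve the phase (delay) information.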
Virtual surround sound systems involving headphone playback may also take into account environmental acoustic effects in order to create a surround sound signal that sounds as if it were naturally occurring in the listener's acoustic environment, as opposed to being played directly at the ears or in an anechoic chamber with no environmental reflections or reverberations of the sounds. Accordingly, many known audio signal processing techniques also model the impulse response of the environment, hereinafter referred to as the “room impulse response” (RIR), using a synthesized room impulse response function that is algorithmically generated to model the desired environment. These room impulse response functions are also convolved with the source signal in order to simulate the acoustic environment. In a surround sound type system, these room impulse responses may generate unwanted effects such as echoes and reverberations. Such unwanted effects may change the user's perception of the location of a sound source and decrease the fidelity of the sound within the room.
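By way of example, and not by way of limitation, an algorithmically generated room impulse response and its convolution with a source signal may be sketched as follows. The model used here (a direct path followed by discrete reflections, each attenuated by a fixed `decay` factor relative to the previous one) and the names `synth_rir` and `apply_room` are simplifying assumptions made for this illustration, not a description of any particular known system.

```python
def synth_rir(length, reflection_delays, decay=0.5):
    """Build a toy RIR: unit direct path plus successively weaker reflections.

    reflection_delays are sample offsets (assumed values); each later
    reflection is scaled by `decay` relative to the one before it.
    """
    rir = [0.0] * length
    rir[0] = 1.0  # direct path
    gain = 1.0
    for delay in sorted(reflection_delays):
        gain *= decay
        if 0 < delay < length:
            rir[delay] += gain
    return rir


def apply_room(source, rir):
    """Convolve the source with the RIR to simulate the acoustic environment."""
    out = [0.0] * (len(source) + len(rir) - 1)
    for i, s in enumerate(source):
        for j, r in enumerate(rir):
            out[i + j] += s * r
    return out


rir = synth_rir(8, [3, 6])
wet = apply_room([1.0, 0.0], rir)
```

With an impulse as the source, the output reproduces the impulse response itself, so the reflections appear as delayed, attenuated copies of the original sound.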
Unfortunately, existing sound systems using the aforementioned techniques to modify acoustic signals still suffer from poor performance, and do not accurately localize sound or counteract unwanted room effects.
It is within this context that aspects of the present disclosure arise.