Human beings are capable of recognizing the source location of a sound (i.e., its distance and orientation relative to the listener) by means of a variety of auditory cues related to head and ear geometry, as well as the way sounds are processed in the brain. Surround sound systems attempt to enrich the audio experience for listeners by outputting sounds from various locations that surround the listener.
Typical surround sound systems utilize an audio signal having multiple discrete channels that are routed to a plurality of speakers, which may be arranged in a variety of known formats. For example, 5.1 surround sound utilizes five full range channels and one low frequency effects (LFE) channel (indicated by the numerals before and after the decimal point, respectively). For 5.1 surround sound, the five full range channels are typically arranged in a room with three of the full range channels in front of the listener (in left, center, and right positions) and the remaining two full range channels behind the listener (in left and right positions). The LFE channel is typically output to one or more subwoofers (or sometimes routed, instead of to dedicated subwoofers, to one or more of the other loudspeakers capable of handling the low frequency signal). A variety of other surround sound formats exist, such as 6.1, 7.1, 10.2, and the like, all of which generally rely on the output of multiple discrete audio channels to a plurality of speakers arranged in a spread-out configuration. The multiple discrete audio channels may be coded into the source signal with a one-to-one mapping to output channels (e.g., speakers), or the channels may be extracted from a source signal having fewer channels, such as a stereo signal with two discrete channels, using other techniques such as matrix decoding to derive the channels to be played.
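To make the idea of matrix decoding concrete, the following is a minimal sketch of a simplified passive matrix that derives left, center, right, and a single surround channel from a two-channel stereo input. The function name `passive_matrix_decode` and the 1/√2 gains are illustrative assumptions rather than the behavior of any particular commercial decoder. Practical decoders typically add further processing, such as active steering.

```python
import math
from typing import List, Tuple

# (front left, center, front right, surround)
Channels = Tuple[List[float], List[float], List[float], List[float]]


def passive_matrix_decode(left: List[float], right: List[float]) -> Channels:
    """Hypothetical simplified passive matrix decoder.

    The center channel carries the in-phase (sum) content of the stereo pair,
    and the single surround channel carries the out-of-phase (difference)
    content.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    gain = 1.0 / math.sqrt(2.0)  # assumed gain keeping sum/difference power-balanced
    center = [gain * (l + r) for l, r in zip(left, right)]
    surround = [gain * (l - r) for l, r in zip(left, right)]
    # In this minimal sketch the front left/right channels pass through unchanged.
    return list(left), center, list(right), surround


# A source panned dead center (identical in both channels) yields no surround output.
front_l, center, front_r, surround = passive_matrix_decode([1.0, 0.5], [1.0, 0.5])
print(surround)  # [0.0, 0.0]
```

Conversely, content that is in anti-phase between the two stereo channels is routed entirely to the surround output and cancels in the center channel.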
Surround sound systems have become popular over the years in movie theaters, home theaters, and other system setups, as many movies, television shows, video games, music, and other forms of entertainment take advantage of the sound field created by a surround sound system to provide an enhanced audio experience. However, traditional surround sound systems have several drawbacks, particularly in a home theater application. For example, creating an ideal surround sound field typically depends on optimizing the physical setup of the speakers of the surround sound system, but physical constraints and other limitations may prevent optimal setup of the speakers; furthermore, many surround sound formats have no standard for speaker height. Moreover, loud playback of audio through a surround sound system, such as to recreate a movie theater environment, can be too disturbing to neighbors to be a viable option in many environments.
Headphones provide an attractive solution to many of the above problems, as well as a highly portable and easy-to-use audio entertainment solution. Headphones generally work using a two-speaker stereo output, with a left speaker and a right speaker arranged close to the user's head, either on or in the user's ears. However, as a result of this configuration, ordinary stereo headphones tend to produce an audio signal that sounds as if it originates inside or very close to the listener's head. For example, because each ear receives only the audio output to its corresponding left or right channel, the audio heard by the listener contains no transaural acoustic crosstalk (i.e., the sound output by each speaker being heard at both ears), and this lack of crosstalk reinforces the perception that the sound originates at the user's head.
It has been proposed that the source location of a sound can be simulated by manipulating the underlying source signal so that it sounds as if it originated from a desired location, a technique often referred to in audio signal processing as “sound localization.” Attempts have been made to use sound localization to create virtual surround sound systems in headphones, in which the audio signals played in the headphones are modified to sound as if they originate from distant locations, as in a surround sound system, rather than at the ears where the headphone speakers are located.
Many known audio signal processing techniques attempt to recreate these sound fields, simulating the spatial characteristics of a source audio signal, using what is known as a Head Related Impulse Response (HRIR) function or Head Related Transfer Function (HRTF). An HRTF is generally the Fourier transform of its corresponding time-domain HRIR and characterizes how sound from a particular location is modified by the anatomy of the listener's head before it enters the ear canal. Sound localization typically involves convolving the source signal with the HRIR for each ear for the desired source location (equivalently, multiplying the spectrum of the source signal by the corresponding HRTF). The HRTF is often derived from a binaural recording of a simulated impulse in an anechoic chamber at a desired location relative to an actual or dummy human head, using microphones placed inside each ear canal of the head, to obtain a recording of how an impulse originating from that location is affected by the head anatomy before it reaches the transducing components of the ear canal.
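The per-ear convolution described above can be sketched as follows. This is a minimal illustration, assuming a mono source and a pair of already measured HRIRs for the desired location; the names `convolve` and `localize` and the three-tap toy HRIRs are hypothetical and stand in for real measured responses, which are typically hundreds of samples long.

```python
from typing import List, Tuple


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def localize(source: List[float], hrir_left: List[float],
             hrir_right: List[float]) -> Tuple[List[float], List[float]]:
    """Render a mono source at the location for which the HRIR pair was measured."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Hypothetical toy HRIRs for a source on the listener's left: the far (right)
# ear hears the sound two samples later and at half the amplitude.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

left_ear, right_ear = localize([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
print(left_ear)   # [1.0, 0.5, 0.0, 0.0]
print(right_ear)  # [0.0, 0.0, 0.5, 0.25]
```

The interaural time and level differences encoded in the HRIR pair are what carry the directional cue. For long responses, the same result is usually computed more efficiently in the frequency domain by multiplying by the HRTF, as noted above.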
For virtual surround sound systems involving headphone playback, the acoustic effect of the environment also needs to be taken into account in order to create a surround sound signal that sounds as if it were naturally being played in the acoustic environment of the listener or of a typical surround sound system, such as a living room, as opposed to being played directly at the ears or in an anechoic chamber with no environmental reflections or reverberation. Accordingly, many known audio signal processing techniques for virtual surround sound systems or sound localization in headphone audio also model the impulse response of the environment, hereinafter referred to as the “room impulse response” (RIR), using a synthesized room impulse response function that is algorithmically generated to model the desired environment, such as a typical living room for a home theater system. These room impulse response functions for the desired locations are also convolved with the source signal in order to simulate the acoustic environment, e.g., the acoustic effects of a room.
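The text does not say how the synthesized RIR is generated. As an assumption, the sketch below uses a common simple model: a direct-path impulse followed by Gaussian noise whose amplitude decays by 60 dB over a chosen reverberation time `t60`. The names `synthetic_rir` and `render_in_room`, the 0.3 reverberation level, and the toy HRIRs are all hypothetical. The sketch also simplifies by applying one HRIR pair to the whole reverberant signal, whereas real reflections arrive from many directions.

```python
import random
from typing import List, Tuple


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def synthetic_rir(sample_rate: int, t60: float, length_s: float,
                  seed: int = 0) -> List[float]:
    """Hypothetical RIR model: direct impulse plus an exponentially decaying noise tail."""
    if t60 <= 0:
        raise ValueError("t60 must be positive")
    rng = random.Random(seed)  # seeded so the response is reproducible
    n = int(sample_rate * length_s)
    rir = [0.0] * n
    if n == 0:
        return rir
    rir[0] = 1.0  # direct sound
    for i in range(1, n):
        t = i / sample_rate
        envelope = 10.0 ** (-3.0 * t / t60)  # amplitude 10**-3 (-60 dB) at t = t60
        rir[i] = 0.3 * envelope * rng.gauss(0.0, 1.0)  # 0.3: assumed reverb level
    return rir


def render_in_room(source: List[float], rir: List[float], hrir_left: List[float],
                   hrir_right: List[float]) -> Tuple[List[float], List[float]]:
    """Apply the room response, then the per-ear HRIRs."""
    wet = convolve(source, rir)
    return convolve(wet, hrir_left), convolve(wet, hrir_right)


rir = synthetic_rir(sample_rate=8000, t60=0.4, length_s=0.5)
left_ear, right_ear = render_in_room([1.0, 0.0, 0.5], rir,
                                     [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])
print(len(rir), len(left_ear))  # 4000 4004
```

Because convolution is associative, the RIR and each HRIR could instead be combined once into a per-ear filter and applied to the source in a single convolution. That choice trades a one-time precomputation for fewer operations per rendered signal.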
Unfortunately, existing virtual surround sound systems that use the aforementioned techniques to modify acoustic signals output to headphones still suffer from poor performance: they do not produce the natural sound achieved by an actual surround sound speaker setup, nor sounds that are naturally localized to distant locations. For example, while some existing systems do an adequate job of simulating directional information, most do a poor job of sound externalization, so that the audio still sounds as if it originates at the listener's head when it is played back through headphones.
It is within this context that aspects of the present disclosure arise.