Spatialized audio is sound that is processed to give the listener the impression of a sound source located within a three-dimensional environment. Spatialized sound provides a more realistic listening experience than stereo because stereo varies along only one axis, usually the x (horizontal) axis.
In the past, binaural sound delivered over headphones was the most common approach to spatialization. Headphones offer two advantages: there is no crosstalk between channels, and the position of the sound source (the speaker driver) is fixed relative to the ear. More sophisticated digital signal processing is gradually bringing these advantages to conventional loudspeakers. The wave of multimedia computer content and equipment has increased the use of stereo speakers in conjunction with microcomputers. Additionally, complex audio signal processing equipment and the current consumer excitement surrounding the computer market increase the awareness of and desire for quality audio content. Two speakers, one on either side of a personal computer, carry the particular advantage that the listener sits close to the speakers and equidistant from them. The listener is probably also seated and therefore moves infrequently. This typical multimedia configuration probably comes as close to binaural sound over headphones as can be expected from free-field speakers, increasing the probability of success for future spatialization systems.
Spatial audio can be useful whenever a listener is presented with multiple auditory streams. It is suited to applications that require information about the positions of all events that need to be audible, including those outside the field of vision, or that would benefit from increased immersion in an environment. Possible applications of spatial audio processing techniques include: military communication systems to and between individuals within military vehicles, ships and aircraft, as well as to and between dismounted soldiers; complex supervisory control systems such as telecommunications and air traffic control systems; civil and military aircraft warning systems; teleconferencing and telepresence applications; virtual and augmented reality environments; computer-user interfaces and auditory displays, especially those intended for use by the visually impaired; personal information and guidance systems such as those used to provide exhibit information to visitors in a museum; and arts and entertainment, especially video games and music, to name but a few.
Environmental cues, such as early echoes and dense reverberation, are important for a realistic listening experience and are known to improve localization and externalization of audio sources. However, the cost of exact environmental modeling is extraordinarily high. Moreover, existing spatial audio systems are designed for use via headphones. This requirement may result in certain limitations on their use. For example, spatial audio may be limited to those applications for which a user is already wearing some sort of headgear, or for which the advantages of spatial sound outweigh the inconvenience of a headset.
U.S. Pat. Nos. 5,272,757, 5,459,790, 5,661,812, and 5,841,879, all to Scofield, disclose head-mounted surround sound systems. However, none of the Scofield systems appears to use head related transfer function (HRTF) filtering to produce spatialized audio signals. Furthermore, Scofield uses a system that converts signals from a multiple surround speaker system to a pair of signals for two speakers. Such a system appears unsuitable for real-time spatialization, in which a person's head position varies in orientation and azimuth and the filtering must therefore be adjusted to maintain appropriate spatial locations.
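To illustrate why head movement requires the filtering to be adjusted, the following is a minimal sketch and is not drawn from the Scofield patents or from any particular system. It assumes azimuths are given in degrees in a single shared convention and that filters are stored on a discrete azimuth grid. The function names `relative_azimuth` and `nearest_filter_index` are hypothetical.

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of the source relative to the listener's facing direction.

    Assumption: both angles are in degrees and use the same rotation
    convention. The result is wrapped into the range [-180, 180).
    """
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def nearest_filter_index(rel_az_deg, grid_az_deg):
    """Index of the grid azimuth closest to rel_az_deg on the circle."""
    def circular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)

    return min(
        range(len(grid_az_deg)),
        key=lambda i: circular_distance(rel_az_deg, grid_az_deg[i]),
    )


# Hypothetical filter grid: one filter pair every 90 degrees.
grid = [-180.0, -90.0, 0.0, 90.0]

# A source fixed at 30 degrees in the room, heard at two head orientations.
facing_forward = nearest_filter_index(relative_azimuth(30.0, 0.0), grid)
turned_away = nearest_filter_index(relative_azimuth(30.0, 200.0), grid)
```

In this sketch the source stays fixed in the room, yet the selected filter index changes when the head turns. A system that applies one fixed conversion cannot make that adjustment.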
One current method for generating spatialized audio is multiple-speaker panning. This method works only for listeners positioned at a sweet spot within the speaker array, and it cannot be used for mobile applications. Another method, often used with headphones, filters a monaural source with a pair of filters defined by a pair of head related transfer functions (HRTFs) for a particular location. This second method requires complex individual filters or synthesized sound reflections: it works best with individual filters, but the procedure to produce them is complex, and if neither individual filters nor synthesized sound reflections are used, front-back confusions and poor externalization of the sound source result. Each of these methods thus has limitations and disadvantages, and there is a need to overcome the above-identified problems.
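The headphone method described above, filtering a monaural source with a pair of HRTF-derived filters, can be sketched as a pair of time-domain convolutions. This is a minimal illustration using NumPy, not a definitive implementation. The impulse responses below are toy placeholders (a pure delay and attenuation on one ear), not measured HRTFs, and the name `spatialize`, the sample rate, and the 30-sample delay are all assumptions made for the example.

```python
import numpy as np


def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse-response pair.

    Returns an array of shape (2, len(mono) + len(hrir_left) - 1):
    row 0 is the left-ear signal, row 1 is the right-ear signal.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("impulse responses must have equal length")
    left = np.convolve(mono, hrir_left)    # full linear convolution
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)


# Assumed sample rate for the example.
fs = 48_000

# Toy impulse responses (placeholders, NOT measured HRTFs): the right
# ear receives a delayed, attenuated copy, crudely mimicking a source
# located to the listener's left.
delay_samples = 30
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[delay_samples] = 0.5

# 100 ms of noise as a stand-in monaural source.
mono = np.random.default_rng(0).standard_normal(fs // 10)
binaural = spatialize(mono, hrir_left, hrir_right)
```

With measured impulse responses for a given direction substituted for the placeholders, the same two convolutions yield the left-ear and right-ear signals for that direction. Individualized filters, as noted above, would have to be measured for each listener.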