When experiencing a virtual environment graphically and audibly, a participant is often represented in the virtual environment by a virtual object. A virtual sound source produces sound that should vary realistically as movement between the virtual sound source and the virtual object occurs. The person participating in the virtual environment should ideally hear sound corresponding to the sound that would be heard by the virtual object representing the person in the virtual environment. In attempting to achieve this goal in the prior art, one or more signals associated with a simulated signal source are output through one or more stationary output devices.
Sound associated with a simulated sound source in a computer simulation is played through one or more stationary speakers. Because the speakers are stationary relative to the participant in the virtual environment, they typically do not accurately reflect a location of the simulated sound source, particularly when there is relative movement between the virtual sound source and the virtual object representing the participant. Accurate spatial location of the simulated sound source is a function of direction, distance, and velocity of the simulated sound source relative to a listener represented by the virtual object. Independent sound signals from sufficiently separated fixed speakers around the listener can provide some coarse spatial location, depending on a listener's location relative to each of the speakers. However, for more accurate spatial location, other audio cues must be employed to indicate position and motion of the simulated sound source. One such audio cue is the result of the difference in the times at which sounds from the speakers arrive at a listener's left and right ears, which provides an indication of the direction of the sound source relative to the listener. This characteristic is sometimes referred to as an inter-aural time difference (ITD). An example of an ITD system is disclosed by Massie et al. (U.S. Pat. No. 5,943,427). Another audio cue relates to the relative amplitudes of sound reaching the listener from different sources, which can be varied with a gain control (i.e., a volume control). The approach of a sound source toward the listener can be indicated by controlling the gain (or attenuation) to provide an increasing volume from a speaker. Angular direction to a source relative to the listener can also be indicated by producing a greater volume from one speaker than from another speaker, and changes in the angular direction can be indicated by changing these relative volumes. This amplitude variation is sometimes referred to as inter-aural intensity difference (IID).
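The two binaural cues described above can be illustrated with a minimal sketch. The head radius, speed of sound, ear layout, and the Woodworth spherical-head approximation for ITD are assumptions chosen for illustration only, not details of any system described above:

```python
import math

HEAD_RADIUS = 0.0875    # meters; assumed average adult head radius
SPEED_OF_SOUND = 343.0  # m/s at room temperature

def itd_woodworth(azimuth_rad):
    """Approximate ITD via Woodworth's spherical-head model:
    ITD = (a / c) * (theta + sin(theta)), where theta is the
    source azimuth (radians) measured from straight ahead."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))

def iid_gains(listener, source):
    """Crude IID: per-ear gains that fall off with the inverse of the
    distance from each ear to the source (hypothetical model)."""
    lx, ly = listener
    sx, sy = source
    # Ears offset along the x-axis; listener assumed facing +y.
    left_ear = (lx - HEAD_RADIUS, ly)
    right_ear = (lx + HEAD_RADIUS, ly)
    def gain(ear):
        d = math.hypot(sx - ear[0], sy - ear[1])
        return 1.0 / max(d, HEAD_RADIUS)  # clamp to avoid blow-up at the ear
    return gain(left_ear), gain(right_ear)
```

A source directly ahead yields zero ITD and equal ear gains; a source off to one side yields a positive time difference and a louder near-ear gain.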
Often, however, these simple binaural cues are inaccurate, because the precise location of the listener is not known. For example, the listener may be very close to a speaker that produces a low volume, such that the volume from each of a plurality of surrounding speakers is perceived as substantially equivalent by the listener. Similarly, the listener's head may be oriented such that the sounds produced by each speaker reach both ears of the listener at about the same time. These binaural cues also become unreliable when attempting to estimate a sound's location in three-dimensional (3D) free space rather than in a two-dimensional (2D) plane, because the same ITD and/or IID results at an infinite number of points along curves equidistant from the listener's head. For example, a set of points equidistant from the listener's head may form a circle, and the ITD and/or IID at every point on this circle is the same. Thus, the listener cannot distinguish the true location of a simulated sound source that emanates from any one of the points on the circle. A series of these circles expands away from the listener, sweeping out a conical surface. For this reason, this spatial location ambiguity is sometimes called a "cone of confusion."
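The ambiguity above can be demonstrated numerically. Modeling the ears as two points on an interaural axis (an assumed simplification), every point on a circle centered on that axis lies at the same pair of distances from the two ears, and therefore produces identical ITD and IID values:

```python
import math

EAR_OFFSET = 0.0875  # meters; ears assumed at (+/- EAR_OFFSET, 0, 0)

def ear_distances(point):
    """Distance from a 3D point to each ear on the interaural (x) axis."""
    x, y, z = point
    lateral = math.hypot(y, z)  # distance from the interaural axis
    d_left = math.hypot(x + EAR_OFFSET, lateral)
    d_right = math.hypot(x - EAR_OFFSET, lateral)
    return d_left, d_right

def circle_point(x, radius, phi):
    """A point on the circle of the given radius centered on the
    interaural axis at coordinate x, parameterized by angle phi."""
    return (x, radius * math.cos(phi), radius * math.sin(phi))
```

Because `ear_distances` depends only on `x` and the lateral distance from the axis, every choice of `phi` gives the same left/right distance pair, which is exactly the ambiguity that sweeps out the cone of confusion.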
To compensate for these inadequacies, prior art systems have been developed that alternatively or additionally estimate the acoustic filtering corresponding to sound wave diffraction by the listener's head, torso, and outer ear (pinna). It is believed that the human ear may obtain spatial cues from this natural filtering of sound frequencies. Thus, these practitioners estimate and apply filtering functions to the simulated sound in an attempt to provide frequency-based spatial cues to the listener. Such functions are referred to as Head-Related Transfer Functions (HRTFs). Specifically, the HRTF is an individual listener's left or right ear far-field frequency response, as measured from a point in 3D space to a point in the ear canal of the listener. Thus, the HRTF is unique to each individual. Consequently, an HRTF is difficult to generalize for all listeners, and is complex to apply. Often, dedicated real-time digital signal processing (DSP) hardware is needed to implement even simple spatialization algorithms. Also, implementing an HRTF requires storing, accessing, and processing a substantial amount of data. Such tasks often lead to a computational bottleneck for spatialization processing, which may be unacceptable in games and virtual environment programs, particularly because it is difficult to implement HRTFs with low-cost computing devices. Moreover, HRTFs do not fully address certain spatialization problems. For example, HRTFs often cause sounds that originate in front of a listener to sound as though they originate behind the listener. Also, for sounds near a median plane between the listener's two ears, HRTFs are known to cause a listener to perceive that the sound emanates from inside the head instead of outside the head of the listener.
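As a rough illustration of how such filtering is applied, spatializing one source amounts to convolving its signal with a measured left-ear and right-ear head-related impulse response (HRIR) pair. The sketch below uses direct convolution for clarity; the function names are hypothetical, and practical systems typically use FFT-based filtering on DSP hardware, as noted above:

```python
def convolve(signal, impulse_response):
    """Direct FIR convolution. O(N*M), which hints at why HRTF
    processing becomes a bottleneck for long responses and many sources."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def spatialize(mono, hrir_left, hrir_right):
    """Filter one mono source through a left/right HRIR pair to
    produce a binaural (two-channel) output."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Each source position requires its own stored HRIR pair per listener, which is the storage and access burden described above.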
In short, there is no universally accepted approach that guarantees accurate spatial localization with fixed speakers, even using high-cost, complex calculations. Nevertheless, it would be preferable to devise an alternative that provides more accurate spatial localization than basic ITD and/or IID binaural techniques, yet is more computationally efficient and cost effective than HRTFs. Achieving greater accuracy than basic ITD and/or IID binaural techniques thus requires improved computational efficiency, so that such computational solutions can be applied in practice. Typically, binaural techniques use polar or spherical coordinates in the trigonometric calculations required to control the sound produced by different speakers, so as to better simulate what a listener would expect to hear at the location of a virtual object in a virtual environment in response to sound produced by a virtual sound source. Trigonometric calculations usually require more computational resources than calculations involving Cartesian coordinates. Thus, it is preferable to use Cartesian coordinates in the proposed alternative.
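A minimal sketch of the Cartesian approach suggested above: distance attenuation computed from a squared Euclidean distance, and a lateral pan computed from a dot product, with no square roots or trigonometric functions. The function names and the inverse-square gain model are hypothetical illustrations, not a definitive implementation:

```python
def gain_cartesian(listener, source, ref_dist_sq=1.0):
    """Inverse-square distance attenuation directly from Cartesian
    coordinates: no sqrt, no trig. Gain is clamped to 1.0 inside
    the reference distance."""
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    d_sq = dx * dx + dy * dy + dz * dz
    return ref_dist_sq / max(d_sq, ref_dist_sq)

def lateral_pan(right_vec, to_source):
    """Signed lateral component from a dot product with the listener's
    'right' unit vector; replaces an atan2-based azimuth computation."""
    return sum(r * s for r, s in zip(right_vec, to_source))
```

Here the pan sign alone distinguishes left from right, and its magnitude grows with lateral displacement, so relative speaker volumes can be set without ever converting to polar or spherical coordinates.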