1. Field of the Invention
This invention relates generally to three-dimensional (3D) sound. More particularly, it relates to an improved regularizing model for head-related transfer functions (HRTFs) for use with 3D digital sound applications.
2. Background of Related Art
Some newly emerging consumer audio devices provide the option for three-dimensional (3D) sound, allowing a more realistic experience when listening to sound. In some applications, 3D sound allows a listener to perceive motion of an object from the sound played back on a 3D audio system.
Extensive research has established that humans localize a sound source's location by using three major acoustic cues: the interaural time difference (ITD), the interaural intensity difference (IID), and head-related transfer functions (HRTFs). Note that the time-domain equivalent of the HRTF is usually termed the head-related impulse response (HRIR). HRTF and HRIR are used interchangeably in this invention wherever they fit the context. These cues, in turn, are used in generating 3D sound in 3D audio systems. Among these three cues, the ITD and IID occur when sound from a source in space arrives at both ears of a listener. When the source is at an arbitrary location in space, the sound wave arrives at the two ears with different time delays due to the unequal path lengths of wave propagation. This creates the ITD. Also, due to head shadowing effects, the intensities of the sound waves arriving at the two ears can be unequal. This creates the IID.
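By way of illustration, the ITD described above is often approximated with the classic Woodworth spherical-head formula. The sketch below is not part of the claimed invention; the head radius and speed-of-sound values are illustrative assumptions.

```python
import math

def woodworth_itd(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    """Woodworth spherical-head approximation of the ITD (in seconds)
    for a source at the given azimuth (radians, 0 = straight ahead).
    head_radius (m) and speed_of_sound (m/s) are illustrative defaults."""
    theta = abs(azimuth_rad)
    # Path-length difference around a rigid sphere: a*(theta + sin(theta))
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))

# A source 45 degrees to one side yields an ITD of roughly 0.4 ms.
itd = woodworth_itd(math.radians(45.0))
```

For a source directly ahead (azimuth 0), the formula correctly yields a zero ITD, consistent with the median-plane case discussed below.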
When the sound source is in the median plane of the head, both the ITD and IID become trivial. However, the listener can still localize the sound in terms of its elevation, and to some degree its lateralization. This effect, confirmed by recent research, is due to the filtering effects of the head, torso, shoulders and, more importantly, the pinnae, collectively termed the external ear. In particular, the external ear can be viewed as a set of acoustical resonators, where the resonance frequency of each equivalent resonator varies with the incoming angle of the sound source. As verified by measured HRTFs, these resonances manifest themselves as peaks and valleys in the spectra of the measured HRTFs. Moreover, these peaks and valleys shift their center frequencies as the sound source position changes.
In order to synthesize a positioned 3D audio source, a particular set of ITD, IID, and a pair of HRTFs must be used. In order to simulate the motion of the sound source, in addition to varying the ITD and IID, many HRTF pairs must be used to obtain a continuously moving sound image. In the prior art, hundreds or thousands of measured HRTFs are used for this purpose. There are problems with this approach. The first problem is that the HRTFs are measured with the sound source at discrete locations in space, and thus do not provide a continuum of the HRTF function. The second problem is that the measured HRTFs contain measurement error and thus are not smooth. Both problems cause annoying clicks when simulating sound source motion, as discontinuous HRTFs are switched in and out of the filtering loop.
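The synthesis step described above, filtering a mono source through a left/right HRIR pair and applying the interaural delay, can be sketched as follows. This is a minimal illustrative sketch, not the claimed method; the function name and integer-sample ITD convention are assumptions for illustration.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right, itd_samples):
    """Convolve a mono source with a left/right HRIR pair and apply
    an integer-sample ITD to the lagging ear (illustrative sketch).
    Positive itd_samples: source on the left, so the right ear lags."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    if itd_samples > 0:
        right = np.concatenate([np.zeros(itd_samples), right])
    elif itd_samples < 0:
        left = np.concatenate([np.zeros(-itd_samples), left])
    return left, right
```

To move the source, a 3D audio system must repeatedly swap in new HRIR pairs and delays, which is where the discontinuity problems noted above arise.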
One conventional solution to adapting discretely measured HRTFs to a continuous auditory space is to “interpolate” the measured HRTFs by linearly weighting the neighboring impulse responses. This can provide a small step size for incremental changes in the HRTF from location to location. However, such interpolation is conceptually incorrect because it does not account for the fact that a linear combination of adjacent impulse responses increases the overall number of peaks and valleys involved, and thus significantly compromises the quality of the interpolated HRTF. This method, called direct convolution, is shown in FIG. 3. In particular, 460 is the sound source to be 3D positioned. 410 and 412 are the left channel and right channel delays, which together form the ITD. 420 and 422 are the left and right ear HRTFs. 430 and 432 are signals that either can be sent to the left and right ears for listening or can be sent to a next stage for further processing.
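The conventional weighted-sum interpolation criticized above amounts to the following one-line operation. This sketch merely illustrates the prior-art technique being critiqued; the function name is an assumption for illustration.

```python
import numpy as np

def interpolate_hrir(hrir_a, hrir_b, weight):
    """Conventional linear interpolation between two neighboring measured
    HRIRs; weight in [0, 1] moves the result from hrir_a to hrir_b.
    Note: summing two impulse responses can introduce spectral features
    present in neither measurement, which is the flaw noted above."""
    hrir_a = np.asarray(hrir_a, dtype=float)
    hrir_b = np.asarray(hrir_b, dtype=float)
    return (1.0 - weight) * hrir_a + weight * hrir_b
```

At the endpoints (weight 0 or 1) the measured responses are recovered exactly; the quality problem appears at intermediate weights, where the combined response is not a measured HRIR for any actual source position.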
Other attempted solutions include using one HRTF for a large area of the three-dimensional space to reduce the frequency of discontinuities which may cause a clicking sound. However, again, such solutions compromise the overall quality of the 3D sound rendering.
There is thus a need for a more accurate HRTF model which provides a suitable HRTF for source locations in a continuous auditory space, without annoying discontinuities.