Human listeners can locate sounds in three dimensions: in range (distance), in the direction above and below (elevation), and in the direction in front, to the rear, and to either (left or right) side (azimuth). The way an ear receives sound from a given point in space can be characterized by a head-related transfer function (HRTF). Therefore, a pair of HRTFs, one for each ear, can be used to synthesize a binaural sound that seems to come from a target position, i.e. a virtual target position.
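By way of illustration only, the synthesis described above can be sketched as a time-domain convolution of a mono signal with the impulse-response counterparts of the two HRTFs (head-related impulse responses, HRIRs). The function name `synthesize_binaural`, the sampling rate, and the toy impulse responses are assumptions made for this sketch; they are not measured data and are not taken from any of the cited works.

```python
import numpy as np


def synthesize_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Both HRIRs are assumed to have the same length. Returns an array of
    shape (2, len(mono) + len(hrir) - 1): row 0 is the left ear signal,
    row 1 is the right ear signal.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


# Toy input: 0.1 s of noise at an assumed sampling rate of 48 kHz.
fs = 48000
mono = np.random.default_rng(0).standard_normal(fs // 10)

# Hypothetical HRIRs for a source on the listener's left: the right ear
# receives the sound 30 samples later and attenuated to 60 %.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[30] = 0.6

binaural = synthesize_binaural(mono, hrir_left, hrir_right)
```

In this toy case the two ear signals differ only by a delay and a gain, which corresponds to an interaural time difference and an interaural level difference. Measured HRIRs additionally carry direction-dependent spectral shaping.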
Many applications of three-dimensional (3D) audio using headphones, such as virtual reality, spatial teleconferencing, and virtual surround, require high-quality HRTF datasets that contain transfer functions for all necessary directions. Some forms of HRTF processing have also been included in computer software to simulate surround sound playback from loudspeakers. However, measuring HRTFs for all azimuth angles is a tedious task that requires hardware and materials. Moreover, the memory required to store a database of measured HRTFs can be very large. Additionally, using personalized HRTFs can further improve the sound experience, but acquiring them complicates 3D sound synthesis.
The idea of a fully parametric model for deriving HRTFs to synthesize binaural sound has been proposed in R. O. Duda, “Modeling head related transfer functions”, 27th Asilomar Conference on Signals, Systems and Computers, 1993, and in V. R. Algazi et al., “The use of head-and-torso models for improved spatial sound synthesis”, Audio Engineering Society (AES) 113th Convention, October 2002. However, the HRTFs obtained from these models are not accurate enough for realistic binaural sound rendering, since they deviate strongly from personalized HRTFs.
Considerable research has been conducted to develop methods for obtaining HRTFs that do not deviate strongly from personalized (user-specific) HRTFs. 3D HRTF interpolation can be used to estimate the HRTF at a desired source position from measured HRTFs, as demonstrated in H. Gamper, “Head-related transfer function interpolation in azimuth, elevation and distance”, Journal of the Acoustical Society of America (JASA) Express Letters, 2013. This technique requires HRTFs measured at nearby positions, e.g. four measurements forming a tetrahedron that encloses the desired position. Additionally, it is hard to achieve a correct elevation perception with this technique.
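As a minimal sketch of how such an interpolation may be carried out, the estimated HRTF can be formed as a weighted sum of the four measured HRTFs, with the weights given by the barycentric coordinates of the desired position inside the tetrahedron. The function names, the use of Cartesian coordinates, and the representation of each HRTF as a vector of frequency-bin values are assumptions made for this example; the sketch does not reproduce the details of the cited work.

```python
import numpy as np


def barycentric_weights(vertices, target):
    """Barycentric coordinates of `target` in a tetrahedron.

    vertices: (4, 3) array of measurement positions (Cartesian).
    target:   (3,) desired source position.
    Returns four weights that sum to 1.
    """
    v = np.asarray(vertices, dtype=float)
    t = np.asarray(target, dtype=float)
    # Columns are the edge vectors v0 - v3, v1 - v3, v2 - v3.
    edges = (v[:3] - v[3]).T
    w3 = np.linalg.solve(edges, t - v[3])
    return np.append(w3, 1.0 - w3.sum())


def interpolate_hrtf(vertices, hrtfs, target):
    """Weighted sum of four measured HRTFs (rows of `hrtfs`)."""
    w = barycentric_weights(vertices, target)
    if np.any(w < -1e-9):
        raise ValueError("target lies outside the tetrahedron")
    return w @ np.asarray(hrtfs, dtype=float)


# Hypothetical measurement positions and placeholder HRTF vectors.
vertices = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
hrtfs = np.eye(4)  # one row per measurement, for easy inspection
estimate = interpolate_hrtf(vertices, hrtfs, [0.25, 0.25, 0.25])
```

A negative weight indicates that the desired position lies outside the tetrahedron, which is why the technique depends on measurements that enclose the desired position.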
Thus, there is a need for an improved audio signal processing apparatus and method that allow a binaural audio signal to be generated for a virtual target position.