Conventionally, a binaural (two-ear) 3D audio reproduction system uses a pair of headphones to reproduce binaurally recorded or synthesized sound so that a listener can perceive sound images coming from particular locations in the 3D space surrounding the listener, such as front, rear, above, near, and far. However, the conventional headphone system has limitations that prevent the listener from accurately perceiving 3D audio.
Firstly, Møller [1] reasoned that the coupling characteristics of headphones are not the same as those of free-field sound sources.
Secondly, human heads and ears vary in shape and size; no two people have the same ear shape. Therefore, binaural sound captured with a dummy head, or synthesized using a generic set of Head-Related Transfer Functions (HRTFs), which are measured for sound sources at positions in the 3D space surrounding a listener, will be perceived differently by different people. Overcoming this issue requires either individualized recording or individualized HRTFs for binaural synthesis, both of which are tedious to perform.
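By way of illustration only, the binaural synthesis mentioned above can be sketched as filtering a mono source with a left/right pair of head-related impulse responses (HRIRs, the time-domain form of HRTFs). The function names and the toy impulse responses below are assumptions made for this sketch and are not taken from any cited reference; a practical system would use measured HRIRs and a faster convolution method.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesize(mono, hrir_left, hrir_right):
    """Render a mono source to (left, right) using one HRIR pair.

    The HRIR pair encodes one source direction for one set of ears,
    which is why a generic pair suits some listeners better than others.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy, made-up HRIRs (assumption) for a source on the listener's left:
# the right ear receives the sound two samples later and attenuated.
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6]

left, right = binaural_synthesize(mono, hrir_left, hrir_right)
```

In this sketch, individualization amounts to replacing the generic `hrir_left` and `hrir_right` with responses measured for the particular listener.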
Thirdly, it is well known that headphone listening causes sound to be perceived as coming from inside the head, so that far and near sounds are perceived to be the same. There is also a tendency for frontal sound cues to be perceived as coming from the rear, causing front/back confusion.
A number of improved 3D-audio-enhanced headphones [2-6] have been designed with multiple sound emitters and off-positioned sound emitters in existing surround headphones. However, although such headphones have sound emitters positioned at different locations in the ear, all of the emitters direct sound in parallel directions towards the ear entrance, as illustrated in FIG. 2. This limits the enhancement of positional perception.