This invention relates generally to the field of three-dimensional audio reproduction over headphones or earphones. Specifically, it relates to the personalized virtualization of audio sources, such as loudspeakers used in home entertainment systems, using headphones or earphones, and to achieving a level of realism that is difficult to distinguish from the real loudspeaker experience.
The idea of using headphones to generate virtual loudspeakers is a general concept well understood by those in the art, as described in U.S. Pat. No. 3,920,904. In summary, a loudspeaker can be effectively virtualized over headphones or earphones for any individual primarily by acquiring a personalized room impulse response (PRIR) for the loudspeaker in question, measured using microphones placed in the vicinity of that individual's left and right ears. The resulting impulse response typically covers a time span of hundreds of milliseconds and contains information relating to the sound reproduction equipment, the loudspeaker, the room acoustics (reverberation), and the directional properties of the subject's shoulders, head and ears, often referred to as the head related transfer function (HRTF). To generate a virtual acoustical image of the loudspeaker, the audio signal that would ordinarily be played through the real loudspeaker is instead convolved with the measured left-ear and right-ear PRIRs and fed to stereo headphones worn by the individual. If the individual is positioned exactly as they were during the personalization measurement then, assuming the headphones are appropriately equalized, that individual will perceive the sound to be coming from the real loudspeaker and not the headphones. The process of projecting virtual loudspeakers over headphones is herein referred to as virtualization.
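By way of illustration only, the convolution step described above can be sketched as follows. This is a minimal sketch and not a method taken from the cited patent: the function names `convolve` and `virtualize` are hypothetical, the impulse responses are toy values rather than measured PRIRs, and a direct-form convolution is used purely for clarity (a practical system would be expected to use a faster method for responses hundreds of milliseconds long).

```python
def convolve(signal, impulse):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse):
            out[n + k] += x * h
    return out


def virtualize(signal, prir_left, prir_right):
    """Return the (left, right) headphone feeds for one loudspeaker channel.

    The same loudspeaker signal is convolved once with the left-ear PRIR
    and once with the right-ear PRIR.
    """
    return convolve(signal, prir_left), convolve(signal, prir_right)


# Toy data (assumed, not measured): the right-ear response is delayed and
# attenuated, as it might be for a loudspeaker to the listener's left.
prir_left = [1.0, 0.5]
prir_right = [0.0, 0.6, 0.3]
left, right = virtualize([1.0, 0.0, -1.0], prir_left, prir_right)
```

For a multi-channel system, the feeds produced for each loudspeaker channel would be summed per ear before being sent to the headphones.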
The positions of the virtual loudspeakers projected by headphones match the head-to-loudspeaker relationships established during the personalized room impulse response (PRIR) measurements. For example, if a real loudspeaker measured during the personalization stage is in front of and to the left of the individual's head, then the corresponding virtual loudspeaker will also appear to come from the left front. This means that if the individual orients their head such that, from their viewpoint, the real and virtual loudspeakers coincide, the virtual sound will appear to emanate from the real loudspeaker and, provided the personalized measurements are accurate, that individual will have considerable difficulty distinguishing between virtual and real sound sources. The implication is that a listener who has made PRIR measurements for each loudspeaker in their home entertainment system would be able to recreate the entire multi-channel loudspeaker listening experience over headphones without actually having to turn on the loudspeakers.
However, the illusion of simple personalized virtual sound sources is difficult to maintain in the presence of head movements, particularly those in the lateral plane. For example, when the individual has the virtual and real loudspeakers aligned, the virtual illusion is strong. However, if that individual now turns their head to the left, the perceived virtual sound source will also move to the left with the head, since the virtual sound source is fixed relative to the individual's head. Naturally, head movements do not cause real loudspeakers to move, and so to maintain a strong virtual illusion it may be necessary to manipulate the audio signals feeding the headphones such that the virtual loudspeakers also remain fixed.
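The geometric relationship underlying this compensation can be sketched as follows. This is an illustrative assumption rather than anything specified in the text: angles are taken in degrees, positive to the listener's left, and the function name `relative_azimuth` is hypothetical. The point is only that the direction to be rendered is the loudspeaker's room-fixed azimuth minus the current head yaw.

```python
def relative_azimuth(speaker_az_deg, head_yaw_deg):
    """Azimuth of a room-fixed loudspeaker as seen from the turned head.

    Angles are in degrees, positive to the listener's left (an assumed
    convention). The result is wrapped into the range [-180, 180).
    """
    diff = speaker_az_deg - head_yaw_deg
    return (diff + 180.0) % 360.0 - 180.0


# A loudspeaker 30 degrees to the left front; the listener turns 30 degrees
# to the left, so the loudspeaker should now be rendered straight ahead.
facing_speaker = relative_azimuth(30.0, 30.0)
```

Without this correction the rendered azimuth would stay at 30 degrees regardless of head yaw, which is the head-locked behaviour described above.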
Binaural processing also has applications for virtualizing loudspeakers using loudspeakers, rather than headphones, as described in U.S. Pat. Nos. 5,105,462 and 5,173,944. These also can make use of head tracking to improve the virtual illusion, as described in U.S. Pat. No. 6,243,476.
U.S. Pat. No. 3,962,543 is one of the earliest publications that describe the concept of manipulating the binaural signals fed to the headphones in response to a head tracking signal in order to stabilize the perceived position of the virtual loudspeaker. However, that disclosure pre-dates recent advances in digital signal processing theory, and its methods and apparatus are generally not applicable to digital signal processing (DSP) type implementations.
A more recent DSP-based head tracked virtualizer is disclosed in U.S. Pat. Nos. 5,687,239 and 5,717,767. This system is based on a split HRTF/room reverberation representation, typical of low complexity virtualizer systems, and uses a memory look-up to read out HRTF impulse files in response to a look-up address derived from the head-tracking device. The room reverberation is not altered in response to head tracking. The main idea behind this system is that, since the HRTF impulse data files are relatively small, typically between 64 and 256 data points, a large number of HRTF impulse responses, specific to each ear and each loudspeaker and for a wide range of head turn angles, can be stored within the normal memory storage capabilities of typical DSP platforms.
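A look-up of this general kind might be sketched as follows. This is a hedged illustration only and does not reproduce the addressing scheme of the cited patents: the 5 degree step, the 64-point placeholder responses, and the names `lookup_address` and `build_table` are all assumptions introduced for the example.

```python
STEP_DEG = 5  # hypothetical angular resolution of the stored table


def lookup_address(yaw_deg, step_deg=STEP_DEG):
    """Map a head-tracker yaw angle to the index of the nearest stored HRTF."""
    entries = 360 // step_deg
    return round((yaw_deg % 360) / step_deg) % entries


def build_table(step_deg=STEP_DEG, taps=64):
    """Placeholder table: one (left, right) impulse pair per stored angle.

    A real table would hold measured responses; zeros are used here only
    to show the layout.
    """
    entries = 360 // step_deg
    return [([0.0] * taps, [0.0] * taps) for _ in range(entries)]


table = build_table()
# Head turned 12 degrees in the negative direction: nearest entry is 350.
hrtf_left, hrtf_right = table[lookup_address(-12.0)]
```

With one such table per loudspeaker, the pair selected at each head-tracker update would replace the HRTF pair currently used in the convolution, while the reverberation part is left unchanged.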
The room reverberation is not modified for two reasons. First, storing a unique reverberation impulse response for each head turn angle would have required enormous storage capacity, each individual reverberation impulse response typically being 10000 to 24000 data points in length. Second, the computational complexity of convolving room reverberation impulses of this size would be impractical, even with signal processors available today. Since the inventors do not discuss an efficient implementation for the convolution of long impulses, it is likely that they anticipated an artificial reverberation implementation in order to reduce the computational complexity associated with room convolutions. Such implementations, by definition, would not easily lend themselves to adaptation by the head tracker address. Since personalization is not discussed and was clearly not anticipated for this system, the inventors offer no information regarding what steps would be required to incorporate such a mode of operation for either the HRTF or the reverberation processes. Moreover, since this system would require many hundreds of HRTF impulse files to be stored in order to allow for sufficiently smooth HRTF switching under control of the head tracker, it would not be obvious to one skilled in the art how all of these measurements could be made in a practical way such that members of the general public could be expected to undertake them in their own homes. Nor is it obvious how a single room reverberation characteristic would be determined from all the personalized measurements. Further, since the room reverberation is not adapted by the head tracker address, this system would not be able to replicate the sound of real loudspeakers in a real room, and its applicability to realistic virtualization is therefore limited.
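The scale of the storage and computation involved can be illustrated with a short calculation. The impulse lengths (256 and 24000 data points) are the upper figures quoted above; the remaining values, namely 72 stored head angles, 5 loudspeakers and a 48 kHz sample rate, are assumptions chosen only to make the arithmetic concrete, and the function names are hypothetical.

```python
def table_points(taps, angles, speakers, ears=2):
    """Total data points needed to store one response per angle, speaker, ear."""
    return taps * angles * speakers * ears


def direct_macs_per_second(taps, sample_rate_hz, speakers, ears=2):
    """Multiply-accumulates per second for direct-form convolution."""
    return taps * sample_rate_hz * speakers * ears


ANGLES = 72        # assumed: one stored response every 5 degrees
SPEAKERS = 5       # assumed loudspeaker count
FS_HZ = 48_000     # assumed sample rate

hrtf_points = table_points(256, ANGLES, SPEAKERS)        # 184,320
reverb_points = table_points(24_000, ANGLES, SPEAKERS)   # 17,280,000
ratio = reverb_points / hrtf_points                      # 93.75

reverb_macs = direct_macs_per_second(24_000, FS_HZ, SPEAKERS)  # 11.52e9
```

Under these assumptions, a per-angle reverberation table is roughly two orders of magnitude larger than the HRTF table, and direct-form convolution of the long responses alone would need on the order of ten billion multiply-accumulates per second.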
Head tracking is well known as a technique for detecting head movement. Many approaches have been suggested and are well known in the art. Head trackers can either be head mounted, e.g., gyroscopic, magnetic, GPS-based or optical, or they can be off head, e.g., video or proximity. The aim of a head tracker is to measure, on a continuous basis, the orientation of the individual's head while listening to the headphones and to transmit this information to the virtualizer to allow the virtualization process to be modified in real time as changes are detected. The head track data can be sent back to the virtualizer using wires, or it can be delivered wirelessly using optical or RF transmission techniques.
Existing headphone virtualizer systems do not project a virtual acoustical image with a high enough degree of realism to stand up to a direct comparison against the real loudspeaker experience. This is because the current state of the art has made no attempt to directly incorporate a personalization method into a headphone virtualizer suitable for use by the general public, owing to the difficulties associated with the measurements and to uncertainties about how to incorporate head tracking into such a scheme.