Systems for virtual reality are becoming increasingly relevant in a wide range of industrial applications. Such systems generally consist of audio and video devices, which aim at providing the user with a realistic perception of a three dimensional virtual environment. Advances in computer technology and low cost cameras open up new possibilities for three dimensional (3D) sound reproduction. A challenge to creation of such systems is how to update the audio signal processing scheme for a moving listener, so that the listener perceives only the intended virtual sound image.
Any sound reproduction system that attempts to give a listener a sense of space must somehow make the listener believe that sound is coming from a position where no real sound source exists. For example, when a listener sits in the “sweet spot” in front of a good two-channel stereo system, it is possible to fill out the gap between the two loudspeakers. If two identical signals are passed to both loudspeakers, the listener will ideally perceive the sound as coming from a position directly in front of him or her. If the input is increased to one of the speakers, the sound will be pulled sideways towards that speaker. This principle is called amplitude stereo, and it has been the most common technique used for mixing two-channel material ever since the two-channel stereo format was first introduced. However, it is intuitively obvious that amplitude stereo cannot create virtual images outside the angle spanned by the two loudspeakers. In fact, even in between the two loudspeakers, amplitude stereo works well only when the angle spanned by the loudspeakers is 60 degrees or less.
Virtual source imaging systems work on the principle that they get the sound right at the ears of the listener. A real sound source generates certain interaural time- and level differences that are used by the auditory system to localize the sound source. For example, a sound source to left of the listener will be louder, and arrive earlier, at the left ear than at the right. A virtual source imaging system is designed to reproduce these cues accurately. In practice, loudspeakers are used to reproduce a set of desired signals in the region around the listener's ears. The inputs to the loudspeakers must be determined from the characteristics of the desired signals, and the desired signals must be determined from the characteristics of the sound emitted by the virtual source.
Binaural technology is often used for the reproduction of virtual sound images. Binaural technology is based on the principle that if a sound reproduction system can generate the same sound pressures at the listener's eardrums as would have been produced there by a real sound source, then the listener should not be able to tell the difference between the virtual image and the real sound source.
A typical surround-sound system, for example, assumes a specific speaker setup to generate the sweet spot, where the auditory imaging is stable and robust. However, not all areas can accommodate the proper specifications for such a system, further minimizing a sweet spot that is already small. For the implementation of binaural technology over loudspeakers, it is necessary to cancel the cross-talk that prevents a signal meant for one ear from being heard at the other. However, such cross-talk cancellation, normally realized by time-invariant filters, works only for a specific listening location and the sound field can only be controlled in the sweet-spot.
A digital sound projector is an array of transducers or loudspeakers that is controlled such that audio input signals are emitted as a beam of sound that can be directed into an arbitrary direction within the half-space in front of the array. By making use of carefully chosen reflection paths, a listener will perceive a sound beam emitted by the array as if originating from the location of its last reflection. If the last reflection happens in a rear corner, the listener will perceive the sound as if emitted from a source behind him or her.
One application of digital sound projectors is to replace conventional surround-sound systems, which typically employ several separate loudspeakers placed at different locations around a listener's position. The digital sound projector, by generating beams for each channel of the surround-sound audio signal, and steering the beams into the appropriate directions, creates a true surround-sound at the listener's position without the need for further loudspeakers or additional wiring. One such system is described in U.S. Patent Publication No. 2009/0161880 of Hooley, et al., the disclosure of which is incorporated herein by reference.
Cross-talk cancellation is in a sense the ultimate sound reproduction problem since an efficient cross-talk canceller gives one complete control over the sound field at a number of “target” positions. The objective of a cross-talk canceller is to reproduce a desired signal at a single target position while cancelling out the sound perfectly at all remaining target positions. The basic principle of cross-talk cancellation using only two loudspeakers and two target positions has been known for more than 30 years. In 1966, Atal and Schroeder used physical reasoning to determine how a cross-talk canceller comprising only two loudspeakers placed symmetrically in front of a single listener could work. In order to reproduce a short pulse at the left ear only, the left loudspeaker first emits a positive pulse. This pulse must be cancelled at the right ear by a slightly weaker negative pulse emitted by the right loudspeaker. This negative pulse must then be cancelled at the left ear by another even weaker positive pulse emitted by the left loudspeaker, and so on. Atal and Schroeder's model assumes free-field conditions. The influence of the listener's torso, head and outer ears on the incoming sound waves is ignored.
In order to control delivery of the binaural signals, or “target” signals, it is necessary to know how the listener's torso, head, and pinnae (outer ears) modify incoming sound waves as a function of the position of the sound source. This information can be obtained by making measurements on “dummy-heads” or human subjects. The results of such measurements are referred to as “head-related transfer functions”, or HRTFs.
HRTFs vary significantly between listeners, particularly at high frequencies. The large statistical variation in HRTFs between listeners is one of the main problems with virtual source imaging over headphones. Headphones offer good control over the reproduced sound. There is no “cross-talk” (the sound does not run round the head to the opposite ear), and the acoustical environment does not modify the reproduced sound (room reflections do not interfere with the direct sound). Unfortunately, however, when headphones are used for the reproduction, the virtual image is often perceived as being too close to the head, and sometimes even inside the head. This phenomenon is particularly difficult to avoid when one attempts to place the virtual image directly in front of the listener. It appears to be necessary to compensate not only for the listener's own HRTFs, but also for the response of the headphones used for the reproduction. In addition, the whole sound stage moves with the listener's head (unless head-tracking is used, and this requires a lot of extra processing power). Loudspeaker reproduction, on the other hand, provides natural listening conditions but makes it necessary to compensate for cross-talk and also to consider the reflections from the acoustical environment.