For many years, people have made binaural recordings because of the realism they make possible. Using microphones placed in simulated or real human ears, such recordings capture many of the cues that give people the ability to detect the direction of a sound. When such a recording is played back through headphones, the listener receives the same cues, which makes for a realistic experience.
Binaural sound seems well suited for virtual reality (VR) or augmented reality (AR) because it is similar to the way the visual portion of such systems works: a video scene is placed in front of the eyes to replace or enhance the real-world visual scene with the virtual-world scene. Similarly, placing headphones on the ears allows the user to hear the virtual sound that corresponds to the virtual visual scene.
Video games and other techniques exist for generating synthetic virtual environments. Given the objects in the virtual world, as the wearer of the VR viewer moves her head, head-tracking technology sends orientation information to the computer, and graphics routines then render the virtual visual environment for display in front of the eyes. Similarly, techniques for generating binaural or stereo sound can make each sound appear to arrive from the direction of its source relative to the user's head orientation. As the user rotates her head, the relative directions of the various visual and sound sources change, possibly in different ways. For example, as the user rotates her head to the right, objects to her left tend to move around toward the back, and thus rightward, whereas objects in front of her move toward the left.
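The change in relative direction described above can be illustrated with a minimal sketch. The function names and the angle convention (degrees, 0 straight ahead, positive to the right, head yaw positive when turning right) are assumptions made for this illustration and are not taken from any cited system.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth, head_yaw):
    """Direction of a source relative to the head, in degrees.

    Assumed convention: 0 is straight ahead, positive is to the right,
    and head_yaw is positive when the head turns to the right.
    """
    return wrap_degrees(source_azimuth - head_yaw)


# The head turns 30 degrees to the right.
print(relative_azimuth(-90.0, 30.0))  # source on the left: -120.0 (toward the back)
print(relative_azimuth(0.0, 30.0))    # source in front: -30.0 (toward the left)
```

With this convention, a source that starts directly to the left ends up behind the left shoulder, and a source that starts in front ends up to the left, which matches the example in the text.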
The problem is somewhat more involved for creating virtual reality audio of real-world scenes, because there is no a priori knowledge of where all the sound sources and objects are.
People involved in the art have developed methods for obtaining the visual scene from wide-angle stereo-optic cameras that capture, for example, 180 degrees or 360 degrees of the visual field around the viewpoint. Head-tracking technology worn by the viewer can then select the portion of the imagery from the entire field that corresponds to what is viewable in the direction the viewer is facing, moving that imagery to the center of the field of view.
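As a rough sketch of this selection step, the following assumes a 360-degree panorama stored as image columns, with column 0 at azimuth -180 degrees and the middle column straight ahead. The function name `viewport_columns` and the layout are hypothetical choices for illustration only.

```python
def viewport_columns(width, head_yaw_deg, fov_deg):
    """Return the panorama column indices visible at a given head yaw.

    Assumptions (illustrative only): the panorama spans 360 degrees across
    `width` columns, column 0 lies at azimuth -180 degrees, and head yaw is
    positive when the head turns to the right.
    """
    cols_per_degree = width / 360.0
    # Column that should appear at the center of the field of view.
    center = int(round((head_yaw_deg + 180.0) * cols_per_degree)) % width
    half = int(round(fov_deg * cols_per_degree / 2.0))
    # The modulo handles views that wrap around the edge of the panorama.
    return [(center + offset) % width for offset in range(-half, half)]


cols = viewport_columns(width=360, head_yaw_deg=170.0, fov_deg=90.0)
print(cols[0], cols[-1])  # 305 34: the view wraps past the panorama edge
```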
Audio recording technology such as that described above can be used to record the binaural, virtual-reality sound environment. However, current inventions intended for this purpose perform poorly when the user turns his or her head. Because the sounds from the various sound sources are all mixed together in the sound stream, there is no good way to rotate the virtual sound sources in response to head motions in the way that is done for the imagery.
Previous inventions have created sonic environments that appear to correctly maintain the direction of origin of sounds, but they typically require several microphones and/or several channels of audio so that the sounds can be appropriately recombined, or, in the cases where only two channels of transmission are required, the channels are not the same as standard stereophonic or binaural recordings. For example, U.S. Pat. No. 3,997,725 to Gerzon discloses a multidirection sound reproduction system that uses separate omnidirectional and azimuthal signals to create a surround sound effect with arrays of speakers. U.S. Pat. No. 4,086,433 to Gerzon provides various enhancements for irregular arrays of speakers. U.S. Pat. No. 5,594,800 to Gerzon describes a matrix converter approach. U.S. Pat. No. 5,757,927 to Gerzon similarly describes a surround-sound approach using what is called therein “B-Format” signals, or W, X, Y, to achieve a similar function, but with fixed speakers surrounding the user. While providing realistic 3D surround sound, these approaches do not directly address the case of a person wearing headphones, in which case the audio would need to change according to head direction. In “3D Binaural Sound Reproduction using a Virtual Ambisonic Approach” by Noisternig, et al., VECIMS 2003 Conference in Lugano, Switzerland, an approach is presented that rotates the sound in accordance with rotation of the user's head. However, this approach also uses multiple channels of encoded audio, which are combined according to the output of a head-tracking unit. U.S. Pat. No. 6,144,747 to Scofield, et al. discloses an encoding scheme that takes a 4-channel (quadraphonic) signal and combines the four channels into a binaural-like, two-channel signal, so that the sound experienced by the user with nearby left and right speakers seems to arrive as the 4-channel signal would arrive from four loudspeakers. This is a similar surround-sound idea, but it does not appear to address the issue of wearing headphones and rotating the head, and it assumes surround-sound encoding of the audio.
In contrast to such approaches, it is preferable for many applications to be able to use existing two-channel recording technology, such as is used for binaural and stereophonic audio, rather than prior-art multi-channel encoding technology. Using standard two-channel inputs makes it possible to create surround-sound rotation effects from recordings that are recorded and distributed using standard, commonly available two-channel techniques. It is also preferable in many applications for the user to wear standard headphones for hearing the sound.
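To make concrete why the multi-channel approaches cited above can follow head rotation, the sketch below rotates horizontal B-Format-style components W, X, Y by a head yaw angle. It is a textbook-style illustration rather than the method of any cited patent; the encoding gains (unity gain on W) and the function names are assumptions, and the only convention required is that the Y axis points toward azimuth +90 degrees in the same sense in which head yaw is measured.

```python
import math


def encode_horizontal(sample, azimuth_deg):
    """Encode one sample into (W, X, Y) components.

    Assumption: unity gain on W; practical formats often scale W differently.
    """
    a = math.radians(azimuth_deg)
    return (sample, sample * math.cos(a), sample * math.sin(a))


def rotate_for_head_yaw(w, x, y, head_yaw_deg):
    """Counter-rotate the sound field so sources stay fixed in the world.

    W is omnidirectional and is unchanged; X and Y undergo a 2D rotation
    by the negative of the head yaw.
    """
    p = math.radians(head_yaw_deg)
    x_rot = x * math.cos(p) + y * math.sin(p)
    y_rot = -x * math.sin(p) + y * math.cos(p)
    return (w, x_rot, y_rot)


# A source at azimuth 90 heard with the head turned by 30 should look
# like a source encoded at azimuth 60.
rotated = rotate_for_head_yaw(*encode_horizontal(1.0, 90.0), head_yaw_deg=30.0)
expected = encode_horizontal(1.0, 60.0)
print(all(math.isclose(r, e, abs_tol=1e-12) for r, e in zip(rotated, expected)))
```

The rotation works here because direction is carried in separate X and Y channels. An ordinary two-channel binaural or stereophonic recording has no such separable components, which is the difficulty the preceding paragraphs describe.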
Yet another approach that could be used for surround sound is beam-forming. A series of audio beam-formers, such as are used for surveillance devices or hearing aids, could be used to obtain a signal from each of several directions. Each signal could then be rotated to appear to come from a corrected direction. However, this approach would have the disadvantage that the left and right portions of the signal for each beam are irreversibly combined, so that any nuances about the left and right signals coming to the ear from that source are not present in the output signal.
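The loss described above can be seen in a minimal two-microphone delay-and-sum sketch. This is a generic textbook beam-former with integer sample delays, assumed here purely for illustration; the function names are hypothetical and it does not represent any particular surveillance or hearing-aid device.

```python
def delay(signal, n):
    """Delay a list of samples by n >= 0 samples, keeping the same length."""
    n = min(n, len(signal))
    return [0.0] * n + signal[:len(signal) - n]


def delay_and_sum(left, right, right_delay):
    """Steer a two-microphone beam and return ONE output channel.

    A positive right_delay delays the right channel; a negative value
    delays the left channel instead. The two inputs are averaged, so the
    differences between left and right are not recoverable from the output.
    """
    if right_delay >= 0:
        right = delay(right, right_delay)
    else:
        left = delay(left, -right_delay)
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# An impulse that reaches the right microphone one sample before the left.
left = [0.0, 1.0, 0.0, 0.0]
right = [1.0, 0.0, 0.0, 0.0]
print(delay_and_sum(left, right, 1))  # [0.0, 1.0, 0.0, 0.0]: beam aimed at the source
print(delay_and_sum(left, right, 0))  # [0.5, 0.5, 0.0, 0.0]: beam aimed straight ahead
```

In both cases the result is a single channel: the one-sample arrival-time difference between the ears, which is one of the cues a binaural recording preserves, no longer appears in the output.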