Sounds are a constant presence in everyday life and offer rich cues about the environment. Sounds come from all directions and distances, and individual sounds can be distinguished by pitch, tone, loudness, and location in space. Three-dimensional (3D) sound recording and synthesis are topics of interest in scientific, commercial, and entertainment fields. With the popularity of 3D movies, and the emergence of 3D televisions and 3D computers, spatial vision is no longer a fantasy. Beyond cinema and home theaters, 3D technology is found in applications ranging from simple videogames to sophisticated virtual reality simulators.
Three-dimensional (3D) sound is often termed spatial sound. The spatial location of a sound is what gives the sound its three-dimensional aspect. Humans use auditory localization cues to locate the position of a sound source in space. There are eight sources of localization cues: interaural time difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation, and vision. The first four cues are considered static and the other four dynamic. Dynamic cues involve movement of a subject's body, which affects how sound enters and interacts with the subject's ears. Accurately synthesizing such spatial sound is needed to add to the immersiveness of a virtual environment.
To gain a clear understanding of spatial sound, it is necessary to distinguish monaural, stereo, and binaural sound from three-dimensional (3D) sound. A monaural recording is a recording of a sound with one microphone; it conveys no sense of sound positioning. Stereo sound is recorded with two microphones positioned several feet apart and separated by empty space. When a stereo recording is played back, the recording from one microphone is channeled into the subject's left ear, while the recording from the other microphone is channeled into the subject's right ear. This gives a sense of the position of the sound as captured by the microphones. Listeners of stereo sound often perceive the sound sources to be at a position inside their heads, because humans do not normally hear sounds in the manner in which they are recorded in stereo, separated by empty space. The human head acts as a filter to incoming sounds.
Generally, human hearing localizes sound sources in a three-dimensional (3D) spatial field mainly by three cues: an interaural time difference (ITD) cue, an interaural level difference (ILD) cue, and a spectral cue. The ITD is the difference in arrival times of a transmitted sound between the two ears. The ILD is the difference in level and/or intensity of the transmitted sound received between the two ears. The spectral cue describes the frequency content of the sound source as shaped by the outer ear. For example, when a sound source is located exactly and directly in front of a human, the ITD and the ILD of the sound are approximately zero, since the sound arrives at both ears at the same time and at the same level. If the sound source shifts to the left, the left ear receives the sound earlier and louder than the right ear, which helps humans determine where the sound is being emitted from. When a sound source is directly to the left of a listener, the ITD reaches its maximum value. The combination of these factors is modeled by two sets of filters, one for the left ear and one for the right ear, in order to describe the spatial effect recognizable by human hearing. The transfer functions of such filters are called head related transfer functions (HRTFs). Since different sound source locations cause different effects, the HRTFs form a bank indexed by source position.
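The cues described above can be sketched numerically. The following is a minimal illustration, not part of the disclosed system: it approximates the ITD with the classical spherical-head (Woodworth) formula under an assumed average head radius, and renders a mono signal binaurally by convolving it with a hypothetical left/right head related impulse response (HRIR) pair, the time-domain counterpart of an HRTF.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, in air at roughly room temperature
HEAD_RADIUS = 0.0875     # m, an assumed average head radius

def woodworth_itd(azimuth_rad):
    """Approximate ITD (seconds) for a rigid spherical head.
    azimuth_rad: source azimuth, 0 = directly ahead, pi/2 = fully to one side."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

def binaural_from_mono(mono, hrir_left, hrir_right):
    """Render a mono signal to two ear signals by convolving it with a
    (hypothetical) measured HRIR pair for one source position."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: (2, len(mono) + len(hrir) - 1)

# A source directly in front yields an ITD of approximately zero,
# consistent with the description above.
print(woodworth_itd(0.0))
# The ITD grows as the source moves toward the side of the head.
print(woodworth_itd(np.pi / 2) > woodworth_itd(np.pi / 4))
```

In a real system the HRIR pair would be selected from a measured bank indexed by source position, so that each position is rendered with its own filters.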
Binaural recordings sound more realistic, as they are recorded in a manner that more closely resembles the human acoustic system. To achieve three-dimensional (3D) spatial effects on audio, for example, music, earlier binaural recording, also referred to as dummy head recording, was obtained by placing two microphones in the inner ear locations of an artificial, average sized human head. However, in such a case, many specific details, such as the reflection and influence of the shoulders and the human torso on the acoustic performance, were not considered. Currently, binaural sound is recorded by measuring head related transfer functions using a human head simulator with two microphones inside the ears. Binaural recordings sound closer to what humans hear in the real world, as the human head simulator filters sound in a manner similar to the human head. In existing technology, the human head simulator is too large to be mounted on a portable device and is also expensive. Moreover, the recorded binaural sound can only be used with headsets and cannot be used with a surround sound system, and it cannot be modified or configured during reproduction. Although existing technologies achieve a few enhancements to the 3D spatial audio experience, they do not provide an option for the user to adjust the source locations and directions of the recorded audio.
Professional studio recordings are performed on multiple sound tracks. For example, in a music recording, each instrument and singer is recorded on an individual sound track. The sound tracks are then mixed to form stereo sound or surround sound. Currently, surround sound is created using several different methods. One method uses a surround sound recording microphone technique, and/or mixes in surround sound for playback on an audio system with speakers that encircle the listener to play audio from different directions. Another method processes the audio with psychoacoustic sound localization methods to simulate a two-dimensional (2D) sound field with headphones. Another method, based on Huygens' principle, attempts to reconstruct the recorded sound field wave fronts within a listening space, for example, in an audio hologram form. One such form, wave field synthesis (WFS), produces a sound field with an evenly distributed error field over the entire listening area. Commercial WFS systems require many loudspeakers and significant computing power. Moreover, current surround sound cannot be recorded by a portable device and is not configurable by users.
Because of the complex nature of current state-of-the-art systems, several concessions are required for feasible implementations, especially if the number of sound sources that have to be rendered simultaneously is large. Recent trends in consumer audio show a shift from stereo to multi-channel audio content, as well as a shift from stationary devices to mobile devices. These developments impose additional constraints on transmission and rendering systems. Moreover, consumers often use headphones for audio rendering on a mobile device. To experience the benefit of multi-channel audio, there is a need for a compelling binaural rendering system.
Hence, there is a long felt but unresolved need for a method and a configurable three-dimensional (3D) sound system that perform 3D sound recording, processing, synthesis, and reproduction to enhance existing audio performance to match a vivid 3D vision field, thereby enhancing a user's experience. Moreover, there is a need for a method and a configurable 3D sound system that accurately measure head related transfer functions using a simulator apparatus that considers specific details such as the reflection and influence of the shoulders and the human torso on the acoustic performance. Furthermore, there is a need for a method and a configurable 3D sound system that simultaneously generate a configurable three-dimensional binaural sound, a configurable three-dimensional stereo sound, and a configurable three-dimensional surround sound on a mobile computing device or other device using selections acquired from a user. Furthermore, there is a need for a method and a configurable 3D sound system that generate a configurable three-dimensional binaural sound from a stereo sound and a multi-channel sound.