1. Statement of the Technical Field
The inventive arrangements relate to the field of audio processing and presentation and, in particular, to combining and customizing multiple audio environments to give the user a preferred illusion of sound (or sounds) located in a three dimensional space surrounding the listener.
2. Description of the Related Art
Binaural audio is sound that is processed to provide the listener with a three dimensional virtual audio environment. This type of audio allows the listener to be virtually immersed in an environment for a more realistic experience. Binaural sound, which appears to emanate from different spatial locations outside the listener's head, differs from both stereophonic sound and monophonic audio.
Binaural sound can be provided to a listener either by speakers fixed in a room or by a speaker fixed to each ear of the listener. Providing a specific binaural sound to each ear using a set of room speakers is difficult because of acoustic crosstalk and because the listener must remain fixed relative to the speakers. Additionally, with room speakers the binaural sound does not depend on the position or rotation of the listener's head. Headphones, by contrast, minimize acoustic crosstalk and maintain a fixed distance between each of the listener's ears and the corresponding speaker in the headphone.
Under ordinary circumstances, the sound arriving at each eardrum of a person undergoes multiple changes that provide the listener's brain with information regarding the location of the sound source. Some of the changes are caused by the human torso, the head, the ear pinna, and the ear canal. Collectively, these changes are called the Head Related Transfer Function (HRTF). The HRTF is typically a function of both frequency and relative orientation between the head and the source of the sound. The effect of distance usually results in amplitude attenuation that increases with the distance between the sound source and the listener. The differences in the amplitude and the time-of-arrival of sound waves at the left and right ears, referred to as the interaural intensity difference (IID) and the interaural time difference (ITD), respectively, provide important cues for audibly locating the sound source. Spectral shaping and attenuation of the sound wave also provide important cues used by the listener to identify whether a source is in front of or behind the listener.
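The ITD and distance cues described above can be illustrated with a minimal sketch. It is not taken from any particular binaural system: it assumes Woodworth's spherical-head approximation for the ITD and a simple free-field 1/r law for distance attenuation, and the head radius, speed of sound, and function names are all illustrative assumptions. The IID and the spectral shaping of a full HRTF are not modeled.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumed)


def itd_seconds(azimuth_rad):
    """Approximate interaural time difference for a distant source.

    Uses Woodworth's spherical-head formula, ITD = (a / c) * (theta + sin(theta)),
    which is intended for azimuths between -pi/2 and +pi/2 (0 = straight ahead).
    The sign of the result follows the sign of the azimuth.
    """
    theta = abs(azimuth_rad)
    if theta > math.pi / 2:
        raise ValueError("azimuth must lie within [-pi/2, pi/2]")
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_rad)


def distance_gain(distance_m, reference_m=1.0):
    """Free-field amplitude gain relative to a reference distance (1/r law).

    Distances closer than the reference are clamped so the gain never exceeds 1.
    """
    return reference_m / max(distance_m, reference_m)
```

Under these assumptions, a source directly to one side yields an ITD of roughly two thirds of a millisecond, and doubling the distance from the reference halves the amplitude.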
Another filter sometimes used in binaural systems is a Binaural Room Impulse Response (BRIR). The BRIR includes information about all acoustical properties of a room, including the position and orientation of the sound source, the listener, the room dimensions, the walls' reflective properties, etc. Thus, depending on the size, shape, and wall material of a room, a sound source located at one end of the room has different sound properties when heard by a listener at the other end of the room. An example of this technology is provided in most sound systems that are purchased today. These systems have several different sound effects to give the listener the feeling of sitting in an auditorium, a stadium, an indoor theater, an outdoor theater, etc. Research has been conducted to demonstrate the capability derived from BRIR to give the listener the perceived effect of sound bouncing off the walls of differently shaped rooms.
Conventional binaural systems have been proposed which simulate some of these changes that occur to sound as it arrives at the human ear from a remote source. Some of these systems are directed toward improving the filtering performance of the HRTF. The term “filter” as used herein refers to devices which perform an operation equivalent to convolving a time-domain signal with an impulse response. Similarly, the term “filtering” and the like as used herein refer to processes which apply such a filter to a time-domain signal. Considerable computational resources are required to implement accurate HRTFs because they are very complex functions of direction and frequency. The overall design of the binaural audio system is therefore very important for reducing implementation costs, improving sound feedback rates, and implementing practical binaural sound fields which may include many sound sources.
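The definition of “filter” given above, namely convolution of a time-domain signal with an impulse response, can be sketched as follows. This is a deliberately simple direct-form convolution in plain Python, not an optimized or authoritative implementation; the function names and the toy impulse responses in the usage note are hypothetical, and a practical system would more likely use measured head-related impulse responses and FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sequences of samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, shifted copy of the response.
            out[n + k] += x * h
    return out


def binauralize(mono, ir_left, ir_right):
    """Filter one monophonic signal with a per-ear impulse response pair.

    Returns a (left, right) tuple of filtered sample lists.
    """
    return convolve(mono, ir_left), convolve(mono, ir_right)
```

As a usage illustration, `binauralize([1.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])` passes the signal unchanged to the left ear and delivers a copy delayed by two samples and halved in amplitude to the right ear, a crude stand-in for a source located to the listener's left. The cost of the nested loop grows with the product of the two lengths, which reflects why accurate HRTF filtering is computationally demanding.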
At the highest level, a binaural system typically consists of three parts. The first part is the receiver. The receiver is generally designed to receive a monophonic radio frequency (RF) signal containing audio information, along with the metadata for that audio information. For example, the metadata typically includes spatial location information of the source of the particular audio information. This spatial location information can then be used to produce a binaural audio signal that simulates the desired spatial location of the source. The second part is a processor, which receives this metadata from the receiver as well as data from the listener's head-tracking apparatus. The processor uses this information to generate the audio that will be heard by each ear. Finally, the left and right audio is sent to the third part, a sound producer, which can be implemented either with floor speakers positioned around a listener or with a headphone that places a speaker next to each ear of a listener. Floor speakers have the disadvantage of requiring the listener to remain in a fixed position to hear three-dimensional (3-D) binaural sound. A headphone, however, allows the listener to move freely while the processor monitors the listener's movement and head position.
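One step the processor might perform when combining the source location metadata with head-tracking data can be sketched as follows. The sketch assumes, purely for illustration, that both the source direction and the head orientation are expressed as azimuth angles in degrees within a common horizontal reference frame; the function names are hypothetical, and elevation, head roll, and head pitch are ignored.

```python
def wrap_degrees(angle_deg):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of the source relative to where the listener's head points.

    A result of 0 means the source is straight ahead of the listener. The
    sign convention follows whatever convention the two input angles share.
    """
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)
```

For example, if the metadata places a source at 30 degrees and the head tracker reports that the listener has turned to 30 degrees, the relative azimuth is 0 and the source would be rendered straight ahead. The relative angle is the quantity a processor of this kind would use to select or interpolate the filters applied for each ear.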
Most efforts toward improving binaural systems have focused on improving the fidelity of the binaural sound, increasing the speed of the binaural sound processor, or increasing the number of possible listeners. However, these efforts have tended to focus on the process for simulating a virtual audio environment. In contrast, few efforts have been directed to innovative applications for actually putting such binaural audio information to practical use.