In many applications, it is desirable to produce audio signals that appear, to a listener, to originate from a particular direction at a particular distance, even though the audio signals are provided from fixed sources (e.g., stereo loudspeakers). In these applications, an input audio signal may be provided to an audio signal processor, along with direction and distance parameters, such as elevation angle and azimuth angle relative to the front of the listener's face. Ideally, a system or method receives an audio signal and generates left and right audio signals responsive to a head-related transfer function (HRTF) so that the left and right audio signals, when broadcast to the listener, appear to originate from the desired direction and distance.
In order to create a system that can generate signals appearing to originate from particular directions, the head response of a human model has been measured for signals originating at various locations about the head of the model. In one particular study, signals were broadcast from 710 different positions at various elevation and azimuth angles about the head of the human model, and received by microphones placed in each ear canal of the model. The results of the measurements were reported in: "HRTF Measurements of a KEMAR Dummy-Head Microphone," Gardner and Martin, MIT Media Lab Perceptual Computing--Technical Report #280, May 1994.
In the Gardner and Martin study, the impulse response for the left and right ear was determined for signals broadcast from each of the 710 locations. More specifically, a known input signal was broadcast from each broadcast position, and the signals received by the microphones in the left and right ears of the human model were recorded. The impulse response for each ear was then determined by deconvolving the known input signal from the signal recorded by that ear's microphone. The study produced 710 impulse responses having a minimal length of 128 samples, each sample being 16 bits. Using the impulse responses generated by this study, left and right audio signals can be generated that, when broadcast, will appear to originate from one of the 710 locations. Convolving an input signal with the left-ear and right-ear impulse responses for the desired location generates three-dimensional left and right audio signals. This technique has proven to provide satisfactory "three-dimensional" signals.
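The convolution step described above can be illustrated with a minimal sketch. The function names `convolve` and `spatialize` are hypothetical, and the three-tap impulse responses are made-up toy values rather than measured KEMAR data; a real response from the study would have 128 taps.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: one multiply-add per (input sample, tap) pair."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Filter one mono source with the left-ear and right-ear impulse
    responses for a single location, giving the left and right channels."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy data (assumed for illustration only, not measurements).
hrir_left = [1.0, 0.5, 0.25]
hrir_right = [0.0, 0.5, 0.25]
mono = [1.0, 0.0, 0.0, 2.0]

left, right = spatialize(mono, hrir_left, hrir_right)
print(left)
print(right)
```

In this toy case the right-ear response starts with a zero tap, so the right channel lags the left by one sample, a crude stand-in for the interaural delay that measured responses encode.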
However, the technique just described has a significant shortcoming in that it is computationally complex. That is, in order to determine a single sample to be broadcast for a left or right channel, 128 multiplications and summations must be performed. Thus, for each sample a total of 256 multiplications and summations must be performed: 128 for the left channel and 128 for the right channel. If there are multiple sound sources, as in some applications, the number of multiplications and summations for each sample is equal to 256 times the number of sound sources. In addition, memory must be provided so that the 710 different 128-sample, 16-bit impulse responses can be stored and retrieved for each sound source. Thus, it can be seen that to produce three-dimensional signals using convolution of impulse responses, a high-speed processor and a considerable amount of RAM for lookup tables may be required. For all but the most powerful systems, this will severely limit a system's ability to perform other functions, sound related or otherwise.
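The arithmetic in the preceding paragraph can be worked through in a short sketch. The function names are hypothetical. Two inputs are assumptions not stated in the text: the 44,100 Hz sample rate used for the per-second figure, and the storage estimate's premise that one 128-sample response is kept per ear for each of the 710 locations.

```python
TAPS = 128             # samples per impulse response (from the text)
EARS = 2               # left and right channels
POSITIONS = 710        # measured locations (from the text)
BYTES_PER_SAMPLE = 2   # 16-bit samples


def macs_per_output_sample(num_sources, taps=TAPS):
    """Multiply-and-sum operations needed for one left/right output pair."""
    return EARS * taps * num_sources


def macs_per_second(num_sources, sample_rate_hz, taps=TAPS):
    """Same count scaled by the output sample rate."""
    return macs_per_output_sample(num_sources, taps) * sample_rate_hz


def table_bytes(positions=POSITIONS, taps=TAPS, ears=EARS):
    """Lookup-table size, assuming one response per ear per location."""
    return positions * taps * BYTES_PER_SAMPLE * ears


print(macs_per_output_sample(1))   # one source
print(macs_per_output_sample(4))   # four sources
print(macs_per_second(4, 44_100))  # assumed 44.1 kHz sample rate
print(table_bytes())               # assumed per-ear storage
```

Under these assumptions, four simultaneous sources already require roughly 45 million multiply-and-sum operations per second, and the table occupies about 355 KiB.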
In order to reduce the computational complexity of this technique, modifications of it have been developed. For example, U.S. Pat. Nos. 5,173,944 and 5,438,623 disclose using a smaller set of impulse responses, measured at only selected locations. When an impulse response is needed at a location not in the set, the impulse response is interpolated from the impulse responses in the set that surround the desired location. While this technique reduces the size of the lookup table and the required RAM, it does not reduce the number of computations required to generate each sample of the three-dimensional audio signals. U.S. Pat. No. 5,596,644 breaks the impulse response of the HRTF into components using a singular value decomposition process. This technique may reduce the computational complexity, but it still requires a large number of computations to generate three-dimensional audio signals.
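To illustrate why interpolation saves memory but not computation, the following minimal sketch blends two stored impulse responses linearly by azimuth. Linear weighting is an assumption made here for illustration and is not necessarily the interpolation scheme disclosed in the cited patents; the function name and the two-tap responses are likewise hypothetical.

```python
def interpolate_hrir(azimuth, az_a, hrir_a, az_b, hrir_b):
    """Linearly blend two stored responses of equal length for an
    azimuth lying between their measured azimuths az_a and az_b."""
    w = (azimuth - az_a) / (az_b - az_a)
    return [(1.0 - w) * a + w * b for a, b in zip(hrir_a, hrir_b)]


# Toy two-tap responses stored at 0 and 30 degrees (made-up values).
hrir_0 = [1.0, 0.0]
hrir_30 = [0.0, 1.0]

hrir_15 = interpolate_hrir(15.0, 0.0, hrir_0, 30.0, hrir_30)
print(hrir_15)

# The interpolated response has as many taps as the stored ones, so
# filtering with it costs the same number of multiply-adds per sample.
print(len(hrir_15) == len(hrir_0))
```

The interpolation itself adds a small amount of work per tap whenever the source position changes, on top of the unchanged per-sample filtering cost.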
Thus, there is a need for an apparatus or method of generating three-dimensional audio signals using a reduced set of computations.