This invention relates generally to three-dimensional or "virtual" audio. More particularly, this invention relates to a method and apparatus for reducing the complexity of imaging filters employed in virtual audio displays. In accordance with the teachings of the invention, such reduction in complexity may be achieved without substantially affecting the psychoacoustic localization characteristics of the resulting three-dimensional audio presentation.
Sounds arriving at a listener's ears exhibit propagation effects which depend on the relative positions of the sound source and listener. Listening environment effects may also be present. These effects, including differences in signal intensity and time of arrival, impart to the listener a sense of the sound source location. If included, environmental effects, such as early and late sound reflections, may also impart to the listener a sense of an acoustical environment. When a sound is processed so as to simulate the appropriate propagation effects, a listener will perceive the sound to originate from a specified point in three-dimensional space, that is, a "virtual" position. See, for example, "Headphone simulation of free-field listening" by Wightman and Kistler, J. Acoust. Soc. Am., Vol. 85, No. 2, 1989.
Current three-dimensional or virtual audio displays are implemented by time-domain filtering an audio input signal with selected head-related transfer functions (HRTFs). Each HRTF is designed to reproduce the propagation effects and acoustic cues responsible for psychoacoustic localization at a particular position or region in three-dimensional space or a direction in three-dimensional space. See, for example, "Localization in Virtual Acoustic Displays" by Elizabeth M. Wenzel, Presence, Vol. 1, No. 1, Summer 1992. For simplicity, the present document will refer only to a single HRTF operating on a single audio channel. In practice, pairs of HRTFs are employed in order to provide the proper signals to the ears of the listener.
At the present time, most HRTFs are indexed by spatial direction only, the range component being taken into account independently. Some HRTFs define spatial position by including both range and direction and are indexed by position. Although particular examples herein may refer to HRTFs defining direction, the present invention applies to HRTFs representing either direction or position.
HRTFs are typically derived by experimental measurements or by modifying experimentally derived HRTFs. In practical virtual audio display arrangements, a table of HRTF parameter sets is stored, each HRTF parameter set being associated with a particular point or region in three-dimensional space. In order to reduce the table storage requirements, HRTF parameters for only a few spatial positions are stored. HRTF parameters for other spatial positions are generated by interpolating among the appropriate HRTF parameter sets stored in the table.
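The table lookup and interpolation described above can be sketched as follows. This is only an illustrative sketch: the table contents, the indexing by azimuth alone, the function name `interpolate_hrtf`, and the choice of linear interpolation between the two nearest stored entries are all assumptions made for the example, not details taken from any particular prior art system.

```python
from bisect import bisect_right

# Hypothetical table: azimuth in degrees -> FIR taps (made-up values).
HRTF_TABLE = {
    0.0: [1.0, 0.5, 0.25, 0.0],
    30.0: [0.8, 0.6, 0.2, 0.1],
    60.0: [0.4, 0.7, 0.3, 0.2],
}

def interpolate_hrtf(table, azimuth):
    """Linearly interpolate taps between the two stored azimuths bracketing `azimuth`."""
    angles = sorted(table)
    # Clamp to the ends of the table rather than extrapolating.
    if azimuth <= angles[0]:
        return list(table[angles[0]])
    if azimuth >= angles[-1]:
        return list(table[angles[-1]])
    hi = bisect_right(angles, azimuth)  # index of first stored angle > azimuth
    a0, a1 = angles[hi - 1], angles[hi]
    w = (azimuth - a0) / (a1 - a0)      # weight of the upper neighbour
    return [(1.0 - w) * x + w * y for x, y in zip(table[a0], table[a1])]

# Taps for a direction halfway between the stored 0 and 30 degree entries.
taps_15 = interpolate_hrtf(HRTF_TABLE, 15.0)
```

Only three parameter sets are stored here, yet a parameter set can be produced for any azimuth between 0 and 60 degrees, which is the storage saving the text refers to.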
As noted above, the acoustic environment may also be taken into account. In practice, this may be accomplished by modifying the HRTF or by subjecting the audio signal to additional filtering simulating the desired acoustic environment. For simplicity in presentation, the embodiments disclosed refer to HRTFs; however, the invention applies more generally to all transfer functions for use in virtual audio displays, including HRTFs, transfer functions representing acoustic environmental effects and transfer functions representing both head-related transforms and acoustic environmental effects.
A typical prior art arrangement is shown in FIG. 1. A three-dimensional spatial location or position signal 10 is applied to an HRTF parameter table and interpolation function 11, resulting in a set of interpolated HRTF parameters 12 responsive to the three-dimensional position identified by signal 10. An input audio signal 14 is applied to an imaging filter 15 whose transfer function is determined by the applied interpolated HRTF parameters. The filter 15 provides a "spatialized" audio output suitable for application to one channel of a headphone 17.
Although the various Figures show headphones for reproduction, appropriate HRTFs may create psychoacoustically localized audio with other types of audio transducers, including loudspeakers. The invention is not limited to use with any particular type of audio transducer.
When the imaging filter is implemented as a finite-impulse-response (FIR) filter, the HRTF parameters define the FIR filter taps which comprise the impulse response associated with the HRTF. As discussed below, the invention is not limited to use with FIR filters.
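As an illustration of an imaging filter whose taps are the HRTF impulse response, the following minimal sketch applies a direct-form FIR filter to an input signal. The function name `fir_filter` and the tap values are hypothetical and chosen only for the example.

```python
def fir_filter(taps, signal):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of `signal` are treated as zero.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

# Feeding a unit impulse returns the taps themselves, i.e. the impulse response.
impulse_out = fir_filter([1.0, 0.5], [1.0, 0.0, 0.0])
```

Each output sample costs one multiply-accumulate per tap, which is why the length of the impulse response dominates the computational cost.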
The main drawback to the prior art approach shown in FIG. 1 is the computational cost of relatively long or complex HRTFs. The prior art employs several techniques to reduce the length or complexity of HRTFs. An HRTF, as shown in FIG. 2a, comprises a time delay D component and an impulse response g(t) component. Thus, imaging filters may be implemented as a time delay function z^-D and an impulse response function g(t), as shown in FIG. 2b. By first removing the time delay, thereby time aligning the HRTFs, the computational complexity of the impulse response function of the imaging filter is reduced.
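The decomposition of FIG. 2b can be illustrated with a small sketch. It assumes, purely for illustration, that the delay D appears as a run of leading zero samples in the sampled HRTF; the names `split_delay` and `convolve` and the sample values are hypothetical.

```python
def convolve(h, x):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            y[i + j] += hv * xv
    return y

def split_delay(hrtf, threshold=1e-9):
    """Return (D, g): the number of leading near-zero samples and the rest."""
    d = 0
    while d < len(hrtf) and abs(hrtf[d]) <= threshold:
        d += 1
    return d, hrtf[d:]

hrtf = [0.0, 0.0, 0.0, 1.0, 0.5, 0.25]   # 3-sample delay, then g
x = [1.0, 2.0, 3.0]

D, g = split_delay(hrtf)
direct = convolve(hrtf, x)                # filter with the full 6-tap HRTF
factored = [0.0] * D + convolve(g, x)     # delay by D, then filter with 3-tap g
```

Both paths produce the same output, but the factored form multiplies by three taps per sample instead of six; the delay itself costs only a buffer offset.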
FIG. 3a shows a prior art arrangement in which pairs of unprocessed or "raw" HRTF parameters 100 are applied to a time-alignment processor 101, providing at its outputs time-aligned HRTFs 102 and time-delay values 103 for later use (not shown). Processor 101 cross-correlates pairs of raw HRTFs to determine their time difference of arrival; these time differences are the delay values 103. Because the time delay values 103 and the filter terms are retained for later use, there is no psychoacoustic localization loss--the perceptual impact is preserved. Each time-aligned HRTF 102 is then processed by a minimum-phase converter 104 to remove residual time delay and to further shorten the time-aligned HRTFs.
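The cross-correlation step performed by processor 101 might look roughly like the sketch below. It is an assumption-laden illustration: the brute-force search over integer lags, the function names `estimate_delay` and `time_align`, and the sample values are hypothetical, and the minimum-phase conversion of block 104 is not shown.

```python
def estimate_delay(ref, other, max_lag):
    """Return the integer lag of `other` relative to `ref` that maximizes
    the cross-correlation sum of ref[n] * other[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def time_align(other, lag):
    """Shift `other` by `lag` samples so it lines up with the reference."""
    if lag >= 0:
        return other[lag:] + [0.0] * lag       # advance, zero-pad the tail
    return [0.0] * (-lag) + other[:lag]        # retard, zero-pad the head

left = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0]        # same shape, 2 samples later

lag = estimate_delay(left, right, max_lag=4)   # stored as the delay value
aligned_right = time_align(right, lag)
```

The estimated lag plays the role of a delay value 103: it is kept alongside the aligned response so the interaural time difference can be reinserted at playback.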
FIG. 3b shows two left-right pairs (R1/L1 and R2/L2) of exemplary raw HRTFs resulting from raw HRTF parameters 100. FIG. 3c shows corresponding time-aligned HRTFs 102. FIG. 3d shows the corresponding output minimum-phase HRTFs 105. The impulse response lengths of the time-aligned HRTFs 102 are shortened with respect to the raw HRTFs 100 and the minimum-phase HRTFs 105 are shortened with respect to the time-aligned HRTFs 102. Thus, by extracting the delay so as to time align the HRTFs and by applying minimum phase conversion, the filter complexity (its length, in the case of an FIR filter) is reduced.
Despite the use of the techniques of FIGS. 2b and 3a, at an audio sampling rate of 48 kHz, minimum phase responses as long as 256 points for an FIR filter are commonly used, requiring processors executing on the order of 25 MIPS per audio source rendered.
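The order of magnitude of this figure can be checked with a short calculation. The sampling rate and filter length come from the text; the figure of two instructions per tap is an assumption introduced here solely to relate multiply-accumulate operations to instruction counts, and real processors differ.

```python
SAMPLE_RATE_HZ = 48_000        # from the text
FIR_TAPS = 256                 # from the text
INSTRUCTIONS_PER_TAP = 2       # assumption: e.g. one load and one multiply-accumulate

# One multiply-accumulate per tap per output sample.
macs_per_second = SAMPLE_RATE_HZ * FIR_TAPS
mips = macs_per_second * INSTRUCTIONS_PER_TAP / 1e6
```

Under this assumption the filter needs about 12.3 million multiply-accumulates per second, or roughly 24.6 million instructions per second for each rendered source.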
When computational resources are limited, two additional approaches are used in the prior art, either singly or in combination, to further reduce the length or complexity of HRTFs. One technique is to reduce the sampling rate by downsampling the HRTF as shown in FIG. 4a. Since many localization cues, particularly those important to elevation, involve high-frequency components, reducing the sampling rate may unacceptably degrade the performance of the audio display.
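The loss of high-frequency content can be seen in a deliberately crude sketch of downsampling by a factor of two. The two-sample averaging used here as the anti-aliasing step and the name `downsample_by_2` are assumptions for illustration; a practical system would use a better low-pass filter.

```python
def downsample_by_2(h):
    """Average adjacent sample pairs (a crude low-pass) and keep one value per pair."""
    if len(h) % 2:
        h = h + [0.0]          # pad to an even length without mutating the input
    return [(h[i] + h[i + 1]) / 2.0 for i in range(0, len(h), 2)]

# A component at the highest representable frequency alternates in sign...
high_freq = [1.0, -1.0, 1.0, -1.0]
# ...while a slowly varying component changes little between samples.
low_freq = [1.0, 1.0, 0.5, 0.5]

high_out = downsample_by_2(high_freq)
low_out = downsample_by_2(low_freq)
```

The filter is half as long after downsampling, but the alternating component is removed entirely while the slowly varying one survives, which mirrors the concern about elevation cues raised above.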
Another technique, shown in FIG. 4b, is to apply a windowing function to the HRTF by multiplying the HRTF by a windowing function in the time domain or by convolving the HRTF with a corresponding weighting function in the frequency domain. This process is most easily understood by considering the multiplication of the HRTF by a window in the time domain--the window width is selected to be narrower than the HRTF, resulting in a shortened HRTF. Such windowing results in a frequency-domain smoothing with a fixed weighting function. This known windowing technique degrades psychoacoustic localization characteristics, particularly with respect to spatial positions or directions having complex or long impulse responses. Thus, there is a need for a way to reduce the complexity or length of HRTFs while maintaining the perceptual impact and psychoacoustic localization characteristics of the original HRTFs.
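Time-domain windowing of the kind described above can be sketched as follows. The falling half of a Hann window is used here only as one plausible window shape; the text does not specify a shape, and the name `window_hrtf` and the sample values are hypothetical.

```python
import math

def window_hrtf(h, length):
    """Keep the first `length` taps of `h`, tapered by the falling half of a
    Hann window so the response decays smoothly instead of being cut off."""
    out = []
    for n in range(length):
        w = 0.5 * (1.0 + math.cos(math.pi * n / length))  # 1.0 at n = 0, falling toward 0
        out.append(h[n] * w)
    return out

hrtf = [1.0] * 8                     # flat 8-tap response, for illustration only
shortened = window_hrtf(hrtf, 4)     # 4 taps remain
```

Every tap beyond the window is discarded and the retained taps are reweighted by the same fixed curve regardless of direction, which is why responses that are long or complex lose the most detail.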