1. Field of the Invention
The present invention relates to signal processing techniques. More particularly, the present invention relates to methods for processing audio signals.
2. Description of the Related Art
Binaural or multi-channel spatialization processing of audio signals typically incurs heavy processing costs when the quality of the virtualization experience is increased, especially for accurate 3-D positional audio rendering, for the incorporation of reverberation and reflections, or for rendering spatially extended sources. It is desirable to provide improved binaural and multi-channel spatialization processing algorithms and architectures while minimizing or reducing the associated additional processing costs.
In binaural 3-D positional audio rendering schemes, a fractional delay implementation is necessary in order to allow for continuous variation of the interaural time difference (ITD) according to the position of a virtual source. First-order linear interpolation, a low-cost implementation, causes significant spectral inaccuracies at high frequencies: for non-integer delay values it acts as a low-pass filter. Avoiding this artifact requires a more expensive fractional delay implementation. It is therefore desirable to provide new techniques for simulating continuous ITD variation that do not require interpolation or fractional delay implementation.
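The low-pass behavior of linear interpolation can be illustrated with a minimal sketch. The function names linear_interp_delay and interp_gain are hypothetical and chosen only for this illustration; the sketch assumes a two-tap interpolator of the form y[n] = (1 - f) x[n - D] + f x[n - D - 1], where D is the integer part of the delay and f the fractional part.

```python
import cmath
import math

def linear_interp_delay(signal, delay):
    """Delay `signal` by a non-negative, possibly fractional, number of samples
    using first-order linear interpolation (illustrative helper, not from the source)."""
    whole = int(math.floor(delay))
    frac = delay - whole
    out = []
    for n in range(len(signal)):
        # Samples before the start of the signal are treated as zero.
        a = signal[n - whole] if n - whole >= 0 else 0.0
        b = signal[n - whole - 1] if n - whole - 1 >= 0 else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out

def interp_gain(frac, omega):
    """Magnitude response of the two-tap interpolator at `omega` radians per sample."""
    return abs((1.0 - frac) + frac * cmath.exp(-1j * omega))

if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5):
        gains = [interp_gain(frac, w) for w in (0.0, math.pi / 2, math.pi)]
        print(frac, [round(g, 3) for g in gains])
```

Under these assumptions, an integer delay (f = 0) leaves the magnitude response flat, whereas a half-sample delay (f = 0.5) drives the gain at the Nyquist frequency to zero, which is the high-frequency loss described above.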
Binaural 3-D audio simulation is generally based on the synthesis of primary sources that are point source emitters, i.e., sources that appear to emanate from a single direction in 3-D auditory space. In real-world conditions, many sound sources approximate the behavior of point sources. However, some sound-emitting objects radiate acoustic energy from a finite surface area or volume whose dimensions render the point-source approximation unacceptable for realistic 3-D audio simulation. Such sound-emitting objects may be more suitably represented as line source emitters (such as a vibrating violin string), area source emitters (such as a resonating panel) or volume source emitters (such as a waterfall).
In general, the position, shape and dimensions of a spatially extended source are specified and altered under program control, while an appropriate processing algorithm is applied to a monophonic input signal in order to simulate the spatial extent of the emitter. Two existing approaches to this problem are pseudo-stereo approaches and multi-source dynamic decorrelation approaches.
The goal of pseudo-stereo techniques is to create a pair of decorrelated signals from a monophonic audio input so as to increase the apparent width of the image when played back over two loudspeakers, compared to direct playback of the monophonic input. These techniques can be adapted to simulate spatially extended sources by panning and/or mixing the decorrelated signals. When applied to the 3-D audio simulation of spatially extended sources, pseudo-stereo algorithms have three main limitations: they can generate audible artifacts, including timbre coloration and phase distortion; they are designed to generate a pair of decorrelated signals, and are not suitable for generating higher numbers of decorrelated versions of the input signal; and they incur substantial per-source computational costs, as each monophonic source is individually processed to generate decorrelated versions prior to mixing or panning.
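As an illustration only, one simple pseudo-stereo construction is a complementary comb filter pair, in which a delayed copy of the input is added to one channel and subtracted from the other. The text above does not specify any particular algorithm, so the function name pseudo_stereo and the parameters delay and gain are assumptions made for this sketch.

```python
def pseudo_stereo(signal, delay=3, gain=0.7):
    """Return a (left, right) pair from a mono signal using complementary comb
    filters. Illustrative sketch; `delay` is in samples, `gain` scales the echo."""
    left, right = [], []
    for n, x in enumerate(signal):
        # Samples before the start of the signal are treated as zero.
        delayed = signal[n - delay] if n >= delay else 0.0
        left.append(x + gain * delayed)   # peaks where the right channel has notches
        right.append(x - gain * delayed)  # notches where the left channel has peaks
    return left, right

if __name__ == "__main__":
    mono = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    left, right = pseudo_stereo(mono)
    print(left)
    print(right)
```

In this sketch each channel's magnitude response ripples between 1 - gain and 1 + gain across frequency, which gives a concrete instance of the timbre coloration noted above, and the construction yields exactly two outputs per monophonic input.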
The multi-source dynamic decorrelation approach addresses some of the above limitations. Multiple decorrelated versions of a monophonic input signal are generated using an approach called dynamic decorrelation, which uses a different sparse FIR filter, with different delays and coefficients, to produce each decorrelated version of the input signal. The delays and coefficients are chosen such that the sum of the decorrelated versions is equal to the original input signal. The resulting decorrelated signals are individually spatialized in 3-D space to cover an area or volume that corresponds to the dimensions of the object being simulated. This technique is less prone to coloration and phase artifacts than prior pseudo-stereo approaches and less restrictive on the number of decorrelated sources that can be generated. Its main limitation is that it incurs substantial per-source computational costs. Not only must multiple decorrelated signals be generated for each object, but each resulting signal must then be spatialized individually. The amount of processing necessary to generate a spatially extended sound object is variable, as the number of decorrelated sources generated depends on factors including the spatial extent and shape of the object, as well as the audible angle subtended by the object with respect to the listener, which varies with its orientation and distance. It is desirable to provide new techniques for computationally efficient simulation of spatially extended sound sources.
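The constraint that the decorrelated versions sum to the original input can be illustrated with a minimal sketch. The filter design used in practice is not given above; this sketch assumes one simple construction in which every filter carries an equal share of the direct signal and each additional delayed tap appears in two filters with opposite signs, so that the delayed taps cancel in the sum. The names make_sparse_filters, apply_sparse_fir, pair_delays and gain are hypothetical.

```python
def make_sparse_filters(num_versions, pair_delays, gain=0.5):
    """Build sparse FIR filters, each a {delay: coefficient} dict, whose
    coefficients sum across filters to a unit impulse (assumed construction)."""
    # Every filter gets an equal share of the undelayed signal.
    filters = [{0: 1.0 / num_versions} for _ in range(num_versions)]
    for i, delay in enumerate(pair_delays):
        a = i % num_versions
        b = (i + 1) % num_versions
        # The same delayed tap enters two filters with opposite signs,
        # so it cancels when all filter outputs are summed.
        filters[a][delay] = filters[a].get(delay, 0.0) + gain
        filters[b][delay] = filters[b].get(delay, 0.0) - gain
    return filters

def apply_sparse_fir(taps, signal):
    """Filter `signal` with a sparse FIR given as a {delay: coefficient} dict."""
    out = [0.0] * len(signal)
    for delay, coeff in taps.items():
        for n in range(delay, len(signal)):
            out[n] += coeff * signal[n - delay]
    return out

if __name__ == "__main__":
    mono = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.3, 0.2]
    filters = make_sparse_filters(3, [2, 3, 5])
    versions = [apply_sparse_fir(f, mono) for f in filters]
    total = [sum(v[n] for v in versions) for n in range(len(mono))]
    print(max(abs(t - x) for t, x in zip(total, mono)))  # reconstruction error
```

Each version produced this way would then be spatialized separately, so under these assumptions the filtering and spatialization work grows with the number of versions generated per object, consistent with the per-source cost described above.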