Wave field synthesis is an audio reproduction method for spatial rendering of complex audio scenes that was developed at the Delft University of Technology. Unlike most existing methods of audio reproduction, spatially correct rendering is not restricted to a small area, but extends across an extensive rendering area. WFS is based on a sound mathematical-physical foundation, namely the principle of Huygens and the Kirchhoff-Helmholtz integral.
Typically, a WFS reproduction system consists of a large number of loudspeakers (so-called secondary sources). The loudspeaker signals are formed from delayed and scaled input signals. Since a WFS scene typically contains many audio objects (primary sources), a very large number of such operations must be performed to produce the loudspeaker signals. This accounts for the high computing power that wave field synthesis requires.
In addition to the above-mentioned advantages, WFS also offers the possibility of realistically imaging moving sources. This feature is exploited in many WFS systems and is of great importance, for example, for utilization in cinemas, virtual-reality applications or live performances.
However, rendering moving sources causes a series of characteristic errors that do not occur in the case of static sources. Signal processing of a WFS rendering system has a significant impact on the rendering quality.
A primary goal is to develop signal processing algorithms for rendering moving sources by means of WFS. In this context, real-time capability of the algorithms is an important precondition. The most important criterion for evaluating the algorithms is the objective perceived audio quality.
As has been said, WFS is a method of audio reproduction that is very costly in terms of processing resources. This is due, above all, to the large number of loudspeakers employed in a WFS setup, and to the fact that the number of virtual sources used in WFS scenes is often high. For this reason, the efficiency of the algorithms to be developed is of outstanding importance.
An important question is what quality improvement the algorithms to be developed should achieve. This is particularly relevant given the other artefacts caused by WFS, which may be even more disturbing or may mask the artefacts of signal processing, depending on the quality of the signal processing algorithms. Therefore, the focus is on developing algorithms whose quality is scalable via various parameters (e.g. interpolation orders, filter lengths, etc.). As an extreme case, this includes algorithms whose rendering errors are below the threshold of perception under optimized conditions (absence of any other artefacts). Depending on the desired quality, the severity of the other artefacts and the resources available, an optimum tradeoff may be found.
A series of criteria and ranges of values may be defined which facilitate designing algorithms. They include:
(a) Admissible source speeds. Generally, virtual sources having arbitrary source speeds are to be supported. However, the influence of the Doppler shift increases as the speed increases. In addition, many physical laws that are also used in WFS only apply to speeds below the speed of sound. Therefore, the following range is specified as the admissible range for the source speed v_src:
$$v_{\mathrm{src}} \le \frac{1}{2}\, c .$$
In this context, c is the speed of sound of the medium. Under standard conditions, the allowed speed of sources therefore amounts to about 172 m/s, or 619 km/h.
(b) Frequency ranges. The entire audio frequency range, i.e. $20\ \mathrm{Hz} \le f \le 20\ \mathrm{kHz}$ (1), shall be assumed as the rendering range for the frequency f.
It is to be noted that the selection of the upper cutoff frequency and of the quality to be achieved thereby has a decisive impact on the algorithms' resource requirements.
(c) Sampling frequency. The selection of the sampling rate has a large impact on the algorithms to be designed. First, the error of most delay interpolation algorithms increases sharply as the frequency range of interest approaches the Nyquist frequency. Second, the lengths of many filters used by the algorithms increase sharply as the range between the upper cutoff frequency of the audio frequency range and the Nyquist frequency becomes narrower, since this range is used as a so-called don't-care band in many filter design processes.
Changes in the sampling frequency may therefore entail extensive adaptations of the filters used and other parameters, and may therefore also decisively influence the performance and the suitability of specific algorithms.
As a standard feature, systems common in professional audio technology are operated at a sampling rate of 48 kHz. Therefore, this sampling frequency shall be assumed in the following.
(d) Target hardware. Even though the algorithms to be developed are generally independent of the hardware used, specifying the target platform is useful for various reasons:
(i) The architecture of the CPUs employed, e.g. supporting parallel work, has an impact on the design of the algorithms.
(ii) The size and architecture of the memory used influence design decisions with regard to designing algorithms.
(iii) For specifying performance requirements, indications of the efficiency of the target hardware are useful.
Since current systems, and those of the foreseeable future, are mostly based on PC technology, the following properties shall be assumed:
Current desktop or work station standard components on the basis of x86 technology,
No utilization of special hardware,
Processors with performant floating-point functionality,
Comparatively large working memory, and
Typical support for SIMD instruction sets (e.g. SSE).
Algorithmics in audio signal processing in wave field synthesis may be divided up into various categories:
(1) Calculating the WFS parameters. By applying the WFS synthesis operator, a scaling value and a delay value are determined for each combination of source and loudspeaker. This calculation is performed at a relatively low frequency. Between these nodes, the scale and delay values are interpolated by means of simple methods. Therefore, the influence on the performance is comparatively small.
(2) Filtering. Implementing the WFS operator requires filtering with a low-pass filter with an edge steepness of 3 dB per octave. Additionally, an adaptation to the rendering conditions may be performed, said adaptation being dependent on the source or loudspeaker. However, since the filter operation is performed only once per input and/or output signal, respectively, the performance requirement is generally moderate. In addition, in current WFS systems, this operation is performed on dedicated arithmetic units.
(3) WFS scaling. This operation, which is often incorrectly referred to as WFS convolution, applies the delay calculated by the synthesis operator to the input signals stored in a delay line, and scales this signal with a scaling also calculated by the synthesis operator. This operation is performed for each combination of virtual source and loudspeaker. The loudspeaker signals are formed by summing all of the scaled input signals for the loudspeaker in question.
Since WFS scaling is performed for each combination of virtual source and loudspeaker as well as for each audio sample, it forms the main proportion of the resource requirements of a WFS system even if the individual operation has very low complexity.
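The WFS scaling step described above can be sketched as a plain delay-scale-and-sum loop. This is a minimal illustration only, assuming integer delays in samples and precomputed scaling values; the function names `wfs_scale_and_sum` and `macs_per_second` and the data layout are hypothetical and are not taken from any particular renderer.

```python
def wfs_scale_and_sum(sources, delays, gains, num_samples):
    """Delay, scale and sum source signals into loudspeaker signals.

    sources[s]   : list of samples of virtual source s
    delays[s][l] : integer delay (in samples) for source s and loudspeaker l
    gains[s][l]  : scaling value for source s and loudspeaker l
    """
    num_speakers = len(delays[0])
    out = [[0.0] * num_samples for _ in range(num_speakers)]
    for s, x in enumerate(sources):
        for l in range(num_speakers):
            d, g = delays[s][l], gains[s][l]
            # The integer delay is an index shift into the source signal.
            for n in range(d, min(num_samples, len(x) + d)):
                out[l][n] += g * x[n - d]
    return out


def macs_per_second(num_sources, num_speakers, fs=48000):
    """One multiply-add per source, loudspeaker and audio sample."""
    return num_sources * num_speakers * fs
```

Each individual operation is a single multiply-add, but the count grows with the product of sources, loudspeakers and sampling rate. For a hypothetical scene with 32 sources and 200 loudspeakers at 48 kHz, `macs_per_second(32, 200)` gives 307,200,000 multiply-adds per second before any delay interpolation is added.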
In addition to the known rendering errors (artefacts) of WFS, a series of further characteristic errors occur with moving sources. The following errors may be identified:
(A) Comb filter effects (spatial aliasing). The spatial aliasing known from rendering static sources produces, above the aliasing frequency, an interference pattern that depends on the source position and on the frequency and is characterized by peaks and sharp notches. When the virtual source moves, this pattern changes dynamically and thus produces time-dependent frequency distortion for a stationary observer.
(B) Neglect of the retarded time. For calculating the WFS parameters, the current position of the source is used. However, for accurate rendering, the decisive position is that from which the currently impinging sound was emitted. This creates a systematic error of the Doppler shift which, however, is relatively small for moderate speeds and is very unlikely to be perceived as disturbing in most WFS applications.
(C) Doppler spread. Due to the different relative speeds, a moving source leads to various Doppler frequencies in the signals emitted by the secondary sources. Said Doppler frequencies express themselves, at the hearing location, in a broadening of the frequency spectrum of the virtual source. This error cannot be explained by the WFS theory and is an object of current research.
(D) Audio disturbances due to delay interpolation. WFS scaling requires input signals that are delayed by arbitrary amounts; these have to be calculated from samples that exist only at discrete points in time. The algorithms used for this purpose differ strongly in terms of quality and often produce artefacts that are perceived as disturbing.
The natural Doppler effect, i.e. the frequency shift of a moving source, is not classified as an artefact here, since it is a property of the primary sound field to be rendered by a WFS system. Nevertheless, it is undesired in many applications.
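Errors (B) and (C) can be illustrated with a small numerical sketch. A signal delayed by a time-varying delay d(t) has its instantaneous frequency scaled by 1 - d'(t). If the delay for each loudspeaker is taken as the current source-to-loudspeaker distance divided by c, each loudspeaker signal is therefore shifted by a different factor. The two-dimensional geometry, the function name `doppler_factors` and the value c = 343 m/s are assumptions made for illustration only.

```python
import math

C = 343.0  # assumed speed of sound in m/s


def doppler_factors(src_pos, src_vel, speakers, c=C):
    """Frequency scaling factor of each loudspeaker signal.

    The delay is r/c with r the current source-loudspeaker distance,
    so the factor is 1 - (dr/dt)/c.  Using the current position
    (not the retarded one) is exactly the simplification of error (B).
    """
    factors = []
    for lx, ly in speakers:
        dx, dy = src_pos[0] - lx, src_pos[1] - ly
        r = math.hypot(dx, dy)
        r_dot = (dx * src_vel[0] + dy * src_vel[1]) / r  # rate of change of r
        factors.append(1.0 - r_dot / c)
    return factors


# Source 5 m behind a three-loudspeaker line, moving parallel to it at 10 m/s.
spread = doppler_factors((0.0, 5.0), (10.0, 0.0),
                         [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)])
```

In this example the loudspeaker the source is moving away from receives a factor below 1, the one directly in front of the source a factor of exactly 1, and the one it approaches a factor above 1. This spread of factors across the array corresponds to the spectral broadening described under (C).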
The operation of determining the value of a time-discretely sampled signal at arbitrary points in time is referred to as delay interpolation or fractional-delay interpolation.
To this end, a large number of algorithms have been developed which strongly differ in terms of complexity and quality of the interpolation. Generally, fractional-delay algorithms are implemented as discrete filters which have a time-discrete signal as their input, and an approximation of the delayed signal as their output.
Fractional-delay interpolation algorithms may be classified by various criteria:
(I) Filter structure. FD (fractional delay) filters may be implemented both as FIR (finite impulse response) and as IIR (infinite impulse response) filters.
FIR filters generally require a larger number of filter coefficients and, thus, of arithmetic operations, and they also produce amplitude errors for arbitrary fractional delays. However, they are stable, and many design processes exist for them, including many closed-form, non-iterative ones.
IIR filters may be implemented as all-pass filters, which exhibit an amplitude response which is precisely constant and, thus, ideal for FD filters. However, it is not possible to influence the phase of an IIR filter as precisely as in the case of an FIR filter. Most design methods for IIR-FD filters are iterative, and accordingly, they are not suited for real-time applications with variable delays. The only exceptions are Thiran filters, for which explicit formulae for the coefficients exist. Implementing IIR filters requires storing the values of the preceding outputs. This is unfavorable for implementation in a WFS reproduction system, since a multitude of previous output signals would have to be managed. In addition, the use of internal states reduces the suitability of IIR filters for variable delays, since the internal state was possibly calculated for a different fractional delay than the current one. This leads to disturbances in the output signal which are referred to as transients.
For these reasons, only FIR filters will be studied for utilization in WFS reproduction systems.
(II) Fixed and variable fractional delays. Once their coefficients have been designed, FD filters are valid only for a specific delay value. The design operation must be repeated for each new value. Depending on the cost of this design operation, methods are suited to varying degrees for real-time operation with variable delays.
Methods for variable fractional delays (VFD) combine the coefficient calculation and the filter calculation and are therefore very well suited for real-time changes in the delay value. They are a variant of variable digital filters.
(III) Asynchronous sampling rate conversion. WFS requires continuously variable delays. For example, when a virtual source moves at constant speed directly toward a secondary source, the delay is a linear function of time. This operation may be classified as an asynchronous sampling rate conversion. Methods for asynchronous sampling rate conversion are typically implemented on the basis of variable fractional-delay algorithms. However, they raise several additional problems that have to be solved, e.g. the need to suppress imaging and aliasing artefacts.
(IV) Range of values of the fractional-delay parameter. The range of the variable delay parameter $d_{\mathrm{frac}}$ is dependent on the method used and is not necessarily the range $0 \le d_{\mathrm{frac}} \le 1$. For most FIR methods, it is within the range of
$$\frac{N-1}{2} \le d_{\mathrm{frac}} \le \frac{N+1}{2},$$
N being the order of the method. In this manner, the deviation from a linear-phase behavior is minimized. An exactly linear-phase behavior is possible only for specific values of $d_{\mathrm{frac}}$.
By decomposing the desired delay value d into an integer value d_int and a fractional portion d_frac, arbitrary delays may be produced by using a fractional-delay filter. The delay by d_int is implemented, in this context, by an index shift in the input signal.
However, adhering to the ideal working range results in a minimum delay value below which the filter would no longer be causal. Therefore, methods for delay interpolation, specifically high-quality FD algorithms with long filter lengths, also increase the system latency. Even for extremely costly processes, this added latency does not exceed an order of magnitude of 20 to 50 samples, which is generally low as compared to the other latencies inherent in a typical WFS rendering system.
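The decomposition of a delay into an index shift and a fractional part in the ideal working range can be sketched as follows. This is an illustrative sketch, assuming delays expressed in samples and an FIR method of order N whose taps start at the shifted index; the function name `split_delay` is hypothetical.

```python
import math


def split_delay(delay, order):
    """Split a delay (in samples) into (d_int, d_frac).

    d_frac is kept in the range (N-1)/2 <= d_frac < (N+1)/2 for an
    FIR method of order N; d_int is applied as an index shift.
    """
    lower = (order - 1) / 2.0
    d_int = math.floor(delay - lower)
    if d_int < 0:
        # Smaller delays would require future samples (non-causal).
        raise ValueError("delay is below the minimum causal delay (N-1)/2")
    return d_int, delay - d_int
```

For a delay of 10.3 samples, a first-order method gives an index shift of 10 and a fractional part of about 0.3, whereas a third-order method gives an index shift of 9 and a fractional part of about 1.3. The `ValueError` branch corresponds to the minimum delay, and hence to the added latency, discussed above.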
The need for delay interpolation results from the following considerations:
In the synthesis of moving sound sources by means of WFS, the delays applied to the audio signals are time-variant. Signal processing (rendering) of a WFS rendering system is performed in a time-discrete manner; therefore, source signals only exist at specified sampling times. Delaying a time-discrete signal by a multiple of the sampling period can be done efficiently and is implemented by shifting the signal index. Accessing a value of a time-discrete signal that is located between two sampling points is referred to as delay interpolation or fractional delay. This requires specific algorithms, which differ strongly in terms of quality and performance. An overview of fractional-delay algorithms shall be provided.
In WFS of moving sources, the required delay times change dynamically and may adopt arbitrary values. Generally, a different delay value is required for each loudspeaker signal. The algorithms used must therefore support arbitrary, variable delays.
While rounding off the delay to the nearest multiple of the sampling period provides sufficiently good results with static WFS sources, this method results in marked interferences with moving sources.
For wave field synthesis, a delay interpolation is required for each combination of virtual source and loudspeaker. Given the complexity that delay interpolation needs in order to achieve high rendering quality, a high-quality real-time implementation is not practicable.
The need for delay interpolation for moving sources is described in Edwin Verheijen: “Sound Reproduction by Wave Field Synthesis”, PhD thesis (pages 106-110), Delft University of Technology, 1997. However, only simple (standard) delay interpolation methods are utilized for realizing the algorithms.
In Marije Baalman, Simon Schampijer, Torben Hohn, Thilo Koch, Daniel Plewe and Eddie Mond: “Creating a large scale wave field synthesis system with swonder”, in Proc. of the 5th International Linux Audio Conference, Berlin, Germany, March 2007, the need for a sampling rate conversion with moving virtual sources is pointed out. An algorithm based on the Bresenham algorithm is outlined. However, the Bresenham algorithm is an integer-arithmetic algorithm from computer graphics for plotting lines on raster display devices. Therefore, it is to be assumed that it is not a true interpolating sampling rate conversion, but a rounding of the nodes to the nearest integer sample index.
Various simple methods for delay interpolation are implemented in WFS renderers. By means of the class hierarchy used, the methods may easily be exchanged. In addition to delay interpolation, temporal interpolation of the WFS parameters of delay (and also of scale) has an influence on the quality of the sampling rate conversion. In the conventional renderer structure, these parameters are updated only within a fixed raster (currently every 32 audio samples).
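The effect of the update raster on the delay parameter can be sketched by comparing a value held constant over a block with a value interpolated for every sample. Linear interpolation between the raster nodes is an assumption made here for illustration, and the function names `hold_block` and `ramp_block` are hypothetical.

```python
def hold_block(start, block_size=32):
    """Parameter held constant for one block (update only at the raster)."""
    return [start] * block_size


def ramp_block(start, end, block_size=32):
    """Parameter interpolated linearly for each sample of one block."""
    step = (end - start) / block_size
    return [start + step * i for i in range(block_size)]
```

If the delay changes from 10 to 12 samples over one block, holding the value produces a jump of 2 samples at the block boundary, whereas the per-sample ramp changes by only 2/32 of a sample at each step.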
The following algorithms are implemented:
IntegerDelay. This is the original algorithm. It does not support any delay interpolation, i.e. delay values are rounded off to the nearest multiple of the sampling period. The delay and scaling parameters are updated within a raster of currently 32 samples. This algorithm is implemented in an optimized assembler variant and is suitable for real-time rendering of entire WFS scenes. Nevertheless, this operation takes up the major portion of the computational load of the renderer.
BufferwiseDelayLinear. The WFS parameters are adapted within a coarse raster (notation: bufferwise), while the delayed signals themselves are calculated with a delay interpolation on the basis of a linear interpolation. The implementation uses assembler and is suitable, in terms of performance, for being employed with entire WFS scenes. This algorithm is currently used as the default setting.
SamplewiseDelayLinear. In this method, scaling and delay values are interpolated for each sample (notation: samplewise). Delay interpolation is again performed by linear interpolation (i.e. 1st-order Lagrange interpolation). This method is clearly more costly than the previous ones, and additionally, it exists only in a C++ reference implementation. Therefore, it is not suitable for being used with real, complex WFS scenes.
SamplewiseDelayCubic. Here, too, scale and delay are interpolated in a manner that is exact to the sample. The delay interpolation is performed using a third-order (i.e. cubic) Lagrange interpolator. This method, too, only exists as a reference implementation and is suitable exclusively for small numbers of sources.
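The three kinds of delay access named in the list above (rounding, linear interpolation, cubic Lagrange interpolation) can be compared with a short sketch. It is not the renderer's code: the function names `lagrange_coeffs` and `read_delayed` are hypothetical, delays are given in samples, and the standard Lagrange formula h[k] = prod over m != k of (D - m)/(k - m) is used for the FIR coefficients.

```python
import math


def lagrange_coeffs(d_frac, order):
    """Coefficients h[0..order] of a Lagrange fractional-delay FIR filter."""
    h = []
    for k in range(order + 1):
        c = 1.0
        for m in range(order + 1):
            if m != k:
                c *= (d_frac - m) / (k - m)
        h.append(c)
    return h


def read_delayed(x, n, delay, order):
    """Approximate x(n - delay); order 0 rounds, 1 is linear, 3 is cubic."""
    if order == 0:
        return x[n - round(delay)]
    # Integer part as index shift, fractional part in the ideal range.
    d_int = math.floor(delay - (order - 1) / 2.0)
    h = lagrange_coeffs(delay - d_int, order)
    return sum(h[k] * x[n - d_int - k] for k in range(order + 1))


# 1 kHz sine at 48 kHz, read with a delay of 10.37 samples.
fs, f = 48000.0, 1000.0
x = [math.sin(2 * math.pi * f * i / fs) for i in range(200)]
exact = math.sin(2 * math.pi * f * (100 - 10.37) / fs)
errors = [abs(read_delayed(x, 100, 10.37, p) - exact) for p in (0, 1, 3)]
```

In this example the error decreases from rounding to linear to cubic interpolation, while the number of multiply-adds per output sample rises from one to two to four, plus the cost of computing the coefficients. That tradeoff is what makes the samplewise variants costly when applied to every combination of source and loudspeaker.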