In the field of spatial audio reproduction, several methods are known in conventional technology, including, for example, wave field synthesis. Its fundamental idea is based on Huygens' principle, according to which any point at which a wave arrives is the starting point of an elementary wave propagating in a spherical or circular manner. Wave field synthesis is employed in acoustics on the basis of a large number of loudspeakers arranged adjacent to one another, a so-called loudspeaker array, and is able, in principle, to replicate any shape of an incoming wave front. In the simplest case, that of a single point source to be reproduced and a linear arrangement of the loudspeakers, the audio signal of each loudspeaker is filtered, using a time delay and amplitude scaling, such that the radiated sound fields of the individual loudspeakers superimpose and a corresponding spatial impression results for a listener. If there are several sound sources, the contribution to each loudspeaker is calculated separately for each source, and the resulting signals are added. If the sources to be reproduced are located within a room having reflecting walls, the reflections may also be compensated for via respective filters using the loudspeaker array.
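The simplest case described above, a single point source and a linear loudspeaker array, may be illustrated by the following minimal sketch. It is not the complete wave field synthesis driving function: it assumes a speed of sound of 343 m/s, a plain 1/r amplitude decay, and omits the pre-filtering used in practical systems. All function names and the array geometry are hypothetical and chosen for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def linear_array(num_speakers, spacing):
    """Loudspeaker positions on the x-axis, centred on the origin."""
    offset = (num_speakers - 1) * spacing / 2.0
    return [(i * spacing - offset, 0.0) for i in range(num_speakers)]


def delay_and_gain(source_xy, speaker_xy):
    """Return (delay in seconds, amplitude gain) for one loudspeaker."""
    distance = math.dist(source_xy, speaker_xy)
    delay = distance / SPEED_OF_SOUND
    # Simplified 1/r decay (an assumption), clamped to avoid division by zero.
    gain = 1.0 / max(distance, 1e-3)
    return delay, gain


def driving_parameters(source_xy, speakers):
    """Delay and gain of one virtual point source for every loudspeaker."""
    return [delay_and_gain(source_xy, spk) for spk in speakers]


speakers = linear_array(num_speakers=5, spacing=0.5)
params = driving_parameters((0.0, -2.0), speakers)  # source 2 m behind the array
for (x, _), (delay, gain) in zip(speakers, params):
    print(f"x={x:+.2f} m  delay={delay * 1e3:.3f} ms  gain={gain:.3f}")
```

For several sources, the same calculation would be repeated per source and the delayed, scaled signals summed per loudspeaker, as stated above.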
The effort involved in calculating wave field synthesis depends heavily on the number of sound sources to be reproduced, on the reflection properties of the reproduction space, and on the number of loudspeakers. The larger the loudspeaker array, i.e. the more individual loudspeakers are provided, the better the possibilities of wave field synthesis can be exploited. The disadvantage, however, is that the computing power required increases with the number of individual loudspeakers used. For each virtual sound source, i.e. each sound source to be reproduced, a corresponding signal must be calculated and transmitted for each individual loudspeaker of the loudspeaker array. With moving virtual sources in particular, the computing effort increases tremendously, so that conventional systems very quickly reach their limits when representing moving sound sources, the limiting factor being the computing power.
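The scaling described above may be made concrete with a rough estimate. The following sketch assumes that every source-loudspeaker pair involves one filter of a given length, so that the cost is the product of the number of sources, the number of loudspeakers, the filter length, and the sampling rate. The function name and the example figures are hypothetical and do not describe any particular system.

```python
def wfs_macs_per_second(num_sources, num_speakers, filter_taps, sample_rate):
    """Rough multiply-accumulate count per second for source-by-speaker filtering.

    Assumes one filter of `filter_taps` coefficients per source-loudspeaker pair.
    """
    return num_sources * num_speakers * filter_taps * sample_rate


# Hypothetical example: 32 virtual sources, 200 loudspeakers,
# a single gain per pair (1 tap), 48 kHz sampling rate.
small = wfs_macs_per_second(32, 200, 1, 48_000)
# Doubling the number of loudspeakers doubles the cost.
large = wfs_macs_per_second(32, 400, 1, 48_000)
print(small, large)
```

Moving sources add to this figure, since delays and gains must then be recalculated and interpolated continuously.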
A further known technique of spatial sound field reproduction is Ambisonic. This technique is based on a harmonic decomposition of the acoustic field along a spherical surface (3D) or along the circumference of a circle (2D). In the reproduction, a finite number of these harmonic components is used for reproducing the original sound field at a point, the listening point. As the number of harmonic components used (referred to as the order) increases, the spatial extent of the area of optimum reconstruction of the sound field increases. In the simplest useful case (1st order), the audio information is coded into four channels, which is also known as Ambisonic B format. One channel contains a mono signal of the audio information. The three other channels contain the spatial components of the three spatial dimensions. These three signals are based on a harmonic decomposition of the acoustic field along a spherical surface, and reflect the instantaneous pressure distribution of the audio waves. This case is also the commercially most useful one because the four signals originally had to fit on a phonograph record as a competitor to quadrophony. Currently, work is being done on preparing a specification which uses the medium of DVDs and accordingly allows more channels.
Ambisonic enables a spatial audio signal to be decomposed into the four channels described and to be recomposed accordingly. In this context, the signals relate to a reference point arranged in the middle of a sphere which has the corresponding loudspeakers located on its surface. The representation of spatial audio signals in accordance with the Ambisonic method therefore offers a less complex possibility of storing and reproducing spatial signals. The disadvantage of this technology, however, is that the spatial resolution and, therefore, the impression of stereophonic sound that may be achieved are limited.
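The decomposition into four channels and the subsequent recomposition may be sketched as follows. The sketch assumes the traditional first-order B format convention, in which the mono channel W is weighted by 1/sqrt(2) and the channels X, Y and Z carry the three directional components. The recomposition shown is a simple virtual cardioid pointing towards a loudspeaker direction, which is only one of several possible decoding approaches. The function names are hypothetical.

```python
import math


def encode_b_format(sample, azimuth, elevation):
    """Encode one mono sample arriving from (azimuth, elevation), in radians."""
    w = sample / math.sqrt(2.0)                            # mono (pressure) channel
    x = sample * math.cos(azimuth) * math.cos(elevation)   # front-back component
    y = sample * math.sin(azimuth) * math.cos(elevation)   # left-right component
    z = sample * math.sin(elevation)                       # up-down component
    return w, x, y, z


def decode_virtual_cardioid(b_format, azimuth, elevation):
    """Signal of a virtual cardioid microphone aimed at a loudspeaker direction."""
    w, x, y, z = b_format
    directional = (x * math.cos(azimuth) * math.cos(elevation)
                   + y * math.sin(azimuth) * math.cos(elevation)
                   + z * math.sin(elevation))
    return 0.5 * (math.sqrt(2.0) * w + directional)


channels = encode_b_format(1.0, azimuth=0.0, elevation=0.0)  # source straight ahead
front = decode_virtual_cardioid(channels, azimuth=0.0, elevation=0.0)
back = decode_virtual_cardioid(channels, azimuth=math.pi, elevation=0.0)
print(channels, front, back)
```

In this sketch a loudspeaker facing the source direction receives the full signal, while one in the opposite direction receives essentially none. The broad cardioid pattern in between illustrates the limited spatial resolution of the first order.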
As the Ambisonic order increases, results of a quality similar to that of wave field synthesis (WFS) may indeed be achieved. However, the complexity also increases considerably as a result, and no single microphone exists which exhibits the directional patterns of these higher harmonics. In this case, sophisticated microphone arrays have to be used.
WFS reconstructs the sound field within a volume (or within an area), and it does so with a quality which depends on the expenditure involved (e.g. the loudspeaker spacing).
Ambisonic indeed reconstructs the sound field precisely, but it does so starting from one point, and it covers an area comparable in size to that of WFS only at very high orders.
However, both methods have a common theoretical basis, which is holophony.
The signals refer to a reference point at which a listener is ideally located, which accordingly complicates coverage of a relatively large area, such as a cinema or a concert hall.
In addition, it is a precondition that both the reproduction loudspeakers in relation to the listening point, and the virtual sound objects in relation to the reproduction loudspeakers be located sufficiently far apart, so that planar wave fronts may be assumed in any case.
In addition, further methods of representing spatial sound sources are known in the art. For example, DTS (Digital Theater Systems) is a digital multi-channel surround sound format.
Methods such as DTS and Dolby Surround may also be regarded as encoding formats. In this manner, audio signals which are suited for 5.1 reproduction may be stored on a DVD, for example.
DTS is employed both in cinemas and on data media, for example DVDs. Reproduction ideally is effected via circularly arranged loudspeakers, in the center of which there is a reproduction space which is favorable for spatial sound reproduction and is also referred to as “sweet area”. Dolby Digital signals, which are available in various variants, represent a further group of spatial sound signals. Apart from wave field synthesis, many audio formats have the disadvantage that only very limited spatial resolution and, thus, a limited spatial sound effect may be achieved. Wave field synthesis itself does offer spatial resolution, but this resolution cannot be achieved where computing power is limited, specifically in the case of several moving virtual sound sources and where, for example in consumer applications, cost factors restrict the computing power available. In addition, Doppler artifacts result from the variable delay values of a moving audio source. The computing expenditure of wave field synthesis in turn depends on the number of virtual audio sources, the number of rendering channels, the source movements, the filtering methods, the delay interpolation methods, etc.
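The origin of the Doppler artifacts mentioned above may be illustrated as follows. If a signal x(t) is reproduced with a time-varying delay tau(t) = r(t)/c, the output is x(t - tau(t)), and a tone of frequency f is reproduced at f multiplied by (1 - d tau/dt). The following sketch estimates this factor from two successive source distances. It assumes a speed of sound of 343 m/s, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value


def doppler_factor(distance_before, distance_after, time_step):
    """Frequency scaling caused by a delay that changes between two instants.

    The delay is tau = distance / c, so its rate of change is
    (distance_after - distance_before) / (c * time_step).
    """
    delay_rate = (distance_after - distance_before) / (SPEED_OF_SOUND * time_step)
    return 1.0 - delay_rate


# A source receding from a loudspeaker at 34.3 m/s (10 % of the assumed speed of sound).
receding = doppler_factor(2.0, 2.0 + 34.3 * 0.01, 0.01)
# A stationary source produces no shift.
stationary = doppler_factor(2.0, 2.0, 0.01)
print(receding, stationary)
```

Because each loudspeaker sees a different rate of change of distance, each reproduces a slightly different shift, which is one reason why delay interpolation methods affect both the quality and the computing expenditure.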
As far as signal processing of Ambisonic Surround signals is concerned, Jerome Daniel, “Further Study of Sound Field Coding with Higher Order Ambisonics”, presented at the AES 116th Convention, Berlin 2004, provides a good overview. An assessment of the quality of sound field reproduction by Ambisonic may be found in Martin Dewhirst, Slawomir Zielinski, Philip Jackson, Francis Rumsey: “Objective Assessment of Spatial Localisation Attributes of Surround-Sound Reproduction Systems”, presented at the AES 118th Convention, Barcelona 2005. Alois Sontacchi, Robert Höldrich, “Further Investigations on 3D Sound Fields using distance coding”, Proceedings of the COST G-6 Conference on Digital Audio Effects, Limerick 2001, address the storage of spatial audio signals. WO 2005/015954 A2 and WO 02/08506 B deal with Ambisonic signals and describe spatial encoding with associated signal processing.