1. Field of the Invention
The present invention relates to the wave field synthesis technique, and particularly to tools for creating audio scene descriptions and/or for verifying audio scene descriptions.
2. Description of the Related Art
There is an increasing need for new technologies and innovative products in the area of entertainment electronics. An important prerequisite for the success of new multimedia systems is that they offer optimal functionalities or capabilities. This is achieved by the employment of digital technologies and, in particular, computer technology. Examples of this are applications offering an enhanced, close-to-reality audiovisual impression. In previous audio systems, a substantial disadvantage lies in the quality of the spatial sound reproduction of natural, but also of virtual environments.
Methods of multi-channel loudspeaker reproduction of audio signals have been known and standardized for many years. All usual techniques have the disadvantage that both the location of the loudspeakers and the position of the listener are already impressed on the transmission format. If the loudspeakers are arranged incorrectly with reference to the listener, the audio quality suffers significantly. Optimal sound is only possible in a small area of the reproduction space, the so-called sweet spot.
A more natural spatial impression as well as greater envelopment in the audio reproduction may be achieved with the aid of a new technology. The principles of this technology, the so-called wave field synthesis (WFS), were studied at the TU Delft and first presented in the late 80s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wave field Synthesis. JASA 93, 1993).
Due to this method's enormous demands on computer power and transfer rates, the wave field synthesis has up to now only rarely been employed in practice. Only the progress in the areas of microprocessor technology and audio encoding permits the employment of this technology in concrete applications today. First products in the professional area are expected next year. In a few years, first wave field synthesis applications for the consumer area are also expected to come on the market.
The basic idea of WFS is based on the application of Huygens' principle of the wave theory:
Each point reached by a wave is the starting point of an elementary wave propagating in a spherical or circular manner.
Applied to acoustics, an incoming wave front of arbitrary shape may be replicated by a large number of loudspeakers arranged next to each other (a so-called loudspeaker array). In the simplest case, with a single point source to be reproduced and a linear arrangement of the loudspeakers, the audio signal of each loudspeaker has to be fed with a time delay and an amplitude scaling so that the sound fields radiated by the individual loudspeakers superimpose correctly. With several sound sources, the contribution to each loudspeaker is calculated separately for each source, and the resulting signals are added. If the sources to be reproduced are in a room with reflecting walls, the reflections also have to be reproduced via the loudspeaker array as additional sources. Thus, the computational expenditure strongly depends on the number of sound sources, the reflection properties of the recording room, and the number of loudspeakers.
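The simplest case described above, a single point source behind a linear loudspeaker array, may be illustrated by the following minimal sketch. It is not the actual wave field synthesis driving function. It assumes a speed of sound of 343 m/s, a plain 1/distance amplitude law normalized to the nearest loudspeaker, and a hypothetical helper name `delays_and_gains`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delays_and_gains(source, speakers, c=SPEED_OF_SOUND):
    """Return one (delay_seconds, gain) pair per loudspeaker.

    Assumption: the gain falls off as 1/distance and is normalized so that
    the loudspeaker closest to the virtual source has gain 1.
    """
    distances = [math.dist(source, s) for s in speakers]
    if min(distances) == 0.0:
        raise ValueError("virtual source coincides with a loudspeaker")
    nearest = min(distances)
    return [(d / c, nearest / d) for d in distances]


# Linear array: 5 loudspeakers spaced 0.5 m apart along the x axis.
speakers = [(0.5 * i, 0.0) for i in range(5)]
# Virtual point source 2 m behind the middle of the array.
source = (1.0, -2.0)

for (x, _), (delay, gain) in zip(speakers, delays_and_gains(source, speakers)):
    print(f"x={x:.1f} m  delay={delay * 1000:.3f} ms  gain={gain:.3f}")
```

Loudspeakers farther from the virtual source receive a longer delay and a smaller gain, which is what shapes the curved wave front of the point source.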
In particular, the advantage of this technique is that a natural spatial sound impression across a great area of the reproduction space is possible. In contrast to the known techniques, direction and distance of sound sources are reproduced in a very exact manner. To a limited degree, virtual sound sources may even be positioned between the real loudspeaker array and the listener.
Although the wave field synthesis functions well for environments the properties of which are known, irregularities occur if these properties change or if the wave field synthesis is executed on the basis of an environment property that does not match the actual property of the environment.
A property of the surrounding may also be described by the impulse response of the surrounding.
This will be set forth in greater detail on the basis of the following example. It is assumed that a loudspeaker sends out a sound signal against a wall, the reflection of which is undesired. For this simple example, the space compensation using the wave field synthesis would consist in first determining the reflection of this wall in order to ascertain when a sound signal reflected from the wall arrives back at the loudspeaker, and which amplitude this reflected sound signal has. The wave field synthesis then offers the possibility of eliminating the reflection from this wall by impressing on the loudspeaker a signal with corresponding amplitude and of opposite phase to the reflection signal, so that the propagating compensation wave cancels out the reflection wave and the reflection from this wall is eliminated in the surrounding considered. This may be done by first calculating the impulse response of the surrounding and then determining the property and position of the wall on the basis of this impulse response, wherein the wall is interpreted as a mirror source, i.e. as a sound source reflecting incident sound.
If at first the impulse response of this surrounding is measured and then the compensation signal, which has to be impressed on the loudspeaker in a manner superimposed on the audio signal, is calculated, cancellation of the reflection from this wall will take place, such that a listener in this surrounding has the sound impression that this wall does not exist at all.
However, it is crucial for optimum compensation of the reflected wave that the impulse response of the room is determined accurately so that no over- or undercompensation occurs.
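The effect of an inaccurate estimate may be illustrated by a strongly simplified, one-dimensional sketch. It assumes that the wall reflection is a single delayed and scaled copy of the signal, and that the compensation signal is the same copy with opposite sign. The function names and the sample values are hypothetical and serve only as an illustration.

```python
def delayed(signal, delay, gain):
    """Return the signal delayed by `delay` samples and scaled by `gain`."""
    return [gain * signal[i - delay] if i >= delay else 0.0
            for i in range(len(signal))]


def residual_reflection(signal, delay, true_gain, estimated_gain):
    """Reflection left over after adding the opposite-phase compensation."""
    reflection = delayed(signal, delay, true_gain)
    compensation = delayed(signal, delay, -estimated_gain)  # opposite phase
    return [r + c for r, c in zip(reflection, compensation)]


pulse = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]

# Accurate estimate: the reflection is cancelled completely.
exact = residual_reflection(pulse, delay=2, true_gain=0.5, estimated_gain=0.5)
# Estimate too small: part of the reflection remains (undercompensation).
under = residual_reflection(pulse, delay=2, true_gain=0.5, estimated_gain=0.25)
# Estimate too large: an inverted residue is added (overcompensation).
over = residual_reflection(pulse, delay=2, true_gain=0.5, estimated_gain=0.75)

print(exact, under, over, sep="\n")
```

In this model the residue is proportional to the difference between the true and the estimated reflection amplitude, which is why the accuracy of the measured impulse response matters.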
Thus, the wave field synthesis allows for correct mapping of virtual sound sources across a large reproduction area. At the same time it offers, to the sound master and sound engineer, new technical and creative potential in the creation of even complex sound landscapes. The wave field synthesis (WFS, or also sound field synthesis), as developed at the TU Delft at the end of the 80s, represents a holographic approach to sound reproduction. The Kirchhoff-Helmholtz integral serves as a basis for this. It states that arbitrary sound fields within a closed volume can be generated by means of a distribution of monopole and dipole sound sources (loudspeaker arrays) on the surface of this volume.
In the wave field synthesis, a synthesis signal for each loudspeaker of the loudspeaker array is calculated from an audio signal sent out by a virtual source at a virtual position, wherein the synthesis signals are formed with respect to amplitude and phase such that the wave resulting from the superposition of the individual sound waves output by the loudspeakers of the loudspeaker array corresponds to the wave that would be produced by the virtual source at the virtual position if this virtual source were a real source with a real position.
Typically, several virtual sources are present at various virtual positions. The calculation of the synthesis signals is performed for each virtual source at each virtual position, so that typically one virtual source results in synthesis signals for several loudspeakers. Viewed from a loudspeaker, this loudspeaker thus receives several synthesis signals, which go back to various virtual sources. A superposition of these synthesis signals, which is possible due to the linear superposition principle, then results in the reproduction signal actually sent out from the loudspeaker.
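The superposition step may be sketched as follows. The nested-list layout `synthesis[source][speaker]` and the function name `mix_reproduction_signals` are assumptions made for this illustration only.

```python
def mix_reproduction_signals(synthesis):
    """Sum the synthesis signals of all virtual sources per loudspeaker.

    synthesis[source][speaker] is a list of samples. All signals are
    assumed to have the same length.
    """
    num_speakers = len(synthesis[0])
    num_samples = len(synthesis[0][0])
    output = [[0.0] * num_samples for _ in range(num_speakers)]
    for per_source in synthesis:
        for spk, signal in enumerate(per_source):
            for n, sample in enumerate(signal):
                output[spk][n] += sample  # linear superposition
    return output


# Two virtual sources, two loudspeakers, three samples each.
synthesis = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # source 0 -> speaker 0, speaker 1
    [[0.5, 0.5, 0.0], [0.0, 0.0, 0.25]],  # source 1 -> speaker 0, speaker 1
]
print(mix_reproduction_signals(synthesis))
```

The number of additions grows with the product of the number of virtual sources and the number of loudspeakers.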
The larger the loudspeaker arrays are, i.e. the more individual loudspeakers are provided, the better the possibilities of the wave field synthesis can be utilized. With this, however, the computation power the wave field synthesis unit must provide also increases, since channel information typically also has to be taken into account. In detail, this means that, in principle, a transmission channel of its own is present from each virtual source to each loudspeaker, and that, in principle, it may be the case that each virtual source leads to a synthesis signal for each loudspeaker, and/or that each loudspeaker obtains a number of synthesis signals equal to the number of virtual sources.
If the possibilities of the wave field synthesis are to be utilized, particularly in movie theater applications, such that the virtual sources can also be movable, it can be seen that rather significant computation power is needed for the calculation of the synthesis signals, the calculation of the channel information and the generation of the reproduction signals through combination of the channel information and the synthesis signals.
Furthermore, it is to be noted at this point that the quality of the audio reproduction increases with the number of loudspeakers made available. This means that the audio reproduction quality becomes the better and more realistic, the more loudspeakers are present in the loudspeaker array(s).
In the above scenario, the completely rendered and digital-analog-converted reproduction signal for the individual loudspeakers could, for example, be transmitted from the wave field synthesis central unit to the individual loudspeakers via two-wire lines. This would indeed have the advantage that it is almost ensured that all loudspeakers work synchronously, so that no further measures would be needed for synchronization purposes here. On the other hand, the wave field synthesis central unit could be produced only for a particular reproduction room or for reproduction with a fixed number of loudspeakers. This means that, for each reproduction room, a wave field synthesis central unit of its own would have to be fabricated, which has to provide a significant amount of computation power, since the computation of the audio reproduction signals must take place at least partially in parallel and in real time, particularly with respect to many loudspeakers and/or many virtual sources.
German patent DE 10254404 B4 discloses a system as illustrated in FIG. 7. One part is the central wave field synthesis module 10. The other part consists of individual loudspeaker modules 12a, 12b, 12c, 12d, 12e, which are connected to actual physical loudspeakers 14a, 14b, 14c, 14d, 14e, as shown in FIGS. 1A-1D. It is to be noted that, in typical applications, the number of the loudspeakers 14a-14e lies in the range above 50 and typically even significantly above 100. If a loudspeaker module of its own is associated with each loudspeaker, the corresponding number of loudspeaker modules is also needed. Depending on the application, however, it is advantageous to address a small group of adjoining loudspeakers from one loudspeaker module. In this connection, it is arbitrary whether a loudspeaker module connected to four loudspeakers, for example, feeds the four loudspeakers with the same reproduction signal, or whether correspondingly different synthesis signals are calculated for the four loudspeakers, so that such a loudspeaker module actually consists of several individual loudspeaker modules, which are, however, combined physically in one unit.
Between the wave field synthesis module 10 and every individual loudspeaker module 12a-12e, there is a transmission path 16a-16e of its own, with each transmission path being coupled to the central wave field synthesis module and to a loudspeaker module of its own.
A serial transmission format providing a high data rate, such as a so-called Firewire transmission format or a USB data format, is advantageous as data transmission mode for transmitting data from the wave field synthesis module to a loudspeaker module. Data transfer rates of more than 100 megabits per second are advantageous.
The data stream transmitted from the wave field synthesis module 10 to a loudspeaker module is thus formatted in the wave field synthesis module according to the data format chosen and is provided with synchronization information as provided in usual serial data formats. This synchronization information is extracted from the data stream by the individual loudspeaker modules and used to synchronize the individual loudspeaker modules with respect to their reproduction, i.e. ultimately with respect to the digital-analog conversion for obtaining the analog loudspeaker signal and the sampling (re-sampling) provided for this purpose. The central wave field synthesis module works as a master, and all loudspeaker modules work as clients, wherein the individual data streams all obtain the same synchronization information from the central module 10 via the various transmission paths 16a-16e. This ensures that all loudspeaker modules work synchronously, namely synchronized with the master 10. This is important for the audio reproduction system so as not to suffer loss of audio quality, i.e. so that the synthesis signals calculated by the wave field synthesis module are not radiated from the individual loudspeakers in a temporally offset manner after corresponding audio rendering.
The concept described indeed provides significant flexibility with respect to a wave field synthesis system that is scalable for various applications. It still suffers, however, from the problem that the central wave field synthesis module, which performs the actual main rendering, i.e. which calculates the individual synthesis signals for the loudspeakers depending on the positions of the virtual sources and on the loudspeaker positions, represents a “bottleneck” for the entire system. In this system, the “post-rendering”, i.e. the imposition of channel transmission functions etc. on the synthesis signals, is already performed in a decentralized manner, and the necessary data transmission capacity between the central renderer module and the individual loudspeaker modules has already been reduced by selection of synthesis signals with less energy than a determined threshold energy. Nevertheless, all virtual sources still have to be rendered for all loudspeaker modules, i.e. converted into synthesis signals, because the selection takes place only after rendering.
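The energy-based selection may be sketched as follows. The sketch reads the passage as meaning that synthesis signals whose energy lies below the threshold are not transmitted to the loudspeaker module; this reading, the dictionary layout and the names `energy` and `select_for_transmission` are assumptions. Note that every signal has to be rendered before its energy can be measured, which is exactly the limitation described above.

```python
def energy(signal):
    """Sum of squared samples of an already rendered synthesis signal."""
    return sum(s * s for s in signal)


def select_for_transmission(synthesis_signals, threshold):
    """Keep only the synthesis signals whose energy reaches the threshold.

    synthesis_signals maps a (hypothetical) source name to the rendered
    synthesis signal for one loudspeaker module.
    """
    return {name: sig for name, sig in synthesis_signals.items()
            if energy(sig) >= threshold}


rendered = {
    "rain": [0.5, 0.5],        # energy 0.5
    "whisper": [0.125, 0.0],   # energy 0.015625
}
print(select_for_transmission(rendered, threshold=0.1))
```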
This means that the rendering still determines the overall capacity of the system. If the central rendering unit is capable of rendering 32 virtual sources at the same time, for example, i.e. of calculating the synthesis signals for these 32 virtual sources at the same time, serious capacity bottlenecks occur if more than 32 sources are active at one time in one audio scene. For simple scenes, this capacity is sufficient. For more complex scenes, particularly with immersive sound impressions, for example when it is raining and many rain drops represent individual sources, it is immediately apparent that a capacity of at most 32 sources will no longer suffice. A corresponding situation also exists if there is a large orchestra and it is desired to actually process every orchestral player or at least each instrument group as a source of its own at its own position. Here, 32 virtual sources may very quickly become too few.
Typically, in a known wave field synthesis concept, one uses a scene description in which the individual audio objects are defined together such that, using the data in the scene description and the audio data for the individual virtual sources, the complete scene can be rendered by a renderer or a multi-rendering arrangement. Here, it is exactly defined for each audio object when the audio object has to begin and when the audio object has to end. Furthermore, for each audio object, the position at which the virtual source is to be located, i.e. the position that is to be entered into the wave field synthesis rendering means, is indicated exactly, so that the corresponding synthesis signals are generated for each loudspeaker. As a result, by superposition of the sound waves output from the individual loudspeakers in response to the synthesis signals, the listener gets the impression that a sound source is positioned at a position inside or outside the reproduction room, which is defined by the source position of the virtual source.
It is disadvantageous in the concept described that it is relatively rigid, particularly in the creation of the audio scene descriptions. Thus, a sound master will create an audio scene exactly for a certain wave field synthesis equipment, for which he or she exactly knows the situation in the reproduction room, and will create the audio scene description so that it runs smoothly on the defined wave field synthesis system known to the producer.
In this connection, the sound master will already take maximum capacities of the wave field synthesis rendering means as well as requirements for the wave field in the reproduction room into account in the creation of the audio scene description. For example, if a renderer has a maximum capacity of 32 audio sources to be processed, the sound master will already take care to edit the audio scene description so that there are never more than 32 sources to be processed at the same time.
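The capacity condition that the sound master has to respect may be checked mechanically, as in the following minimal sketch. It assumes that every audio object is given as a pair of start and end time in seconds and that an object ending at the instant at which another begins does not overlap with it. The function names are hypothetical.

```python
def max_concurrent(objects):
    """Largest number of audio objects active at the same time.

    objects: iterable of (start, end) pairs in seconds, end exclusive.
    """
    events = []
    for start, end in objects:
        events.append((start, 1))    # object becomes active
        events.append((end, -1))     # object becomes inactive
    # At equal times, process ends (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def fits_renderer(objects, capacity=32):
    """True if the scene never needs more sources than the renderer offers."""
    return max_concurrent(objects) <= capacity


scene = [(0.0, 10.0), (5.0, 15.0), (10.0, 20.0)]
print(max_concurrent(scene), fits_renderer(scene))
```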
Moreover, the sound master will already bear in mind that, in the positioning of e.g. two instruments such as bass guitar and lead guitar, the sound propagation times have to be taken into account for the entire reproduction room, the dimensions of which are known to the producer. Thus, for a clear and non-blurred sound image, it is important that e.g. bass guitar and lead guitar be perceived in a relatively uniform manner by the listener. A sound master will then take care, in the virtual positioning, i.e. in the association of the virtual positions with these two sources, that the wave fronts from these two instruments arrive at a listener at almost the same time in the entire reproduction room.
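How far the wave fronts of two virtual sources drift apart across the reproduction room may be estimated with the following sketch. It assumes free-field propagation at 343 m/s, two sources that emit at the same instant, and a hypothetical function name `arrival_spread`. The positions are example values, not taken from any actual scene.

```python
import math


def arrival_spread(source_a, source_b, listeners, c=343.0):
    """Largest arrival-time difference in seconds over all listener positions."""
    return max(
        abs(math.dist(source_a, p) - math.dist(source_b, p)) / c
        for p in listeners
    )


bass_guitar = (-1.0, 0.0)
lead_guitar = (1.0, 0.0)

# A listener on the symmetry axis hears both wave fronts at the same time.
print(arrival_spread(bass_guitar, lead_guitar, [(0.0, 3.0)]))
# A listener far to one side experiences the largest offset.
print(arrival_spread(bass_guitar, lead_guitar, [(0.0, 3.0), (5.0, 0.0)]))
```

Evaluating the spread over a grid of listener positions that covers the whole room shows where in the room the two instruments can no longer be perceived as synchronous.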
An audio scene description thus will contain a series of audio objects, with each audio object including a virtual position and a start time instant, an end time instant or a duration.
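One possible in-memory representation of such an audio object is sketched below. It reads the passage as requiring a start time instant plus either an end time instant or a duration; this reading, the two-dimensional position and all field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AudioObject:
    name: str
    position: Tuple[float, float]      # virtual position in meters
    start: float                       # start time instant in seconds
    end: Optional[float] = None        # end time instant in seconds
    duration: Optional[float] = None   # alternative to `end`

    def __post_init__(self):
        # Exactly one of `end` and `duration` must be supplied.
        if (self.end is None) == (self.duration is None):
            raise ValueError("give exactly one of end or duration")
        if self.end is None:
            self.end = self.start + self.duration
        else:
            self.duration = self.end - self.start
        if self.duration < 0:
            raise ValueError("audio object ends before it starts")


# An audio scene description is then a series of such objects.
scene_description = [
    AudioObject("bass guitar", (-1.0, 2.0), start=1.0, duration=2.5),
    AudioObject("lead guitar", (1.0, 2.0), start=1.0, end=4.0),
]
print(scene_description)
```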
Normally, by manual checks, i.e. by test listening at various positions in the reproduction room, it is actually checked if the audio scene description may stay like that, i.e. if the producer of the audio scene description has actually done a good job and has met all requirements of the wave field synthesis system.
It is disadvantageous in this concept that the sound master creating the audio scene description has to concentrate on boundary conditions of the wave field synthesis system, which actually do not concern the creative side of the audio scene. Thus, it would be desirable if the sound master could concentrate on the creative aspects alone, without having to take a certain wave field synthesis system on which an audio scene has to run into account.
A further disadvantage of the described concept arises when an audio scene description that has been designed for a wave field synthesis system with a certain first behavior is supposed to run on another wave field synthesis system with a second behavior, for which the audio scene description has not been designed.
If one simply had the audio scene description run on the system for which it has not been designed, audible errors would be introduced if the second system is less powerful than the first system.
If the second system is more powerful than the first system, however, the audio scene description will only make demands on the second system within the scope of the performance of the first system and will not exhaust the additional performance of the second system.
If the second system further relates to e.g. a larger reproduction room, it can no longer be ensured, at certain places, that the wave fronts of two virtual sources, such as bass guitar and lead guitar, arrive at almost the same time.
The concurrent or almost concurrent perception of two virtual sources that should be synchronous is particularly problematic, especially since only manual test listening and a subjective assessment of the quality at certain places in the reproduction room have previously been possible for this purpose.
In response to such subjective assessments, the sound master then had to completely revise, for the second system, the audio scene description that was actually already finished, which in turn costs both time and money.
Particularly since a strong spread of wave field synthesis systems is expected in the near future, the question of flexible audio scene descriptions that can be played universally on arbitrary systems will come up more and more, in order to eventually achieve a portability or compatibility similar to that known for CDs or DVDs.