Typical sound reinforcement systems for a relatively large environment, such as a conference room on the one hand, or a concert stage in a hall or even in the open air on the other hand, all share the problem that a true-to-location reproduction of the sound sources is ruled out from the start because of the small number of speaker channels commonly used. But even if a left channel and a right channel are used in addition to the mono channel, the problem concerning the level still remains. For example, the back seats, i.e. the seats far remote from the stage, obviously must be supplied with sound just as well as the seats close to the stage. If, for example, speakers are arranged only at the front of the auditorium or at its sides, an inherent problem is that persons sitting close to a speaker will perceive it as excessively loud, because the level must be high enough for the persons at the very back to still be able to hear. In other words, because the individual supply speakers are perceived as point sources in such a sound reinforcement scenario, some persons will complain that the sound is too loud, whereas others will say that it is not loud enough. The persons for whom it is usually too loud are those sitting very close to the point-source-like speakers, whereas those for whom it is not loud enough are seated far away from the speakers.
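The front/back level conflict can be quantified with the inverse distance law. The sketch below assumes an ideal point source in free field (no room reflections, no air absorption); the seat distances are hypothetical and the function name is illustrative only.

```python
import math


def level_difference_db(r_near: float, r_far: float) -> float:
    """Level difference in dB between listeners at r_near and r_far metres from
    an ideal point source in free field (inverse distance law, ~6 dB per doubling)."""
    if r_near <= 0 or r_far <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r_far / r_near)


# Hypothetical seating: front row 2 m, back row 32 m from the same speaker.
print(f"{level_difference_db(2.0, 32.0):.1f} dB")  # prints "24.1 dB"
```

A level that is just adequate in the back row is therefore roughly 24 dB higher in the front row in this idealized case, which is the conflict described above.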
To avoid this problem at least to some extent, attempts have been made to mount the speakers higher up, i.e. above the persons sitting close to them, so that these persons are not exposed to the full sound: a considerable part of the sound propagates above the heads of the audience at the front and is therefore not perceived by them, while still providing a sufficient level for the audience further back. This problem is also addressed by line array technology.
Another possibility consists in operating at a low level so as not to put too much strain on the persons in the front rows, i.e. close to the speakers; the obvious drawback is the risk that the sound will again not be loud enough further back in the room.
With regard to directional perception, the whole issue is even more problematic. For example, a single mono speaker, for example in a conference room, does not enable directional perception, except when the location of the speaker happens to coincide with the direction of the source. This is inherently due to there being only one speaker channel. Even with two stereo channels, one can at most fade over, or cross-fade, between the left and right channels, i.e. perform panning. This may be adequate if there is only one source. If there are several sources, however, localization with two stereo channels is only roughly possible, and only within a small area of the auditorium: even though stereo does provide directional perception, it does so only in the sweet spot. With several sources, the directional impression becomes increasingly blurred as the number of sources increases.
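Panning between two stereo channels, as mentioned above, is often implemented with a constant-power (sine/cosine) law. The sketch below assumes that law and a pan position normalized to [-1, 1]; the text itself does not prescribe a particular pan law, and the function name is hypothetical.

```python
import math


def constant_power_pan(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1, 1]
    (-1 = full left, 0 = centre, +1 = full right). The sine/cosine law keeps
    left**2 + right**2 == 1, so total power stays constant while panning."""
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


left, right = constant_power_pan(0.0)  # centre: both gains about 0.707
```

Note that the gains only define a phantom-source direction for a listener in the sweet spot; off-centre listeners hear the nearer speaker earlier and louder, which is why the localization degrades elsewhere.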
In other scenarios, in medium-sized to large auditoriums supplied with stereo or mono, the speakers are located above the audience and therefore cannot reproduce any directional information about the source anyway.
Even though the sound source, for example a person speaking or a theatre actor, is on stage, he or she is heard as coming from the speakers, which are arranged laterally or centrally. In this context, natural directional perception has been dispensed with; one is already satisfied when the sound is sufficiently loud for the audience at the back and not unbearably loud for the audience at the front.
In specific scenarios, so-called “support speakers” positioned in the vicinity of a sound source are also employed. In this manner, one tries to restore the natural localization ability of the hearing sense. The support speakers are normally triggered without delay, whereas the stereo reproduction via the supply speakers is delayed, so that the support speaker is perceived first and localization is made possible in accordance with the law of the first wave front. However, support speakers, too, are perceived as point sources. On the one hand, this leads to a deviation from the actual position of the sound emitter, and there is also the risk that the sound will again be too loud for the audience at the front and too quiet for the audience at the back.
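The delay applied to the supply speakers can be estimated from path lengths: at each listener, the support speaker's wave front must arrive before the supply speaker's. The sketch below assumes 2-D positions in metres, a speed of sound of 343 m/s and a hypothetical 5 ms precedence margin; none of these values come from the text, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def supply_delay_s(listener, support_pos, supply_pos, margin_s=0.005):
    """Delay in seconds to add to a supply speaker so that, at `listener`, the
    support speaker's wave front arrives first by at least `margin_s`.
    Positions are (x, y) tuples in metres; the 5 ms margin is hypothetical."""
    t_support = math.dist(listener, support_pos) / SPEED_OF_SOUND
    t_supply = math.dist(listener, supply_pos) / SPEED_OF_SOUND
    return max(0.0, t_support - t_supply + margin_s)


def required_delay_s(listeners, support_pos, supply_pos, margin_s=0.005):
    """Worst case over a list of listener positions, so that the support
    speaker is heard first at every position in the list."""
    return max(supply_delay_s(p, support_pos, supply_pos, margin_s) for p in listeners)
```

Taking the maximum over representative seats reflects the fact that a single delay setting has to serve the whole audience.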
On the other hand, support speakers enable real directional perception only if the sound source, for example a person speaking, is located in the immediate vicinity of the support speaker. This would work if a support speaker were built into the lectern, if the person speaking stood at the lectern, and if it were out of the question in this reproduction space that anybody ever stood next to the lectern while performing for the audience.
If there is a positional deviation between the support speaker and the sound source, the listener's directional perception will be misaligned by an angle, which adds to the unease felt by members of the audience who are not used to support speakers but are used to stereo reproduction. It has been found that, in particular when working with the law of the first wave front and a support speaker, it is better to deactivate the support speaker when the real sound source, i.e. the person speaking, has moved too far away from it. This issue is related to the fact that the support speaker cannot be moved: in order not to create the above-mentioned unease among the audience, the support speaker is fully deactivated if the person speaking has moved too far away from it.
As already explained, the support speakers employed are usually conventional speakers which, just like the supply speakers, exhibit the acoustic properties of a point source; this results in an excessive level in the immediate vicinity of the speakers, which is often perceived as unpleasant.
Generally, the goal is thus to provide auditory perception of source positions in sound reinforcement scenarios such as those in theatre and acting, the intention being to supplement conventional sound reinforcement systems, which are merely designed to supply the entire auditorium with adequate loudness, with directional speaker systems and their control.
Typically, medium-sized to large auditoriums are supplied with stereo or mono and, in some cases, with 5.1 surround technology. Typically, the speakers are located next to or above the members of the audience and are able to reproduce correct directional information of the sources for a small part of the audience only. Most members of the audience will get a wrong directional impression.
In addition, there are delta stereophony systems (DSS), which generate a directional reference in accordance with the law of the first wave front. DD 242954 A3 discloses a large-capacity sound reinforcement system for relatively large rooms and areas where the action or performance room and the reception or audience room are directly adjacent or are one and the same. Sound reinforcement is conducted in accordance with propagation-time principles. In particular, misalignments and jump effects occurring with movements, which are disturbing especially for important solo sound sources, are avoided by realizing a propagation-time staggering without limited source areas and by taking the sound power of the sources into account. A control device connected to the delay or amplification means controls them by analogy with the sound paths between the source and the acoustic-radiator locations. To this end, the position of a source is measured and used for adjusting the amplification and delay of the speakers accordingly. A reproduction scenario includes several delimited speaker groups, each of which is triggered separately.
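Controlling delay and amplification "by analogy with the sound paths" can be illustrated by deriving both values from the measured source-to-speaker distance. This is a minimal sketch, not the method of DD 242954 A3: the 1/d gain law, the clamp at unity gain, the reference distance and the speed of sound are all assumptions, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def path_based_settings(source, speakers, ref_distance=1.0):
    """For each speaker position, return (delay_s, gain) derived from the
    source-to-speaker path length d: delay = d / c, gain = ref_distance / d,
    clamped to 1.0. The 1/d gain law is an illustrative assumption."""
    settings = []
    for spk in speakers:
        d = max(math.dist(source, spk), 1e-6)  # avoid division by zero
        settings.append((d / SPEED_OF_SOUND, min(1.0, ref_distance / d)))
    return settings
```

Re-evaluating this function whenever a new source position is measured yields updated delay and gain values for every speaker group.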
In delta stereophony, one or several directional speakers are located in the vicinity of the real sound source (e.g. on a stage), and these directional speakers provide a localization reference in large parts of the audience area. An approximately natural directional perception is possible. The other speakers are triggered after the directional speaker so as to preserve the positional reference. In this way, the directional speaker is perceived first and localization becomes possible; this relationship is also referred to as the “law of the first wave front”.
The support speakers are perceived as point sources. What results is a deviation from the actual position of the sound emitter, i.e. of the original source, if, e.g., a soloist is positioned at a distance from the support speaker rather than being directly in front of or next to the support speaker.
Therefore, if a sound source moves between two support speakers, one must fade over between these differently arranged support speakers, in terms of both level and time. By contrast, wave-field synthesis systems can achieve a real directional reference via virtual sound sources.
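A fade-over in both level and time can be sketched as a function of fade progress. The sine/cosine level law, the linear interpolation of a single reference delay (e.g. the delay applied to the associated supply speakers) and the function name are illustrative assumptions, not a delta stereophony specification.

```python
import math


def fade_over(progress, delay_a_ms, delay_b_ms):
    """Gains for support speakers A and B and an interpolated reference delay
    during a fade-over from A to B, with `progress` in [0, 1].
    Level: constant-power (sine/cosine) law; time: linear interpolation
    between the delays valid for positions A and B. Both laws are assumptions."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    theta = progress * math.pi / 2.0
    gain_a, gain_b = math.cos(theta), math.sin(theta)
    delay_ms = (1.0 - progress) * delay_a_ms + progress * delay_b_ms
    return gain_a, gain_b, delay_ms
```

Because level and delay are interpolated separately, any mismatch between the two laws shows up as the phase and level errors during fade-over that are discussed later.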
In order to further understanding of the present invention, wave-field synthesis technology shall be explained below in more detail.
An improved natural spatial impression as well as enhanced envelopment in audio reproduction may be achieved using a new technology. The basics of this technology, the so-called wave-field synthesis (WFS), were researched at Delft University of Technology and first introduced in the late eighties (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wave-field Synthesis. JASA 93, 1993).
Due to the enormous demands this method places on computing power and transfer rates, wave-field synthesis has rarely been applied in practice so far. Only the progress made in the fields of microprocessor technology and audio coding now allows this technology to be employed in specific applications. The first products in the professional field are expected to be introduced this year, and the first wave-field synthesis applications for the consumer domain are expected to enter the market in a few years' time.
The fundamental idea of WFS is based on the application of Huygens' principle of wave theory:
Each point reached by a wave is the starting point of an elementary wave, which propagates spherically (or, in two dimensions, circularly).
In terms of acoustics, any shape of an incoming wave front may be replicated by a large number of speakers arranged next to one another (a so-called speaker array). In the simplest case of a single point source to be reproduced and a linear speaker array, each speaker must be fed with the audio signal at a specific time delay and amplitude scaling such that the sound fields emitted by the individual speakers superimpose correctly. In the case of several sound sources, the contribution of each source to each speaker is calculated separately, and the resulting signals are added. If the sources to be reproduced are located in a room with reflecting walls, the reflections must also be reproduced via the speaker array as additional sources. The computational effort therefore depends heavily on the number of sound sources, the reflection properties of the recording room, and the number of speakers.
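For the simplest case above (one point source, linear array), the per-speaker delay and amplitude can be sketched from the source-to-speaker distance. This is a simplified sketch under stated assumptions: speakers on the x-axis at a hypothetical spacing, the virtual source behind the array (y < 0), c = 343 m/s, and an amplitude falling with 1/sqrt(r), which roughly follows the 2.5D WFS driving function while omitting its angle factor and frequency prefilter.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def linear_array_driving(source, n_speakers, spacing):
    """Simplified (delay_s, amplitude) per speaker of a linear array on the
    x-axis (speaker i at x = i * spacing, y = 0) for a virtual point source
    at `source` = (x, y) behind the array (y < 0).
    Delay = r / c; amplitude = 1 / sqrt(r) (simplification, see lead-in)."""
    if source[1] >= 0.0:
        raise ValueError("virtual source must lie behind the array (y < 0)")
    out = []
    for i in range(n_speakers):
        r = math.dist(source, (i * spacing, 0.0))
        out.append((r / SPEED_OF_SOUND, 1.0 / math.sqrt(r)))
    return out
```

The speaker closest to the virtual source fires first and loudest; the growing delays along the array reproduce the curvature of the spherical wave front.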
The advantage of this technology is, in particular, that a natural spatial sound impression is possible across a large area of the reproduction room. Unlike the known technologies, the direction and distance of sound sources are reproduced in a highly precise manner. To a limited extent, virtual sound sources may even be positioned between the real speaker array and the listener.
Even though wave-field synthesis works well for environments the conditions of which are known, there will still be irregularities if the condition changes or if wave-field synthesis is performed on the basis of an environmental condition which does not match the actual condition of the environment.
An environmental condition may be described by the impulse response of the environment.
This will be set forth in more detail using the following example. Assume that a speaker emits a sound signal toward a wall whose reflection is undesired. For this simple example, spatial compensation using wave-field synthesis would consist in first determining the reflection of this wall, i.e. the time at which a sound signal reflected by the wall arrives back at the speaker and the amplitude of the reflected signal. If the reflection from this wall is undesired, wave-field synthesis offers the possibility of eliminating it: a signal in phase opposition to the reflection signal and with a corresponding amplitude is impressed on the speaker in addition to the original audio signal, so that the forward compensation wave extinguishes the reflection wave and the reflection from this wall is eliminated in the environment under consideration. This may be done by first determining the impulse response of the environment and then deriving the nature and position of the wall from it, the wall being interpreted as an image source, i.e. as a virtual sound source that emits the reflected sound.
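The single-wall example can be sketched in discrete time: locate the dominant reflection in a sampled impulse response, then build a phase-inverted, scaled and delayed copy of the audio signal. This is a first-order idealization under stated assumptions: one dominant reflection, lags in samples relative to the direct sound, and no account of the compensation wave being reflected itself. Both helper names are hypothetical.

```python
def find_reflection(impulse_response, direct_index=0):
    """Return (lag_samples, amplitude) of the strongest peak after the direct
    sound in a sampled impulse response; a single dominant wall reflection is
    assumed. The lag is counted from `direct_index`."""
    tail = impulse_response[direct_index + 1:]
    k = max(range(len(tail)), key=lambda i: abs(tail[i]))
    return k + 1, tail[k]


def compensation_signal(audio, lag, amplitude):
    """Phase-inverted, scaled and delayed copy of `audio`, intended to meet the
    wall reflection and cancel it (first-order idealization)."""
    return [0.0] * lag + [-amplitude * s for s in audio]


ir = [1.0, 0.0, 0.0, 0.4, 0.0, 0.05]  # hypothetical measured impulse response
lag, amp = find_reflection(ir)         # (3, 0.4)
```

If the estimated lag or amplitude is wrong, the compensation signal no longer cancels the reflection and instead adds a second, inverted one, which is the over- or undercompensation risk noted below.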
If the impulse response of this environment is measured first, and the compensation signal that must be impressed on the speaker, superimposed on the audio signal, is then calculated, the reflection from this wall is cancelled, so that a listener in this environment has the acoustic impression that this wall does not exist at all.
However, what is decisive for optimum compensation of the reflected wave is that the impulse response of the room is determined accurately, so that neither over- nor undercompensation occurs.
Wave-field synthesis thus enables correct imaging of virtual sound sources across a large reproduction area. At the same time, it offers the sound mixer and the sound engineer new technical and creative potential for creating even complex sound scenarios. Wave-field synthesis (WFS, or sound-field synthesis), as developed at Delft University of Technology at the end of the eighties, represents a holographic approach to sound reproduction. It is based on the Kirchhoff-Helmholtz integral, which states that any sound field within a closed, source-free volume can be generated by distributing monopole and dipole sound sources (speaker arrays) on the surface of this volume. For details, see M. M. Boone, E. N. G. Verheijen, P. F. v. Tol, “Spatial Sound-Field Reproduction by Wave-Field Synthesis”, Delft University of Technology Laboratory of Seismics and Acoustics, J. Audio Eng. Soc., vol. 43, no. 12, December 1995, and Diemer de Vries, “Sound Reinforcement by Wave-Field Synthesis: Adaption of the Synthesis Operator to the Loudspeaker Directivity Characteristics”, Delft University of Technology Laboratory of Seismics and Acoustics, J. Audio Eng. Soc., vol. 44, no. 12, December 1996.
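For reference, one common form of the Kirchhoff-Helmholtz integral is given below. The notation is an assumption, not taken from the text: P is the sound pressure, r a point inside the source-free volume V bounded by the surface S, r_S a point on S, n the outward normal, k the wave number, and a time dependence of e^{jωt}. The first term corresponds to a monopole layer driven by the normal pressure gradient (i.e. the normal particle velocity), the second to a dipole layer driven by the pressure.

```latex
P(\mathbf{r}) = \oint_{S} \left[ G(\mathbf{r} \mid \mathbf{r}_S)\,\frac{\partial P(\mathbf{r}_S)}{\partial n}
  - P(\mathbf{r}_S)\,\frac{\partial G(\mathbf{r} \mid \mathbf{r}_S)}{\partial n} \right] \mathrm{d}S,
\qquad
G(\mathbf{r} \mid \mathbf{r}_S) = \frac{e^{-jk\lvert \mathbf{r}-\mathbf{r}_S \rvert}}{4\pi \lvert \mathbf{r}-\mathbf{r}_S \rvert}
```

Sign conventions differ between authors (inward versus outward normal, e^{jωt} versus e^{-iωt}), so the signs should be checked against whichever reference is used.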
In wave-field synthesis, a synthesis signal is calculated for each speaker of the speaker array from the audio signal emitted by a virtual source at a virtual position. The synthesis signals are configured, with regard to amplitude and phase, such that the wave resulting from the superposition of the individual sound waves emitted by the speakers of the array corresponds to the wave that the virtual source would cause if it were a real source at that position.
Typically, several virtual sources exist at different virtual positions. The synthesis signals are calculated for each virtual source at its virtual position, so that one virtual source typically results in synthesis signals for several speakers. Seen from the speaker's side, each speaker thus receives several synthesis signals originating from different virtual sources. Superimposing these signals, which is possible owing to the linear superposition principle, then yields the reproduction signal actually emitted by the speaker.
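The per-speaker superposition can be sketched as a delay-and-sum over all virtual sources. The data layout (a list of sample lists, integer delays in samples, one gain per speaker) and the function name are assumptions; fractional delays would additionally require interpolation.

```python
def render_speaker_signals(sources):
    """Superimpose the synthesis signals of all virtual sources per speaker.

    `sources` is a list of (samples, params) pairs, where `params[j]` is the
    (delay_in_samples, gain) pair of that source for speaker j; every source
    must provide params for the same number of speakers."""
    n_speakers = len(sources[0][1])
    length = max(len(sig) + max(d for d, _ in params) for sig, params in sources)
    out = [[0.0] * length for _ in range(n_speakers)]
    for sig, params in sources:
        for j, (delay, gain) in enumerate(params):  # one synthesis signal per speaker
            for n, s in enumerate(sig):
                out[j][n + delay] += gain * s  # linear superposition
    return out


# Two hypothetical virtual sources feeding two speakers.
signals = render_speaker_signals([
    ([1.0, 0.0], [(0, 1.0), (1, 0.5)]),
    ([0.0, 2.0], [(1, 0.5), (0, 1.0)]),
])  # [[1.0, 0.0, 1.0], [0.0, 2.5, 0.0]]
```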
The possibilities of wave-field synthesis can be exploited the better, the more closed the speaker arrays are, i.e. the more closely the individual speakers can be positioned to one another. As a consequence, however, the computing performance that a wave-field synthesis unit must achieve also increases, since channel information typically has to be taken into account as well. In particular, this means that in principle there is a dedicated transfer channel from each virtual source to each speaker, and that each virtual source may result in a synthesis signal for each speaker, i.e. each speaker may receive as many synthesis signals as there are virtual sources.
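The growth in computing load can be made concrete with a rough count. The sketch assumes one multiply-add per sample for every source-speaker pair and ignores filtering and delay interpolation; the source count, speaker count and sample rate in the example are hypothetical.

```python
def synthesis_load(n_sources, n_speakers, sample_rate_hz):
    """Return (transfer_channels, multiply_adds_per_second) for a renderer in
    which every virtual source feeds every speaker with one delayed, scaled
    copy, i.e. one multiply-add per sample per source-speaker pair."""
    channels = n_sources * n_speakers
    return channels, channels * sample_rate_hz


print(synthesis_load(32, 256, 48_000))  # (8192, 393216000)
```

Even this lower bound grows with the product of sources and speakers, so densifying the array raises the load in direct proportion.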
It should also be noted at this point that the quality of the audio reproduction, i.e. how good and how realistic it sounds, increases with the number of speakers available in the speaker array(s).
In the above scenario, the completely rendered and digital-to-analog converted reproduction signals for the individual speakers could be transferred, for example via two-wire lines, from the wave-field synthesis central unit to the individual speakers. This would have the advantage of almost guaranteeing that all speakers work synchronously, so that no further synchronization measures would be necessary. On the other hand, the wave-field synthesis central unit could then only be built for one specific reproduction room, or for reproduction with a specific number of speakers. This means that a dedicated wave-field synthesis central unit would have to be built for each reproduction room, and that unit would need considerable computing performance, since the audio reproduction signals must be calculated at least partly in parallel and in real time, particularly for a large number of speakers or virtual sources.
Delta stereophony is problematic in particular because positional artefacts occur due to phase and level errors during fade-over between different sound sources. In addition, phase errors and mislocalization occur when sources move at different rates. Moreover, fading over from one support speaker to another requires a very large programming effort, and it becomes difficult to keep an overview of the entire audio scene, in particular when several sources are faded in and out by different support speakers and when there is a large number of support speakers that may be triggered differently.
In addition, wave-field synthesis on the one hand and delta stereophony on the other hand are actually opposite methods, although each may have advantages in different applications.
For example, delta stereophony is considerably less expensive than wave-field synthesis in terms of calculating the speaker signals. On the other hand, wave-field synthesis can operate without such artefacts. However, because of its space requirement and the need for an array of closely spaced speakers, wave-field synthesis cannot be employed everywhere. In stage technology in particular, it is very problematic to position a speaker band or a speaker array on stage, since such arrays are difficult to hide; they will therefore be visible and negatively affect the visual impression of the stage. This is problematic in particular when, as is usually the case in theatre and musical performances, the visual impression of the stage has priority over all other issues, in particular over the sound or sound production. On the other hand, wave-field synthesis does not predefine a fixed grid of support speakers but allows continuous movement of a virtual source. A support speaker, by contrast, cannot move; its movement can only be created virtually by directional fade-over.
The limitations of delta stereophony thus consist in particular in that the number of support speakers that can be accommodated on a stage is limited, both for reasons of expenditure (depending on the stage setting) and for reasons of sound management. In addition, each support speaker, if it is to work in accordance with the law of the first wave front, requires further speakers that create the necessary loudness. This is precisely the advantage of delta stereophony, namely that a relatively small speaker, which is consequently easy to accommodate, is sufficient for generating localization, whereas a large number of further speakers located in its vicinity create the necessary loudness for members of the audience who, in a relatively large auditorium, may be seated quite far back.
Therefore, all speakers on the stage may be assigned to different directional zones. Each directional zone has a localization speaker (or a small group of localization speakers triggered at the same time) that is triggered with no or only a small delay, while the other speakers of the directional zone are triggered with the same signal but with a time delay so as to generate the necessary loudness, the localization itself being provided by the localization speaker.
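A directional zone as described above can be modelled as a small data structure that maps each of its speakers to a delay. The class, its field names, the speaker identifiers and the 10 ms default loudness delay are hypothetical; in practice the delay would be tuned to the venue geometry.

```python
from dataclasses import dataclass, field


@dataclass
class DirectionalZone:
    """Hypothetical model of one delta-stereophony directional zone: a
    localization speaker fed (almost) without delay, plus loudness speakers
    fed the same signal with an additional delay."""
    name: str
    localization_speaker: str
    loudness_speakers: list[str] = field(default_factory=list)
    localization_delay_ms: float = 0.0
    loudness_delay_ms: float = 10.0  # assumed value, tuned per venue in practice

    def delays_ms(self) -> dict[str, float]:
        """Map every speaker of the zone to its delay in milliseconds."""
        delays = {self.localization_speaker: self.localization_delay_ms}
        for spk in self.loudness_speakers:
            delays[spk] = self.localization_delay_ms + self.loudness_delay_ms
        return delays


left = DirectionalZone("stage left", "L0", ["L1", "L2"])
```

Every zone needs its own loudness speakers in this model, which makes the limit on the number of zones discussed next directly visible.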
Since sufficient loudness is needed, the number of speakers in a directional zone cannot be reduced arbitrarily. On the other hand, one would like to have a very large number of directional zones in order to approach a continuous sound supply. Because each directional zone requires, in addition to the localization speaker, a sufficient number of speakers to generate sufficient loudness, the number of directional zones is limited when a stage area is divided into mutually adjacent, non-overlapping directional zones, each having a localization speaker or a small group of closely spaced adjacent localization speakers associated with it.
Typical delta stereophony concepts are based on fading over between two locations when a source is to move from one location to another. This concept is problematic when, for example, a manual intervention is to be performed in a programmed setup, or when an error is to be corrected. For example, if a singer does not stick to the agreed route across the stage but moves differently, there will be an increasing deviation between the perceived position and the actual position of the singer, which is evidently undesirable.
If a possibility of corrective intervention is desired for such a case, a user could input, for correction purposes, that the audio position is to correspond to the actual position of the singer on stage, either at a specific point in time or immediately. However, this would result in a hard source jump, which might lead to even larger artefacts than the mismatch between the actual and the perceived position of the audio source.
In order to avoid such a jump, one might complete the fade-over process already started and only then correct the target of the next fade-over process, starting from a position within a directional zone, i.e. after a complete fade-over process. This would ensure that no hard jumps occur. The disadvantage of this concept, however, is that there is no possibility of intervening during a fade-over process. A considerable delay therefore results, particularly when a relatively long fade-over process is ongoing, for example from a source at the very left to a source at the very right of a large stage. During this relatively long time interval, the perceived position of the audio source deviates from the actual one. In addition, the actual position, which might already be moving again, must obviously be caught up with, which can only be accomplished by a relatively fast passage of the source across the stage to the target position. This very fast passage may in turn lead to artefacts, or at least make a user wonder why the perceived audio position is moving so much even though the singer has not moved, or has moved only very little.
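The cost of completing a fade-over before correcting can be estimated with simple arithmetic. The sketch assumes the fade started at t = 0 and that the correction can only begin once it has finished; the stage width, fade duration and catch-up time in the example are hypothetical, and the function names are illustrative.

```python
def correction_latency(fade_duration_s, request_time_s):
    """Time in seconds that a correction request must wait when an ongoing
    fade-over (started at t = 0) has to be completed before a new one starts."""
    if not 0.0 <= request_time_s <= fade_duration_s:
        raise ValueError("request must fall inside the fade-over")
    return fade_duration_s - request_time_s


def catch_up_speed(distance_m, catch_up_time_s):
    """Speed in m/s at which the audio position must then travel to reach the
    performer's actual position within `catch_up_time_s`."""
    return distance_m / catch_up_time_s


# Hypothetical: 10 s fade across a 20 m stage (2 m/s), correction requested
# at 2 s while the singer has stayed at the left edge.
wait = correction_latency(10.0, 2.0)  # 8.0 s of wrong perceived position
speed = catch_up_speed(20.0, 2.0)     # 10.0 m/s back across the stage
```

In this example the audio position would sweep back at five times the original fade speed while the singer stands still, which is the kind of conspicuous movement described above.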