1. Field of the Invention
The present invention relates to wave field synthesis systems and, in particular, to the reduction or elimination of level artifacts in wave field synthesis systems.
2. Description of Prior Art
There is an increasing demand for new technologies and innovative products in the field of entertainment electronics. Thus, it is an important prerequisite for the success of new multimedia systems to offer optimal functionalities and/or abilities. This is achieved by employing digital technologies and, in particular, computer technology. Examples of this are applications offering an improved realistic audio-visual impression. In prior audio systems, an essential weakness is the quality of spatial sound reproduction of natural, but also virtual surroundings.
Methods for a multi-channel loudspeaker reproduction of audio signals have been known for several years and are standardized. All conventional technologies are of disadvantage in that both the location where the loudspeaker is positioned and the position of the listener are already impressed on the transfer format. With a wrong arrangement of the loudspeakers relative to the listener, audio quality suffers considerably. An optimal sound will only be possible in a small region of the reproduction space, the so-called sweet spot.
An improved natural spatial impression and a stronger enclosure in audio reproduction can be obtained using a new technology. The basis of this technology, the so-called wave field synthesis (WFS), was first researched at the Technical University of Delft and first presented in the late 1980s (A. J. Berkhout; D. de Vries; P. Vogel: Acoustic control by Wave field Synthesis. JASA 93, 1993).
As a consequence of the enormous requirements of this method on computer performance and transfer rates, wave field synthesis has only rarely been employed in practice. Only the progress in the fields of microprocessor technology and audio coding now allows this technology to be employed in real applications. First products in the professional area are expected for next year. It is also expected that first wave field synthesis applications for the consumer area will be launched on the market within the next few years.
The basic idea of WFS is based on applying Huygens' Principle of Wave Theory:
Every point reached by a wave is the starting point of an elementary wave propagating in a spherical or circular form.
Applied to acoustics, any form of an incoming wave front can be imitated by a large number of loudspeakers arranged next to one another (a so-called loudspeaker array). In the simplest case of a single point source to be reproduced and a linear arrangement of loudspeakers, the audio signal of every loudspeaker has to be fed with a temporal delay and amplitude scaling so that the sound fields emitted by the individual loudspeakers are superimposed onto one another correctly. With several sound sources, the contribution to every loudspeaker is calculated separately for every source and the resulting signals are added. In a room having reflecting walls, reflections may also be reproduced as additional sources via the loudspeaker array. The complexity in calculation thus strongly depends on the number of sound sources, the reflection characteristics of the recording space and the number of loudspeakers.
The advantage of this technology in particular is that a natural spatial sound impression is possible over a large region of the reproduction space. In contrast to well-known techniques, the direction and distance of sound sources are reproduced precisely. Virtual sound sources may, to a limited extent, even be positioned between the real loudspeaker array and the listener.
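By way of illustration only, the delay and amplitude scaling for a single point source and a linear loudspeaker array may be sketched as follows. This is a deliberately simplified model and not the driving function of any particular wave field synthesis system: the delay is taken as the propagation time from the virtual source to each loudspeaker, and the scaling as the reciprocal of that distance. The function name `driving_parameters`, the speed of sound of 343 m/s, and the example geometry are assumptions made for the sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def driving_parameters(source, speakers, c=SPEED_OF_SOUND):
    """Return one (delay_seconds, gain) pair per loudspeaker.

    Simplified model: delay = distance / c, gain = 1 / distance.
    The source must not coincide with a loudspeaker position.
    """
    params = []
    for sx, sy in speakers:
        dist = math.hypot(sx - source[0], sy - source[1])
        params.append((dist / c, 1.0 / dist))
    return params


# Hypothetical geometry: five loudspeakers on the x axis, 0.2 m apart,
# and a virtual point source 2 m behind the centre of the array.
speakers = [(x * 0.2, 0.0) for x in range(-2, 3)]
source = (0.0, -2.0)
params = driving_parameters(source, speakers)

for (x, _), (delay, gain) in zip(speakers, params):
    print(f"x={x:+.1f} m  delay={delay * 1000:.3f} ms  gain={gain:.3f}")
```

In this sketch the central loudspeaker receives the shortest delay and the largest gain, and the values are symmetric about the centre of the array. Each loudspeaker's parameters depend only on its own position and that of the source.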
Although wave field synthesis functions well for surroundings the qualities of which are known, irregularities may nevertheless occur when the qualities change or when the wave field synthesis is performed on the basis of an environmental quality not matching the actual quality of the environment.
The wave field synthesis technique, however, may also be employed advantageously to supplement visual perception by a corresponding spatial audio perception. Up to now, obtaining an authentic visual impression of the virtual scene has been given special emphasis in production in virtual studios. The acoustic impression pertaining to the picture is usually impressed subsequently onto the audio signal in the so-called post-production by manual steps, or is classified as being too complicated and time-consuming in its realization and thus neglected. Consequently, the result usually is a contradiction of the individual sensory perceptions, which causes the designed space, i.e. the designed scene, to be perceived as being less authentic.
In the specialist publication “Subjective experiments on the effects of combining spatialized audio and 2D video projection in audio-visual systems”, W. de Bruijn and M. Boone, AES convention paper 5582, 10th to 13th May, 2002, Munich, subjective experiments are discussed with regard to the effects of combining spatial audio and a two-dimensional video projection in audio-visual systems. In particular, it is emphasized that two speakers who are positioned nearly one behind the other, at different distances to a camera, can be understood better by an observer when the two persons are detected and reconstructed as different virtual sound sources using wave field synthesis. In this case, it has been found by means of subjective tests that a listener can better understand and differentiate between the two simultaneously speaking speakers when they are separated.
In a contribution to the conference for the 46th international scientific colloquium in Ilmenau from 24th to 27th Sep., 2001, entitled “Automatisierte Anpassung der Akustik an virtuelle Räume”, U. Reiter, F. Melchior and C. Seidel, an approach of automating sound post-processing processes is presented. Here, the parameters of a film set, such as, for example, spatial size, texture of the surfaces or camera position and position of the actors, required for visualization, are checked as to their acoustic relevance, whereupon corresponding control data is generated. Then, this data automatically influences the effect and post-processing processes used for post-production, such as, for example, adjusting the dependence of the speakers' volume on the distance to the camera or reverberation time in dependence on spatial size and wall quality. Here, the object is to boost the visual impression of a virtual scene for an increased reality sensation.
“Listening with the ears of the camera” is to be made possible to render a scene more real. Here, the highest possible correlation between a sound event position in the picture and a listening event position in the surround field is aimed at. This means that sound source positions should continuously be adjusted to a picture. Camera parameters, such as, for example, the zoom, are to be considered when designing the sound, as well as a position of two loudspeakers L and R. For this, tracking data of a virtual studio are written to a file by the system, together with a pertaining time code. At the same time, picture, sound and time code are recorded by magnetic tape recording. This file, the camdump file, is transmitted to a computer, which generates control data for an audio workstation from it and outputs the control data via a MIDI interface synchronously with the picture from the magnetic tape recording. The actual audio processing, such as, for example, positioning of the sound source in the surround field and inserting early reflections and reverberation, takes place within the audio workstation. The signal is prepared for a 5.1 surround loudspeaker system.
Camera tracking parameters and positions of sound sources in the recording setting may be recorded with real film sets. Data of this kind may also be generated in virtual studios.
In a virtual studio, an actor or presenter is alone in a recording room. In particular, he or she stands in front of a blue wall which is also referred to as blue box or blue panel. A pattern of blue and light blue stripes is applied to this blue wall. The peculiarity about this pattern is that the stripes have different widths and thus give a plurality of stripe combinations. Due to the unique stripe combinations on the blue wall, it is possible in post-processing to determine precisely in which direction the camera is directed when the blue wall is replaced by a virtual background. Using this information, the computer can find out the background for the current angle of view of the camera. Additionally, sensors detecting and outputting additional camera parameters are evaluated in the camera. Typical parameters of a camera, detected by means of sensor technology, are the three translation degrees x, y, z, the three rotation degrees, which are also referred to as roll, tilt, pan, and the focal length or zoom equivalent to the information on the opening angle of the camera.
In order for the precise position of the camera to be determined without picture recognition and without complicated sensor technology, a tracking system consisting of several infrared cameras determining the position of an infrared sensor mounted to the camera can be used. Thus, the position of the camera is also determined. Using the camera parameters provided by the sensor technology and the stripe information evaluated by the picture recognition, a real-time computer can calculate the background for the current picture. Subsequently, the blue color which the background had is removed from the picture so that the virtual background is introduced instead of the blue background.
In most cases, a concept about obtaining an acoustic general impression of the visually pictured setting is aimed at. This may well be described by the term “full shot” coming from picture design. This “full shot” sound impression most often remains constant for all settings of a scene although the optical angle of view on the objects mostly changes significantly. In this way, optical details are emphasized or put into the background by corresponding adjustments. Even counter-shots in the cinematic design of dialogs are not traced by the sound.
Thus, there is the demand to acoustically embed the audience into an audio-visual scene. Here, the screen or picture area forms the line of vision and the angle of view of the audience. This means that the sound is to follow the picture in the form that it always matches the picture viewed. This is even more important for virtual studios since there is typically no correlation between the sound of the presentation, for example, and the surroundings where the presenter is at that moment. In order to obtain an audio-visual general impression of the scene, a spatial impression matching the rendered picture must be simulated. An essential subjective feature in such a sound concept in this context is the position of the sound source as an observer of, for example, a cinema screen perceives same.
In the audio range, a good spatial sound can be achieved for a great listener range by means of the technique of wave field synthesis (WFS). As has been explained, the wave field synthesis is based on Huygens' Principle according to which wave fronts may be formed and set up by means of superposition of elementary waves. According to a mathematically exact theoretical description, an infinite number of sources at an infinitely small distance from one another would have to be employed in order to generate the elementary waves. In practice, however, a finite number of loudspeakers at a finitely small distance from one another are used. Each of these loudspeakers is controlled, according to the WFS principle, by an audio signal from a virtual source having a certain delay and a certain level. Levels and delays are usually different for all loudspeakers.
As has already been explained, the wave field synthesis system operates on the basis of Huygens' Principle and reconstructs a given wave form of, for example, a virtual source arranged at a certain distance from a show or presentation region, or from a listener in the presentation region, by a plurality of individual waves. The wave field synthesis algorithm thus receives information on the actual position of an individual loudspeaker from the loudspeaker array and subsequently calculates, for this individual loudspeaker, the component signal this loudspeaker must emit. The superposition of this loudspeaker signal with the loudspeaker signals of the other active loudspeakers then performs a reconstruction for the listener such that he or she has the impression of being “irradiated acoustically” not by many individual loudspeakers, but only by a single loudspeaker at the position of the virtual source.
For several virtual sources in a wave field synthesis setting, the contribution of each virtual source for each loudspeaker, i.e. the component signal of the first virtual source for the first loudspeaker, of the second virtual source for the first loudspeaker, etc., is calculated to subsequently add the component signals to finally obtain the actual loudspeaker signal. In the case of, for example, three virtual sources, the superposition of the loudspeaker signals of all the active loudspeakers for the listener will result in the listener not having the impression that he or she is irradiated acoustically by a large array of loudspeakers but that the sound he or she hears only comes from three sound sources positioned at special positions which are equivalent to the virtual sources.
In practice, the component signals are usually calculated by providing the audio signal associated with a virtual source with a delay and a scaling factor, both of which depend on the position of the virtual source and the position of the loudspeaker at a certain point in time. The result is a delayed and/or scaled audio signal of the virtual source. When only one virtual source is present, this signal directly represents the loudspeaker signal; otherwise, it is added to the component signals for the respective loudspeaker from the other virtual sources and thus contributes to the loudspeaker signal for that loudspeaker.
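The formation of a loudspeaker signal from delayed and scaled component signals may be sketched as follows. The sketch is illustrative only: it assumes delays expressed as whole numbers of samples and audio held in plain Python lists, and the names `component_signal` and `loudspeaker_signal` are hypothetical.

```python
def component_signal(audio, delay_samples, gain, length):
    """Delay the source audio by whole samples and scale it by gain."""
    out = [0.0] * length
    for n, sample in enumerate(audio):
        idx = n + delay_samples
        if idx < length:  # samples delayed past the buffer end are dropped
            out[idx] = gain * sample
    return out


def loudspeaker_signal(sources, length):
    """Sum the component signals of all virtual sources for ONE loudspeaker.

    sources: list of (audio, delay_samples, gain) tuples, one per virtual source.
    """
    total = [0.0] * length
    for audio, delay_samples, gain in sources:
        comp = component_signal(audio, delay_samples, gain, length)
        total = [t + c for t, c in zip(total, comp)]
    return total


# Two hypothetical virtual sources contributing to the same loudspeaker.
sig = loudspeaker_signal(
    [([1.0, 0.0], 1, 0.5),    # source 1: delay 1 sample, gain 0.5
     ([1.0, 1.0], 2, 0.25)],  # source 2: delay 2 samples, gain 0.25
    length=5,
)
print(sig)  # [0.0, 0.5, 0.25, 0.25, 0.0]
```

With a single entry in `sources`, the component signal is the loudspeaker signal itself, which corresponds to the single-source case described above.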
Typical wave field synthesis algorithms operate independently of how many loudspeakers there are in the loudspeaker array. The theory on which the wave field synthesis is based is that any acoustic field may be reconstructed exactly by an infinitely high number of individual loudspeakers, wherein these individual loudspeakers are arranged infinitely close to one another. In practice, however, neither the infinitely high number nor the infinitely close arrangement can be realized. Instead, there is a limited number of loudspeakers which are additionally arranged in certain predetermined distances from one another. The consequence is that in real systems only an approximation to the actual wave-form can be obtained, which would result if the virtual source were really present, i.e. were a real source.
Additionally, settings differ: in a cinema hall, for example, the loudspeaker array may be arranged only at the side where the cinema screen is. In this case, the wave field synthesis module would generate loudspeaker signals for these loudspeakers, and these signals will normally be the same as those for the corresponding loudspeakers in an array that extends not only over the screen side of the cinema but also to the left and right of and behind the audience space. This “360°” loudspeaker array will, of course, provide a better approximation to an exact wave field than a one-side array, such as, for example, one in front of the audience. Nevertheless, the loudspeaker signals for the loudspeakers arranged in front of the audience are the same in both cases. This means that a wave field synthesis module typically does not obtain feedback as to how many loudspeakers there are or whether a one-side array, a multi-side array or even a 360° array is present. Expressed differently, the wave field synthesis means calculates a loudspeaker signal for a loudspeaker from the position of that loudspeaker, independently of which other loudspeakers are or are not present.
This is an essential strength of the wave field synthesis algorithm in that it may optimally be adapted modularly to different conditions by simply indicating the coordinates of the loudspeakers present in totally different presentation spaces. It is, however, of disadvantage that considerable level artifacts result apart from the poorer reconstruction of the current wave field, which may under certain conditions be accepted. It is not only decisive for a real impression in which direction the virtual source relative to the listener is, but also how loud the listener can hear the virtual source, i.e. which level “reaches” the listener due to a special virtual source. The level reaching a listener, related to a virtual source considered, results from superpositioning the individual signals of the loudspeakers.
If, for example, the case is considered where a loudspeaker array of 50 loudspeakers is in front of the listener and the audio signal of the virtual source is mapped to component signals for the 50 loudspeakers by the wave field synthesis means such that the audio signal is simultaneously emitted by the 50 loudspeakers with different delay and different scaling, a listener of the virtual source will perceive a level of the source resulting from the individual levels of the component signals of the virtual source in the individual loudspeaker signals.
When this wave field synthesis means is used for a reduced array where there are, for example, only 10 loudspeakers in front of the listener, it will be understandable that the level of the signal from the virtual source, resulting at the ear of the listener, has decreased since in a way 40 component signals of the now missing loudspeakers are “missing”.
There may also be the alternative case in which there are, for example, at first loudspeakers to the left and right of the listener which are controlled in phase opposition in a certain constellation such that the loudspeaker signals of two opposite loudspeakers neutralize each other due to a certain delay calculated by the wave field synthesis means. If the loudspeakers at one side of the listener are, for example, omitted in a reduced system, the virtual source will suddenly appear to be louder than it should really be.
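Both kinds of level artifact can be illustrated with a toy single-frequency model in which each loudspeaker contribution at the listener is a complex number (amplitude and phase) and the perceived level is the magnitude of the sum. The amplitudes and phases below are assumed values chosen for illustration; they are not taken from any real array.

```python
import cmath
import math


def level_db(contributions):
    """Level in dB of the magnitude of the summed complex contributions."""
    magnitude = abs(sum(contributions))
    return 20.0 * math.log10(magnitude) if magnitude > 0 else float("-inf")


# Case 1: 50 equal in-phase contributions versus only 10 of them.
full = [0.02 + 0j] * 50
reduced = full[:10]
drop_db = level_db(full) - level_db(reduced)
print(f"level drop with reduced array: {drop_db:.2f} dB")

# Case 2: a left/right pair driven in phase opposition.
left = cmath.rect(0.5, 0.0)       # amplitude 0.5, phase 0
right = cmath.rect(0.5, math.pi)  # amplitude 0.5, phase 180 degrees
both = abs(left + right)          # the pair cancels (up to rounding error)
left_only = abs(left)             # omitting one side removes the cancellation
print(f"both sides: {both:.3f}, one side only: {left_only:.3f}")
```

In the first case the level falls by 20·log10(5), about 14 dB, when 40 of the 50 in-phase contributions are missing. In the second case the level rises from essentially zero to 0.5 when one side is omitted.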
Whereas constant factors may be considered for stationary sources for level correction, this solution is no longer acceptable when the virtual sources are not stationary but move. It is an essential feature of wave field synthesis that it can also and in particular process moving virtual sources. A correction having a constant factor would not suffice here since the constant factor would be correct for one position, but would have an artifact-increasing effect for another position of the virtual source.
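Why a constant factor cannot serve for a moving source can be seen in a rough numerical sketch. The model is an assumption made purely for illustration: every loudspeaker contribution is taken to arrive in phase, each contribution is scaled by the reciprocal of the source-to-loudspeaker distance and of the loudspeaker-to-listener distance, and the "natural" level is the reciprocal of the source-to-listener distance. The function names and the geometry are hypothetical.

```python
import math


def in_phase_array_level(source, listener, speakers):
    """Toy level at the listener: sum of 1/(d_source_spk * d_spk_listener)."""
    total = 0.0
    for spk in speakers:
        total += 1.0 / (math.dist(source, spk) * math.dist(spk, listener))
    return total


def required_correction(source, listener, speakers):
    """Factor that would map the toy array level onto the natural 1/r level."""
    natural = 1.0 / math.dist(source, listener)
    return natural / in_phase_array_level(source, listener, speakers)


# Hypothetical geometry: five loudspeakers on the x axis, 0.5 m apart,
# listener 2 m in front of the array, source behind the array.
speakers = [(x * 0.5, 0.0) for x in range(-2, 3)]
listener = (0.0, 2.0)

factor_near = required_correction((0.0, -1.0), listener, speakers)
factor_far = required_correction((0.0, -4.0), listener, speakers)
print(f"correction at 1 m behind array: {factor_near:.3f}")
print(f"correction at 4 m behind array: {factor_far:.3f}")
```

In this model the two correction factors differ noticeably, so a factor chosen for one source position would over- or under-correct at the other.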
In addition, wave field synthesis means are able to imitate several different kinds of sources. A prominent form of a source is the point source where the level decreases in proportion to 1/r, r being the distance between a listener and the position of the virtual source. Another form of a source is a source emitting plane waves. Here, the level remains constant independently of the distance to the listener, since plane waves may be generated by point sources arranged at an infinite distance.
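The two level laws can be compared in a short sketch. The function names and the reference level of 1.0 at a distance of 1 m are assumptions made for the example; "level" here denotes an amplitude, so changes are expressed as 20·log10 of the ratio.

```python
import math


def point_source_level(r, level_at_1m=1.0):
    """Amplitude level of a point source at distance r (decays as 1/r)."""
    return level_at_1m / r


def plane_wave_level(r, level=1.0):
    """Amplitude level of a plane wave: independent of distance r."""
    return level


def level_change_db(level_a, level_b):
    """Change in dB when going from level_a to level_b."""
    return 20.0 * math.log10(level_b / level_a)


# Doubling the distance from a point source: about -6 dB.
doubling_db = level_change_db(point_source_level(1.0), point_source_level(2.0))

# The same step for a plane wave: no change.
plane_db = level_change_db(plane_wave_level(1.0), plane_wave_level(2.0))

# A point source very far away behaves almost like a plane wave:
# moving one metre further changes the level only negligibly.
far_db = level_change_db(point_source_level(1.0e6), point_source_level(1.0e6 + 1.0))

print(f"{doubling_db:.2f} dB, {plane_db:.2f} dB, {far_db:.6f} dB")
```

The last value illustrates the limiting case mentioned above: as the point source moves towards an infinite distance, the level change over any finite listening region tends to zero.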
According to the wave field synthesis theory, in two-dimensional loudspeaker arrangements the level change depending on r, except for a negligible error, matches the natural level change. Depending on the position of the source, different, sometimes considerable errors in the absolute level may result, which result from employing a finite number of loudspeakers instead of the theoretically required infinite number of loudspeakers, as has been explained above.