1. Field of the Invention
The present invention relates to wave-field synthesis systems and particularly to wave-field synthesis systems allowing moving virtual sources.
2. Description of the Related Art
There is an increasing demand for new technologies and innovative products in the field of consumer electronics. An important prerequisite for the success of new multimedia systems is that they offer optimum functionality and capabilities. This is achieved by the use of digital technologies, and particularly computer technology. Examples thereof are applications offering an improved, more realistic audiovisual impression. In prior art audio systems, a significant weak point is the quality of the spatial sound reproduction of real as well as virtual environments.
Methods for multichannel loudspeaker reproduction of audio signals have been known and standardized for many years. All common techniques have the disadvantage that both the location of the loudspeakers and the position of the listener are already imprinted in the transmission format. If the loudspeakers are positioned incorrectly with regard to the listener, the audio quality suffers significantly. Optimum sound is only possible in a very small area of the reproduction room, the so-called sweet spot.
An improved natural spatial impression as well as stronger envelopment during audio reproduction can be obtained with the help of a new technology. The basics of this technology, the so-called wave-field synthesis (WFS), were researched at the TU Delft and first presented in the late 1980s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wave-field Synthesis. JASA 93, 1993).
Due to the high demands of this method with regard to computing power and transmission rates, wave-field synthesis has so far only rarely been applied in practice. Only the advances in microprocessor technology and audio encoding allow the use of this technology in specific applications today. First products in the professional field are expected next year. In a few years, the first wave-field synthesis applications for the consumer field will come on the market.
The basic idea of WFS is based on the application of Huygens' principle of wave theory:
Every point reached by a wave is the starting point of an elementary wave, which propagates in a spherical or circular manner.
Applied to acoustics, any form of an incoming wave front can be reproduced by a large number of loudspeakers arranged next to one another (a so-called loudspeaker array). In the simplest case, with a single point source to be reproduced and a linear arrangement of the loudspeakers, the audio signal of every loudspeaker has to be fed with a time delay and an amplitude scaling such that the sound fields emitted by the individual loudspeakers superimpose properly. With several sound sources, the contribution to every loudspeaker is calculated separately for every source, and the resulting signals are added. In a virtual space with reflecting walls, the reflections can also be reproduced via the loudspeaker array as additional sources. Thus, the calculation effort depends heavily on the number of sound sources, the reflection characteristics of the recording room and the number of loudspeakers.
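The simplest case described above (one virtual point source, a linear loudspeaker array) can be illustrated by the following minimal sketch. It only shows how a per-loudspeaker delay and amplitude scaling follow from the source-to-loudspeaker distance. The function name `driving_parameters`, the array geometry, the speed of sound of 343 m/s and the plain 1/r gain law are assumptions made for illustration; an actual WFS driving function uses a different amplitude weighting and additional filtering.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def driving_parameters(source, speakers, c=SPEED_OF_SOUND):
    """Return one (delay_seconds, gain) pair per loudspeaker for one point source.

    Hypothetical helper: the delay is the propagation time from the virtual
    source to the loudspeaker, the gain is a simplified 1/r placeholder.
    """
    params = []
    for sx, sy in speakers:
        r = math.hypot(sx - source[0], sy - source[1])  # source-to-speaker distance
        delay = r / c
        gain = 1.0 / max(r, 1e-3)  # clamp to avoid division by zero
        params.append((delay, gain))
    return params


# Five loudspeakers on the x axis with 0.2 m spacing (assumed geometry).
speakers = [(i * 0.2, 0.0) for i in range(-2, 3)]
source = (0.0, -1.0)  # virtual source 1 m behind the array centre

params = driving_parameters(source, speakers)
for (x, _), (delay, gain) in zip(speakers, params):
    print(f"speaker at x={x:+.1f} m: delay={delay * 1000:.3f} ms, gain={gain:.3f}")
```

The centre loudspeaker is closest to the source and therefore receives the shortest delay and the largest gain; the values are symmetric about the array centre.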
The particular advantage of this technique is that a natural spatial sound impression is possible across a large area of the reproduction room. In contrast to the known techniques, the direction and distance of the sound sources are reproduced very accurately. To a limited degree, virtual sound sources can even be positioned between the real loudspeaker array and the listener.
Although wave-field synthesis functions well for surroundings whose conditions are known, irregularities occur when the conditions change, or when wave-field synthesis is performed based on surrounding conditions that do not correspond to the actual conditions of the surroundings.
The technique of wave-field synthesis can also be used advantageously to add a corresponding spatial audio perception to a visual perception. So far, during production in virtual studios, the focus has been on producing an authentic visual impression of the virtual scene. The acoustic impression matching the image is normally imprinted on the audio signal afterwards by manual operating steps in the so-called postproduction, or it is considered too expensive and too time-consuming to realize and is thus neglected. This normally causes a discrepancy between the individual sense impressions, which causes the designed space, i.e. the designed scene, to be perceived as less authentic.
In the expert publication “Subjective experiments on the effects of combining spatialized audio and 2D video projection in audio-visual systems”, W. de Bruijn and M. Boone, AES convention paper 5582, May 10th to 13th, 2003, Munich, subjective experiments with regard to the effects of combining spatial audio and a two-dimensional video projection in audiovisual systems are presented. In particular, it is emphasized that two speakers who stand almost behind one another at different distances from a camera can be understood better by an audience when the two persons can be seen and are reconstructed as different virtual sound sources with the help of wave-field synthesis. In that case, subjective tests have shown that a listener can better understand and differentiate the two speakers speaking simultaneously when they are separated.
In a conference contribution for the 46th international academic colloquium in Ilmenau from Sep. 24 to 27, 2001, with the title “Automatisierte Anpassung der Akustik an virtuelle Räume”, U. Reiter, F. Melchior and C. Seidel, an approach for automating sound post-processing processes is presented. To this end, the parameters of a film set required for the visualization, such as room size, texture of the surfaces, camera position and position of the actors, are checked for their acoustic relevance, whereupon corresponding control data are generated. These then influence, in an automated way, the effect and post-processing processes used for postproduction, such as the adaptation of the speaker volume to the distance from the camera, or of the reverberation time to the room size and wall conditions. The aim here is to reinforce the visual impression of a virtual scene for an increased perception of reality.
It is intended to enable “listening with the ears of the camera” in order to make a scene appear more real. In this connection, the correlation between the sound event location in the image and the listening event location in the surround field is intended to be as high as possible. This means that sound source positions are constantly adapted to the image. Camera parameters, such as zoom, are also to be incorporated in the sound design, as is the position of two loudspeakers L and R. To this end, tracking data of a virtual studio are written into a file (the Camdump file) by the system together with an associated time code. Image, sound and time code are recorded simultaneously on a VTR. The Camdump file is transmitted to a computer, which generates control data for an audio workstation therefrom and outputs them via a MIDI interface synchronously with the image coming from the VTR. The actual audio processing as well as the positioning of the sound source in the surround field and the insertion of early reflections and reverberation are performed within the audio workstation. The signal is rendered for a 5.1 surround loudspeaker system.
Camera tracking parameters as well as positions of sound sources in the recording setting can be recorded in real film sets. Such data can also be generated in virtual studios.
In a virtual studio, an actor or presenter is alone in a recording room. In particular, he stands in front of a blue wall, which is also referred to as a blue box or blue panel. A pattern of blue and light-blue stripes is disposed on this blue wall. The special feature of this design is that the stripes have different widths, so that a plurality of stripe combinations results. During post-processing, when the blue wall is replaced by a virtual background, the unique stripe combination on the blue wall makes it possible to determine exactly in which direction the camera is looking. With the help of this information, the computer can determine the background for the current angle of view of the camera. Further, sensors at the camera are evaluated, which detect and output additional camera parameters. Typical camera parameters detected via sensor technology are the three translational degrees of freedom x, y, z, the three rotational degrees of freedom, which are also referred to as roll, tilt and pan, and the focal length or zoom, which is equivalent to the information about the aperture angle of the camera.
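The camera parameters listed above, and the stated equivalence between focal length and aperture angle, can be sketched as follows. The record type `CameraPose`, its field names and the sensor width of 36 mm are assumptions made for illustration; the relation used is the standard pinhole-camera formula, aperture angle = 2·atan(sensor width / (2·focal length)).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical record of the tracked camera parameters."""
    x: float                # translation in metres
    y: float
    z: float
    roll_deg: float         # rotation in degrees
    tilt_deg: float
    pan_deg: float
    focal_length_mm: float  # zoom setting


def aperture_angle_deg(focal_length_mm, sensor_width_mm=36.0):
    """Horizontal aperture angle of an ideal pinhole camera.

    The sensor width of 36 mm is an assumed value, not taken from the text.
    """
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


pose = CameraPose(x=1.0, y=0.5, z=1.6,
                  roll_deg=0.0, tilt_deg=-5.0, pan_deg=20.0,
                  focal_length_mm=50.0)
angle = aperture_angle_deg(pose.focal_length_mm)
print(f"aperture angle at {pose.focal_length_mm:.0f} mm: {angle:.1f} degrees")
```

Zooming in (a longer focal length) narrows the aperture angle, which is why either quantity can be recorded.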
In order to be able to determine the exact position of the camera even without image recognition and without expensive sensor technology, a tracking system can also be used, which consists of several infrared cameras that determine the position of an infrared sensor mounted on the camera. Thereby, the position of the camera is also determined. With the camera parameters provided by the sensor technology and the stripe information evaluated by image recognition, a real-time computer can now calculate the background for the current image. Then, the blue hue of the blue background is removed from the image, so that the virtual background is inserted in place of the blue background.
In most cases, a concept is followed that is based on conveying an overall acoustic impression of the visually imaged scene. This can be described with the expression “full shot”, which comes from image design. This “full shot” sound impression mostly remains constant across all camera settings in a scene, although the optical angle of view on things often changes considerably. Optical details are emphasized or moved into the background by corresponding camera angles. Countershots used in creating dialogs in films are likewise not reproduced in the sound.
Thus, there is a need to embed the audience acoustically into an audiovisual scene. In this connection, the screen or the image area defines the line of vision and the angle of view of the audience. This means that the sound is to follow the image such that it always corresponds to the image. This is particularly important for virtual studios, since there is typically no correlation between, for example, the sound of the moderation and the surroundings in which the presenter appears at the moment. In order to obtain an overall audiovisual impression of the scene, a room impression matching the rendered image has to be simulated. In that context, the location of a sound source, as it is perceived by, for example, an audience in front of a cinema screen, is a significant subjective characteristic of such a sound concept.
In the audio domain, a good spatial sound can be obtained for a large listener area by the technique of wave-field synthesis (WFS). As has been discussed, wave-field synthesis is based on the principle of Huygens, according to which wave fronts can be formed and structured by superimposing elementary waves. According to the mathematically exact theoretical description, an infinite number of sources at infinitely small spacing would have to be used for generating the elementary waves. In practice, however, a finite number of loudspeakers is used at a finite, small spacing from one another. According to the WFS principle, each of these loudspeakers is driven with an audio signal from a virtual source, which has a certain delay and a certain level. Levels and delays are normally different for all loudspeakers.
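How one loudspeaker signal is assembled from delayed and level-scaled copies of the virtual source signals can be sketched as follows. This is a minimal illustration with integer sample delays; the helper names `delayed_scaled` and `mix` and all numeric values are assumptions, and a practical renderer would use fractional delays and filtering.

```python
def delayed_scaled(signal, delay_samples, gain, length):
    """Copy of `signal`, shifted by `delay_samples` and scaled by `gain`."""
    out = [0.0] * length
    for n, sample in enumerate(signal):
        idx = n + delay_samples
        if idx < length:  # samples shifted past the buffer end are dropped
            out[idx] = gain * sample
    return out


def mix(contributions):
    """Sample-wise sum of equally long contributions."""
    return [sum(samples) for samples in zip(*contributions)]


# Two virtual sources feeding ONE loudspeaker (all values are illustrative).
source_a = [1.0, 0.5]
source_b = [1.0]
speaker_signal = mix([
    delayed_scaled(source_a, delay_samples=2, gain=0.5, length=6),
    delayed_scaled(source_b, delay_samples=3, gain=0.25, length=6),
])
print(speaker_signal)
```

Each loudspeaker of the array would repeat this step with its own set of delays and gains.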
In the audio domain, there is a so-called natural Doppler effect. This Doppler effect occurs when a source sends an audio signal with a certain frequency, a receiver receives the signal, and the source moves relative to the receiver. Due to an “extension” or “compression” of the acoustic waveforms, the frequency of the audio signal changes for the receiver according to the movement. Normally, a person is the receiver and hears this frequency change directly, for example when an ambulance with a siren moves towards a person and then passes the person. While the ambulance is approaching, the person will hear the siren at a different pitch than when the ambulance is moving away behind him.
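The ambulance example can be made concrete with the classical formula for a moving source and a stationary receiver, f' = f·c / (c − v), where v is the source speed towards the receiver. The sketch below assumes a speed of sound of 343 m/s, a 440 Hz siren and a vehicle speed of 20 m/s; these numbers and the function name `observed_frequency` are illustrative assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def observed_frequency(f_source, v_towards_receiver, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary receiver.

    v_towards_receiver > 0: source approaches; < 0: source recedes.
    Valid only for source speeds below the speed of sound.
    """
    return f_source * c / (c - v_towards_receiver)


siren_hz = 440.0
approaching = observed_frequency(siren_hz, +20.0)  # roughly 467 Hz
receding = observed_frequency(siren_hz, -20.0)     # roughly 416 Hz
print(f"approaching: {approaching:.1f} Hz, receding: {receding:.1f} Hz")
```

The pitch is raised during the approach and lowered afterwards, which is the change the passer-by hears.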
A Doppler effect also exists in wave-field synthesis, or sound field synthesis. It has the same physical basis as the above-described natural Doppler effect. However, in contrast to the natural Doppler effect, there is no direct path between sender and receiver in sound field synthesis. Instead, a distinction is made between a primary transmitter and a primary receiver on the one hand, and a secondary transmitter and a secondary receiver on the other hand. This scenario will be discussed below with reference to FIG. 7.
FIG. 7 shows a virtual source 700, which moves over time along a path of movement 702 from a first position, indicated by an encircled “1” in FIG. 7, to a second position, indicated by an encircled “2”. Further, three loudspeakers 704 are shown schematically, which symbolize a wave-field synthesis loudspeaker array. Further, there is a listener 706 in the scenario, who is positioned in the example shown in FIG. 7 such that the path of movement of the virtual source is a circular path extending around the listener, who is at the center of this circular path. The loudspeakers 704, however, are not disposed at the center, so that the virtual source 700 has a first distance r1 from a loudspeaker when it is at the first position and a second distance r2 from that loudspeaker when it is at the second position. In the scenario shown in FIG. 7, r1 is unequal to r2, while R1, the distance of the virtual source from the listener 706 at a time 1, is equal to the distance of the virtual source from the listener 706 at a time 2. This means that no distance change of the virtual source 700 takes place for the listener 706. On the other hand, there is a distance change of the virtual source 700 relative to the loudspeakers 704, since r1 is unequal to r2. The virtual source represents the primary transmitter, while the loudspeakers 704 represent the primary receiver. Simultaneously, the loudspeakers 704 represent the secondary transmitter, while the listener 706 represents the secondary receiver.
In wave-field synthesis, the transmission between primary transmitter and primary receiver takes place “virtually”. This means that the wave-field synthesis algorithms are responsible for the extension and compression of the wave fronts of the waveforms. At the time when a loudspeaker 704 receives a signal from the wave-field synthesis module, there is no audible signal at first. The signal only becomes audible after being output by the loudspeaker. As a result, Doppler effects can occur at different locations.
If the virtual source moves relative to the loudspeakers, every loudspeaker reproduces a signal with a different Doppler effect, depending on its specific position with regard to the moving virtual source, since the loudspeakers are at different positions and the relative movements are thus different for every loudspeaker.
On the other hand, the listener can also move relative to the loudspeakers. However, particularly in a cinema setting, this case is insignificant in practice, since the movement of the listener with regard to the loudspeakers will always be relatively slow and will thus cause only a relatively small Doppler effect, since the Doppler shift, as is known in the art, is proportional to the relative velocity between transmitter and receiver.
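The size of this effect can be estimated with the classical formula for a moving receiver and a stationary transmitter, f' = f·(c + v) / c. The following sketch assumes a listener walking towards a loudspeaker at 1.5 m/s and a speed of sound of 343 m/s; both values and the helper names are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def received_frequency(f_emitted, v_listener_towards, c=SPEED_OF_SOUND):
    """Frequency heard by a listener moving towards a stationary loudspeaker."""
    return f_emitted * (c + v_listener_towards) / c


def shift_in_cents(frequency_ratio):
    """Pitch interval in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(frequency_ratio)


walking_speed = 1.5  # m/s, assumed
ratio = received_frequency(1000.0, walking_speed) / 1000.0
shift = shift_in_cents(ratio)
print(f"frequency ratio {ratio:.5f}, pitch shift {shift:.1f} cents")
```

Under these assumptions the shift stays well below a tenth of a semitone, in line with the statement that listener movement plays only a minor role.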
The former Doppler effect, i.e. the one occurring when the virtual source moves relative to the loudspeakers, can sound relatively natural but can also sound very unnatural. This depends on the direction of the movement. If the source moves straight away from the center of the system or straight towards it, a rather natural effect results. With reference to FIG. 7, this would mean that the virtual source 700 moves, for example, along the arrow R1 away from the listener.
However, if the virtual source 700 “encircles” the listener, as illustrated in FIG. 7, a very unnatural effect results, since the relative motion between the primary source and the primary receivers (loudspeakers) is very strong and also differs greatly among the individual primary receivers. This is in sharp contrast to nature, where no Doppler effect results when a source encircles the listener, since no distance change occurs between source and listener.
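The encircling scenario can be checked numerically with the following sketch. It places the listener at the origin, moves a source on a circle around the listener, and compares the rate of distance change towards the listener with that towards three loudspeakers. The loudspeaker coordinates, the circle radius of 5 m, the source speed of 10 m/s and the speed of sound of 343 m/s are assumptions and are not taken from FIG. 7; the Doppler factor uses the classical moving-source formula c / (c + v), with v the rate at which the distance grows.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def radial_speed(src_pos, src_vel, point):
    """Rate of change of the source-to-point distance (positive: receding)."""
    dx, dy = src_pos[0] - point[0], src_pos[1] - point[1]
    return (dx * src_vel[0] + dy * src_vel[1]) / math.hypot(dx, dy)


def doppler_factor(v_radial, c=SPEED_OF_SOUND):
    """Frequency ratio received at a fixed point for a moving source."""
    return c / (c + v_radial)


# Source on a circle of radius 5 m around the listener, moving at 10 m/s.
radius, speed, angle = 5.0, 10.0, math.radians(30.0)
src_pos = (radius * math.cos(angle), radius * math.sin(angle))
src_vel = (-speed * math.sin(angle), speed * math.cos(angle))  # tangential

listener = (0.0, 0.0)
speakers = [(-2.0, 3.0), (0.0, 3.0), (2.0, 3.0)]  # hypothetical positions

listener_factor = doppler_factor(radial_speed(src_pos, src_vel, listener))
speaker_factors = [doppler_factor(radial_speed(src_pos, src_vel, p))
                   for p in speakers]

print(f"listener: {listener_factor:.4f}")
for p, f in zip(speakers, speaker_factors):
    print(f"speaker at {p}: {f:.4f}")
```

For the listener the factor is 1 (no Doppler shift), whereas each loudspeaker position yields a different factor, which is the discrepancy described above.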