1. Field of the Invention
The present invention relates to generating one or more low-frequency channels, and in particular to generating one or more low-frequency channels in connection with a multichannel audio system, such as a wave-field synthesis system.
2. Description of Prior Art
There is an increasing need for new technologies and innovative products in the area of entertainment electronics. It is an important prerequisite for the success of new multimedia systems to offer optimal functionalities or capabilities. This is achieved by the employment of digital technologies and, in particular, computer technology. Examples of this are applications offering an enhanced close-to-reality audiovisual impression. In previous audio systems, a substantial disadvantage lies in the quality of the spatial sound reproduction of natural, but also of virtual environments.
Methods of multi-channel speaker reproduction of audio signals have been known and standardized for many years. All usual techniques have the disadvantage that both the site of the speakers and the position of the listener are already impressed on the transfer format. With an incorrect arrangement of the speakers with reference to the listener, the audio quality suffers significantly. Optimal sound is only possible in a small area of the reproduction space, the so-called sweet spot.
A better natural spatial impression as well as greater enclosure or envelopment in the audio reproduction may be achieved with the aid of a new technology. The principles of this technology, the so-called wave-field synthesis (WFS), have been studied at the TU Delft and were first presented in the late 1980s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wave-field Synthesis. JASA 93, 1993).
Due to this method's enormous requirements for computing power and transfer rates, wave-field synthesis has up to now only rarely been employed in practice. Only the progress in the areas of microprocessor technology and audio encoding permits the employment of this technology in concrete applications today. First products in the professional area are expected next year. In a few years, first wave-field synthesis applications for the consumer area are also expected to come on the market.
The basic idea of WFS is based on the application of Huygens' principle of the wave theory:
Each point caught by a wave is the starting point of an elementary wave propagating in a spherical or circular manner.
Applied to acoustics, every arbitrary shape of an incoming wave front may be replicated by a large number of speakers arranged next to each other (a so-called speaker array). In the simplest case, with a single point source to be reproduced and a linear arrangement of the speakers, the audio signal of each speaker has to be fed with a time delay and amplitude scaling so that the radiating sound fields of the individual speakers overlay correctly. With several sound sources, for each source the contribution to each speaker is calculated separately and the resulting signals are added. In a room with reflecting walls, reflections may also be reproduced via the speaker array as additional sources. Thus, the expenditure in the calculation strongly depends on the number of sound sources, the reflection properties of the recording room, and the number of speakers.
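The simplest case described above, a single point source behind a linear speaker array, may be sketched as follows. This is only an illustrative approximation: the delay is taken as the source-to-speaker distance divided by an assumed speed of sound of 343 m/s, and the amplitude scaling is assumed to be 1/distance. Actual WFS driving functions contain further filtering and weighting terms that are not specified here, and the function name `delays_and_gains` as well as the array geometry are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delays_and_gains(source_xy, speaker_positions):
    """Return one (delay_seconds, gain) pair per speaker for a virtual point source.

    Assumption: delay = distance / c and gain = 1 / distance (simplified model).
    """
    result = []
    for sx, sy in speaker_positions:
        distance = math.hypot(sx - source_xy[0], sy - source_xy[1])
        delay = distance / SPEED_OF_SOUND
        gain = 1.0 / max(distance, 1e-6)  # guard against division by zero
        result.append((delay, gain))
    return result


# Hypothetical linear array: 5 speakers on the x axis, 0.5 m apart.
speakers = [(0.5 * i - 1.0, 0.0) for i in range(5)]
# Hypothetical virtual point source 2 m behind the centre of the array.
source = (0.0, -2.0)
params = delays_and_gains(source, speakers)
```

In this sketch the centre speaker, being closest to the virtual source, receives the shortest delay and the largest gain, while the outer speakers receive longer delays and smaller gains, which is what shapes the curved wave front.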
In particular, the advantage of this technique is that a natural spatial sound impression across a great area of the reproduction space is possible. In contrast to the known techniques, direction and distance of sound sources are reproduced in a very exact manner. To a limited degree, virtual sound sources may even be positioned between the real speaker array and the listener.
Although the wave-field synthesis functions well for environments whose properties are known, irregularities occur if the property changes or the wave-field synthesis is executed on the basis of an environment property not matching the actual property of the environment.
The technique of wave-field synthesis, however, may also be advantageously employed to supplement a visual perception by a corresponding spatial audio perception. Previously, in the production in virtual studios, the conveyance of an authentic visual impression of the virtual scene was in the foreground. The acoustic impression matching the image is usually impressed on the audio signal by manual steps in the so-called postproduction afterwards, or is classified as too expensive and time-intensive in the realization and thus neglected. As a result, a contradiction between the individual sensations usually arises, which leads to the designed space, i.e. the designed scene, being perceived as less authentic.
In most cases, a concept is applied which is about obtaining an overall acoustic impression of the visually depicted scene. This can be described very well using the term of “total”, which originates from the field of image design. This “total” sound impression mostly remains constant across all settings in a scene, even though the optical angle of view of objects undergoes big changes in most cases. For example, optical details are emphasized or de-emphasized by means of appropriate settings. Counter-shots in creating dialog in film are also not reproduced by sound.
Therefore, there is the need to acoustically embed the viewer into an audio-visual scene. Here, the screen or image area forms the viewer's line of vision and angle of view. This means that the sound is to follow the image in the sense that it always matches the image seen. This is becoming even more important particularly for virtual studios, since there is typically no correlation between the sound of, for example, presentation and the environment in which the presenter is currently located. To get an overall audio-visual impression of the scene, a spatial impression which matches the image rendered must be simulated. An essential subjective property in such a sound concept is, in this connection, the location of a sound source, such as is perceived by a viewer of, e.g., a cinema screen.
In the audio range, good spatial sound may be achieved for a large audience area by means of the technique of wave-field synthesis (WFS). As has been illustrated, wave-field synthesis is based on the Huygens principle, according to which wave fronts may be formed and built up by superposition of elementary waves. In accordance with a mathematically exact theoretical description, an infinite number of sources would have to be utilized at infinitely small distances for generating the elementary wave. In practice, however, a finite number of loudspeakers are utilized at finitely small distances from one another. Each of these loudspeakers is driven in accordance with the WFS principle, by an audio signal of a virtual source which has a certain delay and a certain level. Typically, levels and delays are different for all loudspeakers.
As has already been illustrated, the wave-field synthesis system operates on the basis of the Huygens principle and reconstructs a given waveform of, e.g., a virtual source, arranged at a certain distance from a presentation area and/or a listener in the presentation area, by means of a plurality of individual waves. Thus, the wave-field synthesis algorithm obtains information about the actual position of an individual loudspeaker from the loudspeaker array so as to then calculate, for this individual loudspeaker, a component signal which this loudspeaker ultimately must radiate off so that at the listener's end, a superposition of the loudspeaker signal from the one loudspeaker with the loudspeaker signals of the other active loudspeakers performs a reconstruction to the effect that the listener is under the impression of not being exposed to sound from many individual loudspeakers, but merely from one single loudspeaker at the position of the virtual source.
For several virtual sources in a wave-field synthesis setting, the contribution of each virtual source for each loudspeaker, i.e. the component signal of the first virtual source for the first loudspeaker, of the second virtual source for the first loudspeaker, etc., is calculated so as then to add up the component signals to eventually obtain the actual loudspeaker signal. In the event of, for example, three virtual sources, the superposition of the loudspeaker signals of all active loudspeakers at the listener would result in the listener not being under the impression that he/she is exposed to sound from a large array of loudspeakers, but that the sound that he/she hears stems merely from three sound sources which are positioned at specific positions and which are identical with the virtual sources.
In practice, the component signals are calculated mostly in that the audio signal associated with one virtual source has a delay and a scaling factor applied to it at a certain point in time, depending on the position of the virtual source and the position of the loudspeaker, to obtain a delayed and/or scaled audio signal of the virtual source which immediately represents the loudspeaker signal if there is only one virtual source, or which, after an addition with further component signals for the considered loudspeaker of other virtual sources, will then contribute to the loudspeaker signal for the loudspeaker contemplated.
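The calculation of a loudspeaker signal from delayed and scaled component signals, as described above, may be illustrated by the following minimal sketch. It assumes integer sample delays and plain lists of samples; the function names `component_signal` and `loudspeaker_signal` and the example values are hypothetical and serve only to show the delay, scale and add structure.

```python
def component_signal(audio, delay_samples, scale, length):
    """Delay and scale the audio signal of one virtual source for one loudspeaker."""
    out = [0.0] * length
    for n, sample in enumerate(audio):
        idx = n + delay_samples
        if idx < length:  # samples delayed past the buffer end are dropped
            out[idx] = scale * sample
    return out


def loudspeaker_signal(sources, length):
    """Add the component signals of all virtual sources for one loudspeaker.

    sources: list of (audio, delay_samples, scale) tuples, one per virtual source.
    """
    total = [0.0] * length
    for audio, delay_samples, scale in sources:
        comp = component_signal(audio, delay_samples, scale, length)
        total = [t + c for t, c in zip(total, comp)]
    return total


# Two hypothetical virtual sources contributing to the same loudspeaker.
signal = loudspeaker_signal(
    [([1.0, 0.0], 1, 0.5), ([1.0, 1.0], 2, 0.25)],
    length=5,
)
# signal == [0.0, 0.5, 0.25, 0.25, 0.0]
```

With only one virtual source in the list, the component signal is itself the loudspeaker signal; with several, each contributes additively, in line with the description.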
Typical wave-field synthesis algorithms operate irrespective of how many loudspeakers are present in the loudspeaker array. The theory underlying wave-field synthesis is that any desired sound field may be exactly reconstructed by an infinitely high number of individual loudspeakers, the individual loudspeakers being arranged at infinitely small distances from one another. In practice, however, neither the infinitely high number nor the arrangement at infinitely small distances may be realized. Instead, there are a limited number of loudspeakers which, furthermore, are arranged at certain, predefined distances from one another. Thus, with real systems, what is achieved is only ever an approximation to the actual waveform which would occur if the virtual source were actually present, i.e. were a real source.
In addition, there are various scenarios to the effect that the loudspeaker array is arranged, if a cinema is contemplated, only e.g. at the side of the cinema screen. In this case, the wave-field synthesis module would generate loudspeaker signals for these loudspeakers, the loudspeaker signals for these loudspeakers normally being the same as those for corresponding loudspeakers in a loudspeaker array which extends, e.g., not only across that side of a cinema at which the screen is located, but which is also arranged to the left, to the right and behind the audience space. This “360°” loudspeaker array naturally will provide a better approximation to an exact wave field than merely a one-sided array, for example in front of the audience. However, the loudspeaker signals for the loudspeakers which are arranged in front of the audience are the same in both cases. This means that a wave-field synthesis module typically does not obtain any feedback as to how many loudspeakers are present and/or as to whether or not the array is a one-sided or a multi-sided or even a 360° array. In other words, a wave-field synthesis means calculates a loudspeaker signal for a loudspeaker on the basis of the position of the loudspeaker, irrespective of whether or not there are any further loudspeakers. It is true that this is a considerable advantage of the wave-field synthesis algorithm in the sense that it is modularly adjustable to various circumstances in an optimum manner, in that the coordinates of the existing loudspeakers are simply present in totally different presentation rooms. What is disadvantageous, however, is the fact that in addition to the poorer reconstruction of the current wave-field, which may be acceptable in certain circumstances, considerable level artefacts arise. 
For a real impression, what is crucial is not only the direction in which the virtual source is situated in relation to the listener, but also the loudness with which the listener hears the virtual source, i.e. which level “arrives” at the listener due to a specific virtual source. The level arriving at a listener which is related to a virtual source contemplated results from the superposition of the individual signals of the loudspeakers.
If one contemplates, for example, the case where a loudspeaker array of 50 loudspeakers is arranged in front of the listener, and where the audio signal of the virtual source is imaged, by the wave-field synthesis means, into component signals for the 50 loudspeakers, such that the audio signal is radiated off simultaneously by the 50 loudspeakers with various delays and various scalings, a listener to the virtual source will perceive a level of the source which results from the individual levels of the component signals of the virtual source in the individual loudspeaker signals.
If the same wave-field synthesis means is now used for a reduced array in which there are, for example, only 10 loudspeakers in front of the listener, it is readily apparent that the level of the signal from the virtual source which results at the listener's ear has decreased, since the component signals of the 40 loudspeakers which are no longer present are “missing”, as it were.
The alternative case may also occur, in which there are loudspeakers, e.g. initially to the left and right of the listener, which are driven in an anti-phase manner in a specific constellation so that the loudspeaker signals from two opposite loudspeakers cancel each other out due to a certain delay calculated by the wave-field synthesis means. If now, in a reduced system, the loudspeakers to the one side of the listener, for example, are done away with, the virtual source suddenly appears to be substantially louder than it actually should be.
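Both level artefacts may be made tangible with a deliberately idealized sketch. It assumes that all component signals arrive at the listener with equal amplitude and either exactly in phase or exactly in anti-phase, which a real array will not satisfy; the figures therefore indicate only the direction and rough order of magnitude of the effect. The helper names are hypothetical.

```python
import math


def summed_amplitude(component_amplitudes):
    """Coherent sum of signed component amplitudes at the listener (idealized)."""
    return sum(component_amplitudes)


def level_change_db(reference_amplitude, new_amplitude):
    """Level difference in dB between two summed amplitudes."""
    return 20.0 * math.log10(abs(new_amplitude) / abs(reference_amplitude))


# Case 1: array reduced from 50 to 10 loudspeakers, all components in phase.
full_array = summed_amplitude([1.0] * 50)
reduced_array = summed_amplitude([1.0] * 10)
drop_db = level_change_db(full_array, reduced_array)  # about -14 dB

# Case 2: two opposite loudspeakers driven in anti-phase cancel each other;
# removing one of them leaves the other component uncancelled.
cancelled_pair = summed_amplitude([1.0, -1.0])  # 0.0
single_side = summed_amplitude([1.0])           # 1.0
```

Under these assumptions the reduced array is roughly 14 dB quieter, whereas removing one side of a cancelling pair turns silence into an audible component, so the sign of the error depends on the constellation.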
Whereas for static sources one might also think of constant factors for level correction, said solution will no longer be viable if the virtual sources are not static but are moving. An essential feature of wave-field synthesis is the very fact that it can also, and particularly, process moving virtual sources. A correction with a constant factor would not suffice here, since the constant factor would indeed be correct for one position, but for another position of the virtual source it would act in such a manner that it would increase the artefact.
In addition, wave-field synthesis means are able to imitate several different types of sources. A prominent form of source is the point source, wherein the level decreases in proportion to 1/r, wherein r is the distance between a listener and the position of the virtual source. A different kind of source is a source which sends out plane waves. Here, the level remains constant irrespective of the distance from the listener, since plane waves may be generated by point sources arranged at infinite distances.
In accordance with the wave-field synthesis theory, with two-dimensional loudspeaker arrangements, the change of level matches the natural change of level as a function of r, except for a negligible error. However, depending on the position of the source, different errors, some of which are substantial, may arise in the absolute level. These errors result from the utilization of a finite number of loudspeakers instead of the infinite number of loudspeakers theoretically required, as has been set forth above.
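The two level laws mentioned above may be compared in a short sketch. The reference distance of 1 m and the function names are assumptions made for illustration only.

```python
import math


def point_source_level_db(r, r_ref=1.0):
    """Level of a point source relative to its level at r_ref (amplitude ~ 1/r)."""
    return 20.0 * math.log10(r_ref / r)


def plane_wave_level_db(r, r_ref=1.0):
    """Level of an ideal plane wave relative to r_ref: independent of distance."""
    return 0.0


for r in (1.0, 2.0, 4.0):
    print(r, round(point_source_level_db(r), 2), plane_wave_level_db(r))
```

For the point source, each doubling of the distance lowers the level by about 6 dB, whereas the plane-wave level stays unchanged.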
A further difficulty existing with multichannel playback systems and, in particular, with wave-field synthesis systems using not only, e.g., five or seven loudspeakers, but a substantially higher number of loudspeakers, is that the loudspeakers may lead to considerable costs due to their high number. To reduce the cost of the loudspeakers, the so-called subwoofer principle is employed with such existing five-channel systems or seven-channel systems. With multichannel playback systems, the subwoofer principle serves to save expensive and large-size low-frequency loudspeakers. Here, use is made of a low-frequency channel which contains only music signals having frequencies lower than a base frequency of about 120 Hz. Said low-frequency channel drives a low-frequency loudspeaker having a large diaphragm area, which achieves high sound pressures especially at low frequencies.
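How a low-frequency channel limited to frequencies below about 120 Hz might be derived from a full-band signal can be sketched as follows. The first-order recursive low-pass used here, the 48 kHz sample rate and the test tones are assumptions chosen for brevity; the text above does not specify the filter actually used, and practical crossovers are typically steeper.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (one-pole) low-pass filter, used here purely for illustration."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction alpha towards the input
        out.append(y)
    return out


def sine(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine tone of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def steady_state_peak(samples):
    """Peak magnitude over the second half, after the start-up transient."""
    half = len(samples) // 2
    return max(abs(s) for s in samples[half:])


FS = 48000      # assumed sample rate in Hz
CUTOFF = 120.0  # cutoff of about 120 Hz, as stated in the text

bass_peak = steady_state_peak(low_pass(sine(50.0, FS, 9600), CUTOFF, FS))
treble_peak = steady_state_peak(low_pass(sine(2000.0, FS, 9600), CUTOFF, FS))
```

In this sketch a 50 Hz tone passes almost unattenuated, whereas a 2 kHz tone is strongly reduced, so only the low-frequency content reaches the large-diaphragm loudspeaker.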
The subwoofer principle makes use of the fact that human hearing has great difficulty in locating low-frequency sounds in terms of their directions. In current systems, an additional low-frequency channel for a specific loudspeaker arrangement (spatial arrangement) is mixed as early as in sound mixing. Examples of such multichannel playback systems are Dolby Digital, Sony SDDS and DTS. With these multichannel formats, the subwoofer channel may be mixed irrespective of the size of the room to be exposed to sound, since the spatial conditions change only in terms of scale. In terms of scale, the loudspeaker arrangement remains the same.
Using wave-field synthesis, a large audience area may be exposed to sound. Sound events may be reproduced at their spatial depth. To this end, the entire sound field of the individual sound events is reproduced in the audience area. This is achieved by means of a large number of loudspeakers. For large installations, about 500 or more loudspeaker systems are required. If one wanted to equip each individual loudspeaker system with a high-performance low-frequency loudspeaker, very high cost would be the result.
It has been mentioned that for existing multichannel formats, a specific loudspeaker arrangement is required in order to mix a specific subwoofer channel. However, the loudspeaker arrangement may be changed in terms of scale without having to alter the respective mix. The ratio of the distances of the individual loudspeakers from one another remains the same. However, all this is not possible with WFS, since the number of loudspeaker channels depends on the size of the area of the WFS playback system which is to be exposed to sound. This is why the individual loudspeaker channels cannot be stored, which would also be quite expensive in terms of memory if one contemplates systems with 500 or more audio channels. Therefore, only the virtual sound events to be simulated are stored. It is only at playback that the individual loudspeaker channels are calculated using the WFS algorithm.
On the one hand, the number of loudspeaker channels thus is associated with the size of the audience area. In addition, the number of loudspeaker channels is determined by the density in which the loudspeakers are distributed across the area to be exposed to sound. The quality of the WFS playback system depends on said density. The loudness is associated with the number of loudspeaker channels and the density of the loudspeakers, since, as one knows, all loudspeaker channels add up to a wave-field. The loudness of a WFS system is thus not readily predetermined. The loudness of the subwoofer channel, however, is predetermined with the known parameters of the electrical amplifier and the loudspeaker. It is therefore not possible to transfer a mix of a subwoofer channel from a WFS system to a WFS system with a different loudspeaker density and a different number of loudspeakers in an error-free manner. The loudnesses from the low-frequency system, on the one hand, and from the mid-/high-frequency system, on the other hand, would not match.