Embodiments of the present invention relate to a device, a method and a computer program for providing an audio signal based on at least two source signals recorded by microphones arranged within a space or an acoustic scene.
More complex recordings and/or acoustic scenes are usually recorded using audio mixing consoles, at least where the recording concerns audio signals. In this context, any sound composition and/or any sound signal is to be understood as an acoustic scene. The term ‘acoustic scene’ is used herein to account for the fact that the acoustic, sound or audio signal received by a listener and/or at a listening position typically originates from a plurality of different sources; an acoustic scene as referred to herein may, of course, also be generated by a single source of sound. The character of such an acoustic scene is determined not only by the number and/or the distribution of the sources of sound in the space that generate it, but also by the shape and/or geometry of the space itself. For example, in enclosed spaces, reflections from the boundary walls are superposed, as part of the room acoustics, on the sound portions that reach a listener directly from the source of sound. In simple terms, these reflections may be understood, among other things, as temporally delayed and attenuated copies of the direct sound portions.
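The "delayed and attenuated copy" description of a reflection can be made concrete for a sampled signal. The following is a minimal sketch, assuming a single reflection with an integer delay in samples and a constant attenuation factor; the function name `add_reflection` and its parameters are hypothetical and illustrative only. Real room acoustics superpose many such reflections, typically with frequency-dependent attenuation.

```python
def add_reflection(direct, delay_samples, gain):
    """Superpose one delayed, attenuated copy of `direct` onto itself.

    Illustrative single-reflection model (an assumption, not a full room model):
        y[n] = x[n] + gain * x[n - delay_samples]
    The output is truncated to the length of the input.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    out = list(direct)
    # Samples before the delay contain only the direct sound.
    for n in range(delay_samples, len(direct)):
        out[n] += gain * direct[n - delay_samples]
    return out


# A unit impulse: the reflection appears 3 samples later at half amplitude.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(add_reflection(impulse, delay_samples=3, gain=0.5))
# [1.0, 0.0, 0.0, 0.5, 0.0]
```

Using an impulse as input makes the model visible directly: the output is the impulse response of this one-reflection "room".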
In such environments, an audio mixing console is often used to produce audio material comprising a plurality of channels and/or inputs, each of which is associated with one of many microphones that are in turn arranged within the acoustic scene, such as within a concert hall or the like. The individual audio and/or source signals may be present in analog or digital form, e.g., as a series of digital sample values, the sample values being temporally equidistant and each corresponding to an amplitude of the sampled audio signal. Depending on the signals used, such a mixing console may thus be implemented, e.g., as dedicated hardware or, provided that the audio signals are available in digital form, as a software component on a PC and/or a programmable CPU. Apart from microphones, the electrical audio signals processed by such audio mixing consoles may also originate from other devices, such as instruments, effect equipment or the like. Each audio signal to be processed may be associated with a separate channel strip on the mixing console, wherein a channel strip may provide multiple functions for altering the sound of the associated audio signal, such as a change in volume, filtering, mixing with other channel strips, distribution and/or splitting of the relevant channel, or the like.
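Two of the channel-strip functions named above, a volume change and mixing with other strips, can be sketched on digital sample sequences. This is a minimal sketch under stated assumptions: the `ChannelStrip` class, its `gain_db` field and the `mix` function are hypothetical names, and filtering, distribution and splitting are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class ChannelStrip:
    """Hypothetical, minimal channel strip: one source signal plus a volume setting."""
    samples: list[float]   # temporally equidistant sample values of the source signal
    gain_db: float = 0.0   # volume setting of this strip in decibels

    def processed(self) -> list[float]:
        # Convert the dB setting to a linear amplitude factor and scale every sample.
        factor = 10.0 ** (self.gain_db / 20.0)
        return [factor * s for s in self.samples]


def mix(strips: list[ChannelStrip]) -> list[float]:
    """Sum the processed outputs of several strips sample by sample."""
    if not strips:
        return []
    length = max(len(strip.samples) for strip in strips)
    out = [0.0] * length
    for strip in strips:
        # Shorter signals simply contribute nothing beyond their last sample.
        for n, value in enumerate(strip.processed()):
            out[n] += value
    return out


# Two source signals; the second strip is attenuated by 20 dB (factor 0.1).
strips = [ChannelStrip([1.0, 1.0]), ChannelStrip([1.0, 2.0], gain_db=-20.0)]
print(mix(strips))
```

Expressing the volume in decibels mirrors the usual fader scale of a mixing console; the conversion `10 ** (dB / 20)` yields an amplitude factor.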
When recording complex audio scenes, such as concerts, the problem is often to generate the audio signal and/or the mixed recording such that a listener listening to the recording perceives a sound impression as close as possible to the original. The so-called mixing of the initially recorded microphone signals and/or source signals may need to be performed differently for different reproduction configurations, such as for different numbers of output channels and/or loudspeakers. Corresponding examples include a stereo configuration and multichannel configurations such as 4.0, 5.1 or the like. To create such a spatial audio mix, the volume has so far been set for each source of sound and/or for each microphone and/or source signal at the respective channel strip such that the spatial impression desired by the sound engineer results for the intended listening configuration. This is mainly achieved by distributing the volume between several playback channels and/or loudspeakers by means of so-called panning algorithms, such that a phantom source of sound is created between the loudspeakers and a spatial impression results. Owing to the different volumes in the individual playback channels, the listener is given the impression that, for example, the reproduced object is spatially located between the loudspeakers. To achieve this, each channel has so far had to be adjusted manually based on the real position of the recording microphone within the acoustic scene and aligned with a sometimes considerable number of further microphones.
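One widely used panning algorithm for a two-loudspeaker setup is the constant-power (sine/cosine) pan law. The sketch below illustrates that general technique only; it is not presented as the method of the embodiments, and the names `constant_power_pan` and `pan_signal`, as well as the position range of -1 to 1, are assumptions chosen for illustration.

```python
import math


def constant_power_pan(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 = fully left, 0.0 = centre (phantom source midway), 1.0 = fully right.
    Sine/cosine law: left**2 + right**2 == 1, so total power stays constant.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def pan_signal(samples: list[float], position: float) -> tuple[list[float], list[float]]:
    """Distribute one mono source signal onto a left and a right playback channel."""
    left_gain, right_gain = constant_power_pan(position)
    left = [left_gain * s for s in samples]
    right = [right_gain * s for s in samples]
    return left, right


# A centred source gets about 0.707 (-3 dB) in each channel.
print(constant_power_pan(0.0))
```

Keeping the summed power constant, rather than the summed amplitude, avoids an audible drop in loudness when the phantom source sits midway between the loudspeakers.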
Such audio mixes become even more complicated, time-consuming and/or cost-intensive if the listener is to be given the impression that the recorded source of sound is moving. In this case, the volume of every channel strip involved has to be readjusted manually for each of the time-varying spatial configurations and/or for each time step within the movement of a source of sound, which is not only extremely time-consuming but also error-prone.
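The per-time-step readjustment described above amounts to recomputing the panning gains for every step of the movement. The following is a minimal sketch, assuming a linear movement between two pan positions, gains updated once per block of samples, and a sine/cosine pan law; the names `pan_gains` and `pan_moving_source` are hypothetical and illustrate only the bookkeeping involved, not the embodiments' method.

```python
import math


def pan_gains(position: float) -> tuple[float, float]:
    """Sine/cosine pan law (an assumption): position -1 = left, +1 = right."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def pan_moving_source(samples: list[float], start_pos: float, end_pos: float,
                      block_size: int) -> tuple[list[float], list[float]]:
    """Pan a mono signal whose source moves linearly from start_pos to end_pos.

    The gains are recomputed once per block of `block_size` samples, i.e. one
    "time step" of the movement per block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    left: list[float] = []
    right: list[float] = []
    n_blocks = math.ceil(len(samples) / block_size)
    for b in range(n_blocks):
        # Fraction of the movement completed at this time step (0.0 .. 1.0).
        t = b / (n_blocks - 1) if n_blocks > 1 else 0.0
        position = start_pos + t * (end_pos - start_pos)
        gain_l, gain_r = pan_gains(position)
        block = samples[b * block_size:(b + 1) * block_size]
        left.extend(gain_l * s for s in block)
        right.extend(gain_r * s for s in block)
    return left, right


# A source sweeping from fully left to fully right over 8 samples, in 4-sample steps.
left, right = pan_moving_source([1.0] * 8, start_pos=-1.0, end_pos=1.0, block_size=4)
```

Block-wise updates can produce audible steps if the blocks are long; a practical implementation would typically interpolate the gains between time steps.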
In some scenarios, such as when recording a symphony orchestra, a large number of microphone signals and/or source signals, e.g., more than 100, is recorded simultaneously and possibly processed into an audio mix in real time. To achieve such a spatial mix, the operator and/or sound engineer has so far had to establish, at least in the run-up to the actual recording, the spatial relationship between the individual microphone signals and/or source signals on a conventional mixing console. This is done by first noting by hand the positions of the microphones and their association with the individual channel strips, in order to control the volumes and possibly other parameters of the individual channel strips, such as a distribution of volumes across multiple channels or reverberation (pan and reverberation), such that the audio mix has the desired spatial effect at the desired listening position and/or for a desired loudspeaker arrangement. In the case of a symphony orchestra with more than 100 instruments, each of which is recorded separately as a direct source signal, this may be almost impossible to accomplish. In order to reproduce, after the recording, a spatial arrangement of the recorded source signals on the mixing console that resembles reality, the positions of the microphones have so far been sketched or numbered by hand, so that the spatial audio mix can then be reproduced in a time-consuming procedure by setting the volume of every individual channel strip. However, when a very large number of microphone signals is to be recorded, the subsequent mixing of a successful recording is not the only major challenge.
Rather, with a large number of source signals to be recorded, it is already difficult to ensure that all microphone signals reach the mixing console and/or the software used for audio mixing free from interference. So far, this has had to be verified by the sound engineer and/or an operator of the mixing console listening to and/or checking every channel strip separately, which is very time-consuming and, if an interfering signal occurs whose origin cannot immediately be located, may result in a lengthy search for the fault. When listening to and/or switching individual channels and/or source signals on and off, care must also be taken to ensure that the supplementary records, which associate each microphone signal and its position with the corresponding channel of the mixing console during the recording, are error-free. For large recordings, this check alone may take several hours, and errors made during this complex check are difficult or impossible to compensate for once the recording has been finalized.