The present invention relates to audio multichannel technology and in particular to the synchronization of multichannel extension data with an audio signal for allowing multichannel reconstruction.
Currently developed technologies allow an ever more efficient transmission of audio signals by data reduction, but also an increase of audio enjoyment by extensions, such as by the usage of multichannel technology.
Examples of such extensions of common transmission techniques have become known under the names “Binaural Cue Coding” (BCC) and “Spatial Audio Coding”. In this regard, reference is made, by way of example, to J. Herre, C. Faller, S. Disch, C. Ertel, J. Hilpert, A. Hoelzer, K. Linzmeier, C. Spenger, P. Kroon: “Spatial Audio Coding: Next-Generation Efficient and Compatible Coding of Multi-Channel Audio”, 117th AES Convention, San Francisco 2004, Preprint 6186.
In a sequentially operating transmission system, such as radio or internet, such methods separate the audio program to be transmitted into audio base data or an audio signal, which can be a mono or also a stereo downmix audio signal, and into extension data that can also be referred to as multichannel additional information or multichannel extension data. The multichannel extension data can be broadcast together with the audio signal, i.e. in a combined manner, or the multichannel extension data can also be broadcast separately from the audio signal. As an alternative to broadcasting a radio program, the multichannel extension data can also be transmitted separately, for example to a version of the downmix channel already existing on the user side. In this case, transmission of the audio signal, for example in the form of an internet download or a purchase of a compact disc or DVD takes place spatially and temporally separate from the transmission of the multichannel extension data, which can be provided, for example, from a multichannel extension data server.
Basically, the separation of a multichannel audio signal into an audio signal and multichannel extension data has the following advantages. A “classic” receiver is able to receive and replay the audio base data, i.e. the audio signal, at any time, independent of the content and version of the multichannel additional data. This characteristic is referred to as backward compatibility. In addition, a receiver of the newer generation can evaluate the transmitted multichannel additional data and combine them with the audio base data, i.e. the audio signal, in such a manner that the complete extension, i.e. the multichannel sound, can be provided to the user.
In an exemplary application scenario in digital radio, these multichannel extension data allow the previously broadcast stereo audio signal to be extended to the multichannel format 5.1 with little additional transmission effort. The multichannel format 5.1 comprises five replay channels, i.e. a left channel L, a right channel R, a center channel C, a left rear channel LS (left surround) and a right rear channel RS (right surround), the “.1” denoting an additional low-frequency effects channel. For this, the program provider generates the multichannel additional information on the transmitter side from multichannel sound sources, such as those found, for example, on a DVD-Audio/Video. Subsequently, this multichannel additional information can be transmitted in parallel to the audio stereo signal broadcast as before, which now includes a stereo downmix of the multichannel signal.
One advantage of this method is the compatibility with the so far existing digital radio transmission system. A classical receiver that cannot evaluate this additional information will be able to receive and replay the two-channel sound signal as before without any limitations regarding quality.
A receiver of novel design, however, can evaluate and decode the multichannel information and reconstruct the original 5.1 multichannel signal from the same, in addition to the stereo sound signal received so far.
For allowing simultaneous transmission of the multichannel additional information as a supplement to the stereo sound signal used so far, two solutions are possible for compatible broadcast via a digital radio system.
The first solution is to combine the multichannel additional information with the coded downmix audio signal such that it can be added, as a suitable and compatible extension, to the data stream generated by an audio encoder. In this case, the receiver sees only one (valid) audio data stream; by means of an upstream data distributor, it can extract the multichannel additional information synchronously with the associated audio data block, decode it, and output the result as 5.1 multichannel sound.
This solution necessitates extending the existing infrastructure/data paths such that they can transport data signals consisting of downmix signals and extension data instead of merely the stereo audio signals as before. This is unproblematic and possible without additional effort when, for example, the downmix signals are transmitted in a data-reduced representation, i.e. as a bit stream. A field for the extension information can then be inserted into this bit stream.
A second possible solution is not to couple the multichannel additional information to the audio coding system used. In this case, the multichannel extension data are not inserted into the actual audio data stream. Instead, transmission is performed via a specific but not necessarily temporally synchronized additional channel, which can, for example, be a parallel digital additional channel. Such a situation occurs, for example, when the downmix data, i.e. the audio signal, are routed in unreduced form, e.g. as PCM data in the AES/EBU data format, through a common audio distribution infrastructure existing in studios. These infrastructures are aimed at distributing audio signals digitally between various sources (“crossbars”) and/or processing them, for example by means of tone control, dynamic compression, etc.
In the second possible solution described above, the problem of time offset of the downmix audio signal and multichannel additional information in the receiver can occur, since both signals pass through different, non-synchronized data paths. A time offset between downmix signal and additional information, however, causes deterioration of the sound quality of the reconstructed multichannel signal, since then an audio signal with multichannel extension data, which actually do not belong to the current audio signal but to an earlier or later portion or block of the audio signal, is processed on the replay side.
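The effect of such an offset on block association can be illustrated with a minimal sketch. It assumes, purely for illustration, that the multichannel extension data are organized in parameter blocks of fixed length and that the decoder applies the parameter block whose index matches the received sample position; the function name, the block length and the offsets are hypothetical and not taken from any particular system.

```python
def mismatched_fraction(offset: int, block_len: int, n_blocks: int = 100) -> float:
    """Fraction of audio samples processed with the wrong parameter block.

    offset    -- delay of the audio signal relative to the extension data, in samples
                 (negative if the audio arrives early); hypothetical model
    block_len -- assumed fixed length of one parameter block, in samples
    """
    total = n_blocks * block_len
    wrong = 0
    for n in range(total):
        applied_block = n // block_len             # block the decoder uses at position n
        true_block = (n - offset) // block_len     # block the sample actually belongs to
        if applied_block != true_block:
            wrong += 1
    return wrong / total


if __name__ == "__main__":
    for offset in (0, 512, -512, 4096):
        print(offset, mismatched_fraction(offset, block_len=2048))
```

Under these assumptions, an offset of a quarter block already causes a quarter of all samples to be reconstructed with parameters belonging to a neighboring block, and an offset of one block length or more leaves no sample correctly associated.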
Since the order of magnitude of the time offset can no longer be determined from the received audio signal and the additional information, a time-correct reconstruction and association of the multichannel signal in the receiver is not ensured, which will result in quality losses.
A further example of this situation arises when an already running two-channel transmission system is to be extended to multichannel transmission, for example in a receiver for digital radio. Here, the downmix signal is frequently decoded by an audio decoder already existing in the receiver, for example a stereo audio decoder according to the MPEG-4 standard. The delay time of this audio decoder is not known or cannot be predicted exactly, due to the system-inherent data compression of audio signals. Hence, the delay time of such an audio decoder cannot be compensated reliably.
In the extreme case, the audio signal can also reach the multichannel audio decoder via a transmission chain including analog parts. Here, digital/analog conversion takes place at a certain point in the transmission, which is followed again by analog/digital conversion after a further storage/transmission. Here also, no indications are available as to how a suitable delay compensation of the downmix signal in relation to the multichannel additional data can be performed. When the sampling frequencies of the analog/digital conversion and the digital/analog conversion differ slightly, the necessary compensation delay even drifts slowly over time, according to the ratio of the two sampling rates.
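The magnitude of such a drift can be estimated with a short calculation. The following sketch assumes an idealized chain in which the only timing error is a constant mismatch between the digital/analog clock and the analog/digital clock; the function name and the example clock rates are illustrative assumptions.

```python
def drift_in_samples(n_samples: int, fs_da: float, fs_ad: float) -> float:
    """Offset accumulated after n_samples of the original signal.

    Playing n_samples through the D/A converter takes n_samples / fs_da seconds.
    During that time the A/D converter produces n_samples * fs_ad / fs_da samples,
    so the re-digitized signal runs ahead (or behind) by the difference.
    """
    return n_samples * (fs_ad - fs_da) / fs_da


if __name__ == "__main__":
    one_hour = 48_000 * 3600  # one hour of audio at a nominal 48 kHz
    # A/D clock assumed to run 1 Hz fast relative to the D/A clock (about 21 ppm).
    print(drift_in_samples(one_hour, 48_000.0, 48_001.0))  # 3600.0
```

In this example, a clock mismatch of only about 21 ppm accumulates to 3600 samples, i.e. 75 ms at 48 kHz, within one hour, so a compensation delay determined once at the beginning would no longer be valid later in the program.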
German patent DE 10 2004 046 746 B4 discloses a method and an apparatus for synchronizing additional data and base data. A user provides a fingerprint based on his stereo data. An extension data server identifies the stereo signal based on the obtained fingerprint and accesses a database for retrieving the extension data for this stereo signal. In particular, the server identifies an ideal stereo signal corresponding to the stereo signal existing at the user and generates two test fingerprints of the ideal audio signal belonging to the extension data. These two test fingerprints are then provided to the client who determines a compression/expansion factor and a reference offset therefrom, wherein, based on the reference offset, the additional channels are expanded/compressed and cut off at the beginning and the end. Thereupon, a multichannel file can be generated by using the base data and the extension data.
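How a compression/expansion factor and a reference offset might follow from two test fingerprints can be sketched as below. The summary above does not state the computation, so this sketch assumes a simple linear time model, t_user = factor * t_ref + offset, and assumes that the positions at which the two test fingerprints match in the user's signal have already been found by some fingerprint search; all names and numbers are hypothetical.

```python
def estimate_time_mapping(ref_times, user_times):
    """Estimate (factor, offset) of the assumed model t_user = factor * t_ref + offset.

    ref_times  -- positions (in seconds) of the two test fingerprints in the ideal signal
    user_times -- positions (in seconds) where they were matched in the user's signal
    """
    (r1, r2), (u1, u2) = ref_times, user_times
    if r2 == r1:
        raise ValueError("the two test fingerprints must lie at different positions")
    factor = (u2 - u1) / (r2 - r1)  # compression/expansion factor
    offset = u1 - factor * r1       # reference offset
    return factor, offset


if __name__ == "__main__":
    # Hypothetical match positions: user's copy starts later and runs 1 % slower.
    factor, offset = estimate_time_mapping((10.0, 110.0), (12.5, 113.5))
    print(f"factor={factor:.4f}, offset={offset:.4f} s")
```

Two reference points are the minimum needed to determine both unknowns of a linear model; with only one fingerprint, a stretch of the user's signal could not be distinguished from a plain shift.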