1. Field of the Invention
The present invention relates to stereo audio CD technology, and in particular to apparatus and methods for writing onto audio CDs and respective methods and apparatus for retrieving data from CDs.
2. Description of Prior Art
Recently, multichannel audio reproduction technology has become increasingly important. This may be due to the fact that audio compression/coding technologies such as the prior-art MP3 technology have made it possible to transmit audio data via the internet or other transmission channels with limited bandwidth. MP3 coding has become so well known because it enables recordings to be distributed in a stereo format, i.e. in a digital representation of the audio recording which includes a first, or left, stereo channel and a second, or right, stereo channel.
Alternative media for distributing stereo data are the prior-art audio CDs. The digital compact disc, developed in cooperation between Philips and Sony, is based on contactless optical scanning, by means of a laser, of a disc which is recorded on one side and serves as an information carrier. In the CD player, for reading out, the beam of a semiconductor laser, which is reflected by the disc and modulated in its intensity, is received by a photodiode. The output signal of the photodiode is converted into a serial data signal, and the clock signal is obtained therefrom. What follows is the separation of the synchronization characters and the re-conversion of the channel code into data, test characters and control/display bits. The control/display decoder provides the signals for motor speed, focusing and track-following as well as for finding and displaying certain places in the music. In the error-protection decoder, any signal information that has been disturbed may be restored by means of the check bits. After separating the data stream by means of the multiplexer, the digital/analog reconversion into the analog audio signal of the left and right channels is performed.
In accordance with the standardized CD frame structure which is shared by all audio CDs with stereo information, and to which common CD players are set, there are six successive samples of each of the left and right channels in one frame. Transmission starts with the left channel in each case. Each 16-bit sample starts with the MSB and is divided into two audio symbols of 8 bits each. The stereo data is subjected to error protection coding with a two-step, so-called CIRC method.
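The frame layout described above may be illustrated by the following minimal Python sketch. It is not taken from any CD standard text; the function name pack_frame, the pairwise left/right interleaving and the two's-complement representation of the samples are assumptions made for illustration only, and the CIRC error protection step is omitted.

```python
def pack_frame(left, right):
    """Pack six 16-bit samples per channel into 24 audio symbols of 8 bits.

    Assumptions (not stated in the text): samples are two's-complement
    integers and are interleaved pairwise as L, R, L, R, ...
    """
    if len(left) != 6 or len(right) != 6:
        raise ValueError("a frame carries six samples per channel")
    symbols = []
    for l_sample, r_sample in zip(left, right):
        for sample in (l_sample, r_sample):   # left channel first
            word = sample & 0xFFFF            # 16-bit word
            symbols.append(word >> 8)         # symbol holding the MSB first
            symbols.append(word & 0xFF)       # then the lower 8 bits
    return symbols


frame = pack_frame([1, 2, 3, 4, 5, 6], [-1, -2, -3, -4, -5, -6])
```

Six samples for each of two channels, at two symbols per sample, yield 24 audio symbols per frame before error protection coding.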
Nevertheless, fundamental disadvantages of conventional 2-channel sound systems exist. Therefore, so-called surround technology has been developed. A recommended multichannel surround representation includes an additional center channel C and two surround channels Ls, Rs in addition to the two stereo channels L and R. This reference tone format is also referred to as 3/2 stereo, which means that there are three front channels and two surround channels. Generally, five transmission channels are required. In a reproduction environment, at least five loudspeakers are required at the respective five different places in order to obtain an optimum so-called “sweet spot” at a specific distance from five accurately placed loudspeakers.
In the area of CD technology, so-called DVDs have found widespread acceptance. They typically contain a complete 5.1 or 7.1 recording, i.e. a complete representation of each individual sound channel.
What is disadvantageous about DVDs, however, is the fact that specific DVD players are required for them, and that conventional audio CD players thus cannot be used to play back DVDs. In addition, there is also no possibility of upgrading such normal audio CD players with simple measures, so that they would be able to not only play back audio CDs but also DVDs.
This is unfortunate especially because there are a large number of CD players in circulation with which a multichannel reproduction cannot be achieved. On the other hand, however, many customers shy away from “sorting out”, i.e. discarding, the fully functional CD player with which they are familiar and fully contented in order to change to DVDs only, even though the customers might not be interested at all in the video information typically contained on DVDs, but might simply want to have a good 5-channel sound.
It is true that coded multichannel representations obtained via the internet or from other sources might be burned onto CDs, provided that no licensing rights are violated. But such burned CDs, too, are not compatible with normal CD players since they contain coded information, whereas the stereo data contained on audio CDs is uncompressed 16-bit PCM data which is merely subjected to error protection coding, which leads to an increase in the data rate, rather than being subjected to data compression, which would lead to a reduction in the data rate.
In the art, there are many techniques for reducing the amount of data required for transmitting a multichannel audio signal. Such techniques are referred to as joint stereo techniques. To this end, reference shall be made to FIG. 3 which depicts a joint stereo apparatus 60. This apparatus may be an apparatus which implements, for example, the intensity stereo (IS) technique or the binaural cue coding (BCC) technique. Such a device typically receives, as the input signal, at least two channels CH1, CH2, . . . , CHn, and outputs one single carrier channel as well as parametric multichannel information. The parametric data is defined such that an approximation of an original channel (CH1, CH2, . . . , CHn) may be calculated in a decoder.
Normally, the carrier channel will include subband samples, spectral coefficients, time domain samples, etc. which provide a relatively fine representation of the underlying signal, whereas the parametric data includes no such samples or spectral coefficients but includes control parameters for controlling a certain reconstruction algorithm, such as weighting by multiplying, by time-shifting, by frequency-shifting, etc. The parametric multichannel information therefore includes a relatively coarse representation of the signal or of the associated channel. In numbers, the amount of data required by a carrier channel is about 60 to 70 kbits/s, whereas the amount of data required by parametric side information for a channel ranges between 1.5 and 2.5 kbits/s. It shall be noted that the above numbers apply to compressed data. Naturally, a non-compressed CD channel requires a data rate of about 10 times the said amount. Examples of parametric data are the prior-art scale factors, intensity stereo information or BCC parameters, as will be set forth below.
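The order of magnitude of the figures above may be checked with the following short Python calculation. The sampling rate of 44.1 kHz is not stated in the text and is assumed here as the usual audio CD value; the variable and function names are chosen for illustration only.

```python
SAMPLE_RATE_HZ = 44_100   # assumed audio CD sampling rate (not in the text)
BITS_PER_SAMPLE = 16      # uncompressed 16-bit PCM

# Data rate of one non-compressed CD channel in kbits/s.
pcm_channel_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

carrier_kbps = (60.0, 70.0)     # compressed carrier channel (from the text)
side_info_kbps = (1.5, 2.5)     # parametric side information per channel


def ratio_range(numerator, denominator_range):
    """Return (smallest, largest) ratio for a range of denominators."""
    low, high = denominator_range
    return numerator / high, numerator / low


pcm_vs_carrier = ratio_range(pcm_channel_kbps, carrier_kbps)

# Side information relative to the carrier channel, smallest to largest.
side_vs_carrier = (side_info_kbps[0] / carrier_kbps[1],
                   side_info_kbps[1] / carrier_kbps[0])
```

Under this assumption a non-compressed channel requires 705.6 kbits/s, i.e. roughly 10 to 12 times the rate of the compressed carrier channel, whereas the parametric side information for one channel amounts to only a few percent of the carrier channel rate.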
The technique of intensity stereo coding is described in the AES preprint 3799, “Intensity Stereo Coding”, J. Herre, K. H. Brandenburg, D. Lederer, February 1994, Amsterdam. Generally, the concept of intensity stereo is based on a main axis transformation to be performed on data of both stereophonic audio channels. When most data points are concentrated around the first main axis, a coding gain may be achieved in that both signals are rotated by a certain angle before the coding takes place. However, this is not always given for real stereophonic reproduction techniques. Therefore, this technique is modified to the effect that the second orthogonal component is excluded from the transmission in the bitstream. Thus, the reconstructed signals for the left and right channels consist of differently weighted or scaled versions of the same signal transmitted. Nevertheless, the reconstructed signals differ with regard to their amplitudes, but they are identical with regard to their phase information. The energy/time envelopes of both original audio channels, however, are maintained by the selective scaling operation which typically operates in a frequency-selective manner. This corresponds to human perception of sound at high frequencies, where the dominant spatial information is determined by the energy envelopes.
In practical implementations, the signal transmitted, i.e. the carrier channel, is generated from the aggregate signal of the left and right channels rather than by rotating both components. In addition, this processing, i.e. the generation of the intensity stereo parameters used for the scaling operations, is performed in a frequency-selective manner, i.e. independently for each scale factor band, i.e. for each coder frequency partition. Preferably, both channels are combined to form a combined, or “carrier”, channel, and the intensity stereo information is determined in addition to the combined channel. The intensity stereo information depends on the energy of the first channel, the energy of the second channel or the energy of the combined channel.
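The principle described above may be sketched for a single scale factor band as follows. This is a simplified illustration and not the method of any particular coder: the function names are hypothetical, the band is given as a list of real-valued spectral coefficients, and the scaling information is assumed to be the square root of the ratio of channel energy to carrier energy.

```python
import math


def energy(band):
    """Energy of one scale factor band (sum of squared coefficients)."""
    return sum(x * x for x in band)


def intensity_encode(left_band, right_band):
    """Form the carrier as the sum of both channels plus two scale values."""
    carrier = [l + r for l, r in zip(left_band, right_band)]
    e_carrier = energy(carrier)
    if e_carrier == 0.0:
        return carrier, 0.0, 0.0
    g_left = math.sqrt(energy(left_band) / e_carrier)
    g_right = math.sqrt(energy(right_band) / e_carrier)
    return carrier, g_left, g_right


def intensity_decode(carrier, g_left, g_right):
    """Reconstruct both channels as scaled versions of the same carrier."""
    return [g_left * c for c in carrier], [g_right * c for c in carrier]


left = [1.0, 2.0, -1.0]
right = [0.5, 1.0, 0.5]
carrier, g_l, g_r = intensity_encode(left, right)
left_rec, right_rec = intensity_decode(carrier, g_l, g_r)
```

The reconstructed channels carry the band energies of the original channels but are scaled copies of one and the same signal, so that they differ in amplitude only and not in phase.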
The BCC technique is described in the AES Convention Paper 5574 “Binaural Cue Coding applied to stereo and multichannel audio compression”, T. Faller, F. Baumgarte, May 2002, Munich. In BCC coding, a number of audio input channels are converted into a spectral representation, specifically using a DFT-based transformation with overlapping windows. The resulting spectrum is partitioned into non-overlapping portions, each of which has an index. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). The inter channel level differences (ICLD) and the inter channel time differences (ICTD) are determined for each partition and for each frame k. The ICLD and ICTD are quantized and coded so as to pass, eventually, into a BCC bitstream as side information. The inter channel level differences and the inter channel time differences are given in relation to a reference channel for each channel. Subsequently, the parameters are calculated in accordance with predetermined formulae which depend on the specific partitions of the signal to be processed.
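The determination of level differences per partition may be illustrated by the following Python sketch. It is not the formula of the cited paper: it assumes that the spectra of one frame are already available as lists of complex coefficients, that the partitions are given as index ranges, and that the ICLD is expressed in dB as a power ratio relative to the reference channel. The function names and the small constant eps, which merely avoids taking the logarithm of zero, are hypothetical. The ICTD, quantization and coding are omitted.

```python
import math


def partition_power(spectrum, start, stop):
    """Power of the spectral coefficients with indices start..stop-1."""
    return sum(abs(c) ** 2 for c in spectrum[start:stop])


def icld_db(ref_spectrum, ch_spectrum, partitions, eps=1e-12):
    """Level difference in dB of one channel against the reference channel,
    one value per non-overlapping partition of the current frame."""
    values = []
    for start, stop in partitions:
        p_ref = partition_power(ref_spectrum, start, stop)
        p_ch = partition_power(ch_spectrum, start, stop)
        values.append(10.0 * math.log10((p_ch + eps) / (p_ref + eps)))
    return values


reference = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
channel = [2 + 0j, 2 + 0j, 0.5 + 0j, 0.5 + 0j]
levels = icld_db(reference, channel, [(0, 2), (2, 4)])
```

In this example the channel is about 6 dB stronger than the reference channel in the first partition and about 6 dB weaker in the second partition.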
On the decoder side, the decoder typically receives a mono signal and the BCC bitstream. The mono signal is transformed into the frequency domain and is input into a spatial synthesis block which also receives decoded ICLD and ICTD values. In the spatial synthesis block, the BCC parameters (ICLD and ICTD) are used to perform a weighting operation of the mono signal to synthesize those multichannel signals which, after a frequency/time conversion, represent a reconstruction of the original multichannel audio signal.
In the case of BCC, the joint stereo module 60 is operative to output the channel-side information such that the parametric channel data is quantized and coded ICLD or ICTD parameters, one of the original channels being used as a reference channel for coding the channel-side information.
Normally, the carrier signal is formed from the sum of the participating original channels.
Naturally, the above techniques provide only a mono representation for a decoder which can process the carrier channel only but is not able to process the parametric data for generating one or several approximations of more than one input channel.
The BCC technique is also described in the US patent publications US 2003/0219130 A1, US 2003/0026441 A1 and US 2003/0035553 A1. In addition, reference shall be made to the specialist publication “Binaural Cue Coding. Part II: Schemes and Applications”, T. Faller and F. Baumgarte, IEEE Trans. On Audio and Speech Proc. Vol. 11, No. 6, November 2003.
A typical BCC scheme for multichannel audio coding will be represented in more detail below, specifically with reference to FIGS. 4 to 6.
FIG. 5 shows such a BCC scheme for coding/transmitting multichannel audio signals. The multichannel audio input signal at an input 110 of a BCC coder 112 is downmixed in a so-called downmix block 114. In this example, the original multichannel signal at the input 110 is a 5-channel surround signal with a front left channel, a front right channel, a left surround channel, a right surround channel and a center channel. In the preferred embodiment of the present invention, downmix block 114 generates an aggregate signal by simply adding up these five channels into a mono signal.
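The simple downmix described above, in which the five channels are added up into a mono signal, may be sketched as follows. The function name and the representation of each channel as a list of time domain samples are assumptions for illustration; scaling of the sum to avoid clipping is not addressed.

```python
def downmix_to_mono(channels):
    """Add up all input channels sample by sample into one aggregate signal."""
    if len({len(ch) for ch in channels}) != 1:
        raise ValueError("all channels must have the same number of samples")
    return [sum(samples) for samples in zip(*channels)]


# Front left, front right, left surround, right surround, center.
five_channels = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, -1.0, 0.0],
    [2.0, 2.0, 2.0],
]
aggregate = downmix_to_mono(five_channels)
```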
Other downmixing schemes are known in the art, so that using a multichannel input signal results in a downmix channel having a single channel. This single channel is output at an aggregate-signal line 115. A piece of side information obtained from the BCC analysis block 116 is output on a side information line 117.
In the BCC analysis block, inter channel level differences (ICLD) and inter channel time differences (ICTD) are calculated as has been represented above. As of late, the BCC analysis block 116 is also able to calculate inter channel correlation values (ICC values). The aggregate signal and the side information are transmitted to a BCC decoder 120 in a quantized and coded format. The BCC decoder decomposes the transmitted aggregate signal into a number of subbands and performs scalings, delays and other processing steps to supply the subbands of the multichannel audio channels to be output. This processing is performed such that the ICLD, ICTD and ICC parameters (cues) of a reconstructed multichannel signal at the output 121 match the respective cues for the original multichannel signal at the input 110 in the BCC coder 112. For this purpose, BCC decoder 120 includes a BCC synthesis block 122 and a side information processing block 123.
The internal structure of the BCC synthesis block 122 will be represented below with reference to FIG. 6. The aggregate signal on line 115 is fed to a time/frequency conversion unit or filter bank FB 125. At the output of block 125, there are a number N of subband signals or, in an extreme case, a block of spectral coefficients when the audio filter bank 125 performs a 1:1 transformation, i.e. a transformation generating N spectral coefficients from N time domain samples.
The BCC synthesis block 122 further includes a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129. At the output of stage 129, the reconstructed multichannel audio signal having, for example, five channels in the event of a 5-channel surround system, may be output to a set of speakers 124 as are represented in FIG. 5 or FIG. 4.
The input signal sn is converted to the frequency range or the filter bank range by means of the element 125. The signal output by element 125 is copied such that several versions of the same signal will be obtained, as is represented by the copying node 130. The number of versions of the original signal equals the number of output channels in the output signal. Then each version of the original signal is subjected, at node 130, to a certain delay d1, d2, . . . , di, . . . dN. The delay parameters are calculated by the side information processing block 123 in FIG. 5 and are derived from the inter channel time differences as have been calculated by the BCC analysis block 116 of FIG. 5.
The same applies to multiplication parameters a1, a2, . . . , ai, . . . , aN which are also calculated by the side information processing block 123 on the basis of the inter channel level differences as are calculated by BCC analysis block 116.
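The copying, delaying and level modification described above may be sketched for one subband signal as follows. This is a simplified illustration only: the function name is hypothetical, the delays are assumed to be non-negative whole numbers of samples, and the correlation processing and the filter banks are omitted.

```python
def synthesize_subband(subband, delays, gains):
    """Create one delayed and scaled version of the subband per output channel.

    delays: delay d_i in samples for each output channel (assumed integer >= 0)
    gains:  multiplication parameter a_i for each output channel
    """
    if len(delays) != len(gains):
        raise ValueError("one delay and one gain are needed per output channel")
    n = len(subband)
    outputs = []
    for d, a in zip(delays, gains):
        # Shift the copy by d samples, padding with zeros and keeping length n.
        delayed = ([0.0] * d + list(subband))[:n]
        outputs.append([a * x for x in delayed])
    return outputs


versions = synthesize_subband([1.0, 2.0, 3.0], delays=[0, 1], gains=[0.5, 2.0])
```

The number of versions produced equals the number of output channels, each version being derived from the same transmitted signal.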
The ICC parameters calculated by the BCC analysis block 116 are used for controlling the functionality of block 128, so that certain correlations between the signals which are delayed and manipulated in their levels are obtained at the outputs of block 128. It shall be noted here that the order of stages 126, 127, 128 may deviate from the order shown in FIG. 6.
It shall be pointed out that with frame-wise processing of the audio signal, the BCC analysis is also performed in a frame-wise, i.e. temporally variable, manner and that, in addition, a frequency-wise BCC analysis is obtained, as may be seen from the filter band partitioning from FIG. 6. This means that the BCC parameters are obtained for each spectral band. This means further that in the event that the audio filter bank 125 decomposes the input signal into, for example, 32 bandpass signals, the BCC analysis block will obtain a set of BCC parameters for each of the 32 bands. Of course, BCC synthesis block 122 of FIG. 5, depicted in detail in FIG. 6, performs a reconstruction which is also based on the 32 bands mentioned by way of example.
With reference to FIG. 4, a scenario will be presented below which is used to determine individual BCC parameters. Normally, the ICLD, ICTD and ICC parameters may be defined between pairs of channels. However, it is preferred to determine the ICLD and ICTD parameters between a reference channel and any other channel. This is depicted in FIG. 4A.
ICC parameters may be defined in various manners. Generally speaking, ICC parameters in the coder may be determined between all possible pairs of channels, as is shown in FIG. 4B. However, it has been proposed to calculate only ICC parameters between the two most powerful channels at any time, as is shown in FIG. 4C, which depicts an example wherein an ICC parameter between channels 1 and 2 is calculated at one time, and an ICC parameter between channels 1 and 5 is calculated at another time. Subsequently, the decoder synthesizes the inter channel correlation between the most powerful channels in the decoder and uses certain heuristic rules for calculating and synthesizing the inter channel coherence for the remaining pairs of channels.
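The selection of the two most powerful channels and the calculation of a correlation value between them may be sketched as follows. The function names are hypothetical, and the correlation measure used here, a normalized cross-correlation at zero lag, is an assumption made for illustration; the cited publications may define the ICC parameter differently.

```python
import math


def two_strongest(channels):
    """Indices of the two channels with the highest energy in this frame."""
    energies = [sum(x * x for x in ch) for ch in channels]
    order = sorted(range(len(channels)), key=lambda i: energies[i], reverse=True)
    return tuple(sorted(order[:2]))


def correlation_zero_lag(a, b):
    """Normalized cross-correlation of two channels at zero lag (assumption)."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator else 0.0


frame_channels = [
    [1.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
]
i, j = two_strongest(frame_channels)
icc = correlation_zero_lag(frame_channels[i], frame_channels[j])
```

Since the selection is repeated for every frame, the pair of channels for which the ICC parameter is calculated may change over time.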
With regard to calculating, for example, the multiplication parameters a1, . . . , aN on the basis of the ICLD parameters transmitted, reference shall be made to the AES convention paper No. 5574. The ICLD parameters represent an energy distribution of an original multichannel signal. Without loss of generality, it is preferred, as depicted in FIG. 4A, to take four ICLD parameters which represent the energy difference between the respective channels and the front left channel. In the side information processing block 123, the multiplication parameters a1, . . . , aN are derived from the ICLD parameters in such a manner that the entire energy of all reconstructed output channels is the same as (or is proportional to) the energy of the aggregate signal transmitted.
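One possible way of deriving such multiplication parameters may be sketched as follows. The sketch is not quoted from the cited paper: it assumes that the ICLD parameters are given in dB relative to the reference channel and normalizes the gains so that their squares sum to one, which keeps the total energy of the output channels equal to the energy of the aggregate signal when every output is a scaled copy of that signal. The function name is hypothetical.

```python
import math


def gains_from_icld(icld_db):
    """Gains a1..aN from N-1 level differences relative to channel 1.

    The first returned gain belongs to the reference channel.
    """
    ratios = [10.0 ** (d / 10.0) for d in icld_db]    # power ratios to reference
    a_ref = 1.0 / math.sqrt(1.0 + sum(ratios))        # normalizes total energy
    return [a_ref] + [math.sqrt(r) * a_ref for r in ratios]


# Four ICLD values for a 5-channel signal, reference channel first in the result.
gains = gains_from_icld([0.0, -3.0, -6.0, -6.0])
total_energy_factor = sum(a * a for a in gains)
```

With four level differences of 0 dB, all five gains become equal to one over the square root of five, so that the energy of the aggregate signal is distributed evenly over the output channels.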
In order to put multichannel information onto CDs, one may also resort, besides DVDs, to special audio CDs which store the sound channels in a data-reduced form using audio coding methods such as DTS. These special audio CDs cannot be played back on normal audio CD players, but require a decoder of their own which in most cases is to be connected externally to the digital output of the normal audio CD player.
In addition, there are hybrid SACDs which offer, by means of two layers on the disc, both the conventional stereo sound for reproduction on audio CD players (in one of the layers) and the multichannel sound in the DSD format (in the other layer) for reproduction on SACD players.