This type of coding/decoding is based on the extraction of spatial information parameters so that, on decoding, these spatial characteristics can be reconstructed for the listener.
This type of parametric coding is applied in particular to stereo signals. Such a coding/decoding technique is described, for example, in the document by Breebaart, J., van de Par, S., Kohlrausch, A. and Schuijers, E., entitled “Parametric Coding of Stereo Audio”, in EURASIP Journal on Applied Signal Processing 2005:9, 1305-1322. This example is reviewed with reference to FIGS. 1 and 2, which respectively describe a parametric stereo coder and decoder.
Thus, FIG. 1 describes a coder receiving two audio channels, a left channel (denoted L) and a right channel (denoted R).
The channels L(n) and R(n) are processed by blocks 101, 102 and 103, 104 respectively which perform a short-term Fourier analysis. The transformed signals L[j] and R[j] are thus obtained.
The block 105 performs a channel reduction matrixing, or “Downmix” to obtain from the left and right signals a sum signal, a mono signal in the present case, in the frequency domain.
An extraction of spatial information parameters is also performed in the block 105.
The parameters of ICLD (“InterChannel Level Difference”) type, also called interchannel intensity difference, characterize the energy ratios for each frequency subband between the left and right channels.
They are defined in dB by the following formula:
$$\mathrm{ICLD}[k] = 10 \cdot \log_{10}\left( \frac{\sum_{j=B[k]}^{B[k+1]-1} L[j] \cdot L^{*}[j]}{\sum_{j=B[k]}^{B[k+1]-1} R[j] \cdot R^{*}[j]} \right) \ \mathrm{dB} \qquad (1)$$
in which L[j] and R[j] correspond to the (complex) spectral coefficients of the channels L and R, the values B[k] and B[k+1], for each frequency band k, define the subdivision into sub-bands of the spectrum and the symbol * indicates the complex conjugate.
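As an illustration only, equation (1) can be sketched as follows. The function name `icld_db`, the list-based representation of the spectra and the small floor `eps` (added to avoid a division by zero or the logarithm of zero in a silent band) are assumptions made for this sketch and are not part of the technique described above.

```python
import math

def icld_db(L, R, B, eps=1e-12):
    """Return ICLD[k] in dB for each sub-band k, following equation (1).

    L, R: sequences of complex spectral coefficients (left, right channel).
    B:    sub-band boundaries; band k covers bins B[k] .. B[k+1]-1.
    eps:  hypothetical energy floor, not part of equation (1).
    """
    values = []
    for k in range(len(B) - 1):
        # L[j]·L*[j] is the squared magnitude of the coefficient.
        e_left = sum(abs(L[j]) ** 2 for j in range(B[k], B[k + 1]))
        e_right = sum(abs(R[j]) ** 2 for j in range(B[k], B[k + 1]))
        values.append(10.0 * math.log10((e_left + eps) / (e_right + eps)))
    return values

# Two sub-bands of two bins each: left is stronger in the first band,
# right is stronger in the second band.
example = icld_db([2 + 0j, 2 + 0j, 1j, 1j], [1 + 0j, 1 + 0j, 2j, 0j], [0, 2, 4])
```

A positive value indicates more energy in the left channel for that sub-band, a negative value more energy in the right channel.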
A parameter of ICPD (“InterChannel Phase Difference”) type, also called phase difference for each frequency subband, is defined according to the following relationship:
$$\mathrm{ICPD}[k] = \angle\left( \sum_{j=B[k]}^{B[k+1]-1} L[j] \cdot R^{*}[j] \right) \qquad (2)$$
in which ∠ indicates the argument (the phase) of the complex operand.
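Relationship (2) can be sketched in the same way. The function name `icpd_rad` and the choice of radians for the result are assumptions of this sketch; the source does not state a unit.

```python
import cmath

def icpd_rad(L, R, B):
    """Return ICPD[k] in radians for each sub-band k, following relationship (2).

    L, R: sequences of complex spectral coefficients (left, right channel).
    B:    sub-band boundaries; band k covers bins B[k] .. B[k+1]-1.
    """
    values = []
    for k in range(len(B) - 1):
        # Cross-spectrum summed over the sub-band: sum of L[j]·R*[j].
        cross = sum(L[j] * R[j].conjugate() for j in range(B[k], B[k + 1]))
        # cmath.phase returns the argument of a complex number in [-pi, pi].
        values.append(cmath.phase(cross))
    return values

# Left channel leads the right channel by a quarter period in a single band.
example = icpd_rad([1j, 1j], [1 + 0j, 1 + 0j], [0, 2])
```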
In a manner equivalent to the ICPD, it is also possible to define an interchannel time difference (ICTD).
An interchannel coherence (ICC) parameter represents the interchannel correlation.
These parameters ICLD, ICPD and ICC are extracted from the stereo signals by the block 105.
The monosignal is converted back to the time domain (blocks 106 to 108) by short-term Fourier synthesis (inverse FFT, windowing and overlap-add (OLA)), and a mono coding (block 109) is then performed. In parallel, the stereo parameters are quantized and coded in the block 110.
In general, the spectrum of the signals (L[j], R[j]) is divided according to a nonlinear frequency scale of ERB (Equivalent Rectangular Bandwidth) or Bark type, with a number of sub-bands ranging typically from 20 to 34. This scale defines the values of B[k] and B[k+1] for each sub-band k. The parameters (ICLD, ICPD, ICC) are coded by scalar quantization, possibly followed by an entropy coding or a differential coding. For example, in the paper cited previously, the ICLD is coded by a nonuniform quantizer (ranging from −50 to +50 dB) with differential coding; the nonuniform quantization step exploits the fact that the greater the ICLD value, the lower the auditory sensitivity to the variations of this parameter.
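By way of illustration, one possible way of deriving the boundaries B[k] from an ERB-type scale is sketched below. The sketch assumes the commonly used ERB-rate approximation 21.4·log10(1 + 0.00437·f), a spectrum of 80 bins covering 0-8000 Hz and 20 sub-bands; the function names, the bin count and the rule forcing at least one bin per sub-band are hypothetical choices for this example and are not taken from the cited paper.

```python
import math

def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale (assumed approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def subband_edges(n_bands=20, n_bins=80, f_max_hz=8000.0):
    """Return n_bands + 1 bin indices B[0..n_bands], equally spaced in ERB rate."""
    e_max = hz_to_erb_rate(f_max_hz)
    edges = [0]
    for k in range(1, n_bands):
        f_k = erb_rate_to_hz(e_max * k / n_bands)
        bin_k = int(round(f_k / f_max_hz * n_bins))
        # Hypothetical rule: force every sub-band to hold at least one bin.
        edges.append(max(bin_k, edges[-1] + 1))
    if edges[-1] >= n_bins:
        raise ValueError("not enough bins for the requested number of sub-bands")
    edges.append(n_bins)
    return edges

B = subband_edges()  # narrow sub-bands at low frequencies, wide ones at high frequencies
```

With short frames the frequency resolution is coarse, so several low-frequency sub-bands collapse to a single bin each; this is one reason why the number of sub-bands cannot be chosen independently of the frame length.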
In the decoder 200, the monosignal is decoded (block 201), and a decorrelator (block 202) is used to produce two versions M̂(n) and M̂′(n) of the decoded monosignal. These two signals, once passed into the frequency domain (blocks 203 to 206), and the decoded stereo parameters (block 207) are used by the stereo synthesis (block 208) to reconstruct the left and right channels in the frequency domain. These channels are finally reconstructed in the time domain (blocks 209 to 214).
In stereo signal coding techniques, an intensity stereo coding technique consists in coding the sum channel (M) and the energy ratios ICLD as defined above.
Intensity stereo coding exploits the fact that perception of the high-frequency components is mainly linked to the time (energy) envelopes of the signal.
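A minimal sketch of how left and right spectra might be rebuilt from the sum channel and the ICLD alone is given below. It assumes, purely for illustration, a downmix M[j] = (L[j] + R[j])/2 and per-band gains chosen so that the two outputs have the level ratio given by the ICLD and add up to 2·M[j]; neither the function name `intensity_decode` nor this gain rule comes from the source, and other gain rules are possible.

```python
def intensity_decode(M, icld_db, B):
    """Rebuild left/right spectra from a mono spectrum and per-band ICLD values.

    M:       sequence of complex mono coefficients, assumed equal to (L + R) / 2.
    icld_db: ICLD[k] in dB for each sub-band k.
    B:       sub-band boundaries; band k covers bins B[k] .. B[k+1]-1.
    """
    left = list(M)
    right = list(M)
    for k in range(len(B) - 1):
        c = 10.0 ** (icld_db[k] / 20.0)   # left/right amplitude ratio
        g_left = 2.0 * c / (1.0 + c)      # gains satisfy g_left / g_right = c
        g_right = 2.0 / (1.0 + c)         # and g_left + g_right = 2
        for j in range(B[k], B[k + 1]):
            left[j] = g_left * M[j]
            right[j] = g_right * M[j]
    return left, right

# One band with equal levels (0 dB), one band with the left channel 20 dB stronger.
left, right = intensity_decode([1 + 0j, 1 + 0j], [0.0, 20.0], [0, 1, 2])
```

Both outputs are scaled copies of the same mono coefficient, so only the level difference is restored; phase and coherence differences between the channels are lost, which is consistent with the limited stereo quality attributed to ICLD-only coding.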
For monosignals, there are also quantization techniques with or without memory such as the “pulse-code modulation” (PCM) coding or its adaptive version called “adaptive differential pulse-code modulation” (ADPCM).
Interest here is more particularly focused on ITU-T Recommendation G.722, which uses ADPCM coding with nested codes in sub-bands.
The input signal of a G.722-type coder is wideband, with a minimum bandwidth of [50-7000 Hz] and a sampling frequency of 16 kHz. This signal is split by quadrature mirror filters (QMF) into two sub-bands, [0-4000 Hz] and [4000-8000 Hz], and each of the sub-bands is then separately coded by an ADPCM coder.
The low band is coded by an ADPCM coding with nested codes on 6, 5 and 4 bits per sample, whereas the high band is coded by an ADPCM coder of two bits per sample. The total bit rate is 64, 56 or 48 Kbit/s depending on the number of bits used for the decoding of the low band.
Recommendation G.722 was first used in the ISDN (integrated services digital network), then in enhanced telephony applications offering HD (high definition) voice quality over IP networks.
A quantized signal frame according to the G.722 standard is made up of quantization indices coded on 6, 5 or 4 bits in the low band (0-4000 Hz) and 2 bits in the high band (4000-8000 Hz). Since the transmission frequency of the scalar indices is 8 kHz in each sub-band, the bit rate is 64, 56 or 48 Kbit/s. In the G.722 standard, the 8 bits are distributed as follows: 2 bits for the high band, 6 bits for the low band. The last or the last two bits of the low band can be “stolen” or replaced by data.
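The bit rates quoted above follow directly from the index rate and the number of bits per index, as the short sketch below shows. The function names are hypothetical, and the auxiliary data rate is simply the number of “stolen” low-band bits multiplied by the 8 kHz index rate.

```python
def g722_bit_rate(low_band_bits, high_band_bits=2, index_rate_hz=8000):
    """Audio bit rate in bit/s for the given number of bits per index in each band."""
    if low_band_bits not in (4, 5, 6):
        raise ValueError("the low band uses 6, 5 or 4 bits per index")
    return (low_band_bits + high_band_bits) * index_rate_hz

def stolen_data_rate(stolen_bits, index_rate_hz=8000):
    """Bit rate in bit/s made available when low-band bits are replaced by data."""
    if stolen_bits not in (0, 1, 2):
        raise ValueError("at most the last two low-band bits can be replaced")
    return stolen_bits * index_rate_hz

rates = [g722_bit_rate(bits) for bits in (6, 5, 4)]  # [64000, 56000, 48000]
aux = stolen_data_rate(2)                            # 16000
```

In every mode the audio rate and the stolen data rate add up to 64 Kbit/s, since the frame always carries 8 bits per index.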
The ITU-T has recently launched a standardization activity called G.722-SWB (in the context of the Q.10/16 issue described, for example, in the document: ITU-document: Annex Q10.J Terms of Reference (ToR) and time schedule for the super wideband extension to ITU-T G.722 and ITU-T G.711WB, January 2009, WD04_G722G711SWBToRr3.doc) which consists in extending the G.722 Recommendation in two ways:
        An extension of the acoustic band from 50-7000 Hz (wideband) to 50-14000 Hz (super-wideband, SWB).
        An extension from mono to stereo. This stereo extension can extend a mono coding in wideband or a mono coding in super-wideband.
In the context of G.722-SWB, the G.722 coding works with short 5 ms frames.
The focus of interest here is more particularly on the stereo extension of the wideband G.722 coding.
Two G.722 stereo extension modes are to be tested in the G.722-SWB standardization:
        A 56 Kbit/s G.722 stereo extension with an additional bit rate of 8 Kbit/s, or 64 Kbit/s in total.
        A 64 Kbit/s G.722 stereo extension with an additional bit rate of 16 Kbit/s, or 80 Kbit/s in total.
The spatial information represented by the ICLD or other parameters requires an additional (stereo extension) bit rate, and this bit rate increases as the coding frames become shorter.
As an example, in the context of the G.722-SWB standardization, if it is assumed that a G.722 (wideband) stereo extension is implemented by the intensity coding technique, the following stereo extension bit rate is obtained.
For a sum (mono) signal coded by G.722 with a 5 ms frame and a breakdown of the wideband spectrum (0-8000 Hz) into 20 sub-bands, 20 ICLD parameters to be transmitted every 5 ms are obtained. It can be assumed that these ICLD parameters are coded with an (average) bit rate of the order of 4 bits per sub-band. The G.722 stereo extension bit rate therefore becomes 20×4 bits/5 ms=16 Kbit/s. Thus, the G.722 stereo extension by ICLD with 20 sub-bands results in an additional bit rate of the order of 16 Kbit/s. Now, according to the prior art, ICLD coding on its own is not generally sufficient to achieve a good stereo quality.
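The arithmetic of this example can be checked with the short sketch below, which also shows how the same parameter set would cost less with longer frames. The function name and the 20 ms comparison frame are assumptions introduced for illustration; the figure of 4 bits per sub-band is the average assumed in the text above.

```python
def stereo_extension_bit_rate(n_subbands, bits_per_subband, frame_ms):
    """Bit rate in bit/s needed to send one parameter per sub-band every frame."""
    bits_per_frame = n_subbands * bits_per_subband
    return bits_per_frame * 1000 / frame_ms

# 20 ICLD parameters at about 4 bits each, sent every 5 ms.
rate_5ms = stereo_extension_bit_rate(20, 4, 5)    # 16000.0 bit/s, i.e. 16 Kbit/s
# Same parameters with a hypothetical 20 ms frame, for comparison.
rate_20ms = stereo_extension_bit_rate(20, 4, 20)  # 4000.0 bit/s, i.e. 4 Kbit/s
```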
This example therefore illustrates the difficulty in producing a stereo extension of a coder such as G.722 with short (5 ms) frames.
A direct coding of the ICLD (with no other parameters) gives an additional (stereo extension) bit rate of around 16 Kbit/s which is already the maximum possible extension bit rate for the G.722 extension.
There is therefore a need to represent the stereo, or more generally multichannel signal, effectively, with a bit rate that is as low as possible, with an acceptable quality, when the coding frames are short.