There are various techniques for digitizing and compressing audio-frequency signals such as speech and music. The most common methods are:
    “waveform coding” methods, such as PCM and ADPCM coding;
    “parametric analysis/synthesis coding” methods, such as code excited linear prediction (CELP) coding;
    “sub-band or transform perceptual coding” methods.
These conventional techniques for coding audio-frequency signals are described in W. B. Kleijn and K. K. Paliwal, Editors, “Speech Coding and Synthesis”, Elsevier, 1995.
In this context, the invention more specifically addresses predictive transform coding methods incorporating the CELP coding and transform coding techniques.
In conventional speech coding, the coder generates a bit stream at a fixed bit rate. This fixed bit rate constraint simplifies implementation and use of the coder and of the decoder, commonly referred to in combination as a “codec”. Examples of such systems are: the ITU-T G.711 coding system at 64 kilobits per second (kbps), the ITU-T G.729 coding system at 8 kbps and the GSM-EFR coding system at 12.2 kbps.
However, in some applications, such as mobile telephony, voice over IP, and communication over ad hoc networks, it is preferable to generate a bit stream at a variable bit rate, with bit rates taken from a predefined set. A number of multiple bit rate coding techniques that are more flexible than fixed bit rate coding can therefore be distinguished:
    source and/or channel controlled multimode coding, as used in the AMR-NB, AMR-WB, SMV, and VMR-WB systems;
    hierarchical coding, also known as “scalable” coding, which generates a bit stream that is hierarchical in the sense that it includes a core bit rate and one or more enhancement layers. The G.722 system at 48 kbps, 56 kbps, and 64 kbps is a simple example of bit rate scalable coding. The MPEG-4 CELP codec is scalable in bit rate and in bandwidth; other examples of such coders can be found in the paper by B. Kovesi, D. Massaloux, A. Sollaud, “A Scalable Speech and Audio Coding Scheme with Continuous Bitrate Flexibility”, ICASSP 2004;
    multiple description coding.
The present invention relates more particularly to hierarchical coding.
The basic concept of hierarchical, or “scalable”, audio coding is illustrated in the paper by Y. Hiwasaki, T. Mori, H. Ohmuro, J. Ikedo, D. Tokumoto, and A. Kataoka, “Scalable Speech Coding Technology for High-Quality Ubiquitous Communications”, NTT Technical Review, March 2004, for example.
In this type of coding, the bit stream includes a base layer, or core layer, and one or more enhancement layers. The base layer is generated by a codec known as the “core codec” at a low fixed bit rate that guarantees some minimum level of coding quality, and it must be received by the decoder in order to maintain an acceptable level of quality.
The enhancement layers are used to enhance quality; they may not all be received by the decoder. The main benefit of hierarchical coding is that the bit rate can be adapted simply by truncating the bit stream. The possible number of layers, i.e. the possible number of truncations of the bit stream, defines the coding granularity: in coarse granularity coding the bit stream includes few layers (of the order of 2 to 4 layers), whereas fine granularity coding provides an increment of the order of 1 kbps, for example.
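The adaptation of bit rate by truncation described above can be illustrated with a minimal sketch. The 20 ms frame duration, the layer rates, and the function names `layer_bytes` and `truncate_frame` are assumptions chosen for illustration only; they are not taken from any particular codec.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds (illustrative only)


def layer_bytes(rate_kbps, frame_ms=FRAME_MS):
    """Bytes occupied per frame by a layer of the given bit rate (kbps * ms = bits)."""
    return rate_kbps * frame_ms // 8


def truncate_frame(frame, layer_rates_kbps, target_kbps):
    """Keep the core layer plus as many whole enhancement layers as fit the target.

    frame            -- bytes of one coded frame, core layer first
    layer_rates_kbps -- rate of each layer; index 0 is the core layer
    target_kbps      -- bit rate available on the channel
    """
    core_rate = layer_rates_kbps[0]
    if target_kbps < core_rate:
        # The core layer must always be received by the decoder.
        raise ValueError("target bit rate is below the core bit rate")
    kept_rate = core_rate
    for rate in layer_rates_kbps[1:]:
        if kept_rate + rate > target_kbps:
            break
        kept_rate += rate
    return frame[:layer_bytes(kept_rate)]


# Hypothetical layering: 8 kbps core, then enhancement layers of 4, 2 and 2 kbps.
rates = [8, 4, 2, 2]
full_frame = bytes(layer_bytes(sum(rates)))
print(len(truncate_frame(full_frame, rates, 12)))
```

Because each layer only refines the layers before it, a network node can apply this kind of truncation without decoding or re-encoding the signal.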
The invention relates more particularly to bit rate and bandwidth scalable coding techniques using a CELP type core coder in the telephone band and one or more wide band enhancement layers. Examples of such systems are given in the paper by H. Taddéi et al., “A Scalable Three Bitrate (8, 14.2, and 24 kbps) Audio Coder”, 107th Convention AES, 1999, with coarse granularity of 8 kbps, 14.2 kbps, and 24 kbps, and in the aforementioned paper by B. Kovesi et al., with fine granularity from 6.4 kbps to 32 kbps.
In 2004 the ITU-T launched a standardized hierarchical core coder project. This G.729EV coder (EV standing for “embedded variable bitrate”) is an add-on to the known G.729 coder. The objective of the G.729EV standard is to obtain a G.729 core hierarchical coder producing a signal with a band that extends from the narrow band (300 hertz (Hz) to 3400 Hz) to the wide band (50 Hz to 7000 Hz) at a bit rate of 8 kbps to 32 kbps for conversation services. This coder is inherently capable of interworking with the G.729 recommendation, which ensures compatibility with existing voice over IP equipment.
The 8 kbps to 32 kbps hierarchical audio coder shown in FIG. 1 was proposed in response to the above project and is described in the ITU-T document COM 16, D135 (WP 3/16), “France Telecom G.729EV Candidate: High level description and complexity evaluation”, Q.10/16, Study Period 2005-2008, Geneva, 26 Jul.-5 Aug. 2005. This coder effects three-layer coding, comprising cascade CELP coding, band expansion by full band linear predictive coding (LPC) and predictive transform coding. TDAC (time domain aliasing cancellation) coding is applied following application of the modified discrete cosine transform (MDCT). The predictive transform coding layer uses a full band perceptually weighted filter Ŵ_WB(z).
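The principle of time domain aliasing cancellation behind the MDCT can be illustrated with a minimal sketch. It assumes a sine window and 50% overlapped frames, which is one common textbook choice; the window, the frame size, and the function names below are assumptions for illustration and do not describe the transform configuration of the coder of FIG. 1.

```python
import math


def sine_window(m):
    """Sine window of length 2m; it satisfies w[n]^2 + w[n + m]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(frame, window):
    """Windowed MDCT: 2m input samples give m coefficients."""
    m = len(frame) // 2
    n0 = 0.5 + m / 2
    return [
        sum(window[n] * frame[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: m coefficients give 2m time-aliased samples."""
    m = len(coeffs)
    n0 = 0.5 + m / 2
    return [
        window[n] * (2.0 / m) * sum(
            coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for k in range(m))
        for n in range(2 * m)
    ]


def tdac_roundtrip(signal, m):
    """Analyse and resynthesise a signal; overlap-add cancels the aliasing."""
    window = sine_window(m)
    pad = (-len(signal)) % m
    # One block of zeros on each side so every sample is covered by two frames.
    padded = [0.0] * m + list(signal) + [0.0] * (pad + m)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        block = imdct(mdct(padded[start:start + 2 * m], window), window)
        for n, value in enumerate(block):
            out[start + n] += value
    return out[m:m + len(signal)]


signal = [math.sin(0.3 * i) for i in range(40)]
rebuilt = tdac_roundtrip(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each frame on its own is not invertible, since 2m samples are mapped to m coefficients; the aliasing terms of adjacent frames have opposite signs and cancel in the overlap-add. In a real coder the coefficients would be quantized between `mdct` and `imdct`, so the reconstruction would only be approximate.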
The concept of shaping coding noise by perceptually weighted filtering is explained in the aforementioned publication by W. B. Kleijn et al. In substance, perceptually weighted filtering shapes the coding noise by attenuating the signal at the frequencies at which the noise intensity is high and at which noise can be masked more easily.
The perceptually weighted filters most widely used in narrow-band CELP coding are of the form Â(z/γ1)/Â(z/γ2) where 0≦γ2≦γ1<1 and Â(z) represents the LPC spectrum of a signal segment with a length of 5 milliseconds (ms) to 30 ms. Thus analysis by synthesis in CELP coding amounts to minimizing the quadratic error in a signal domain weighted perceptually by this type of filter.
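A filter of the form Â(z/γ1)/Â(z/γ2) can be sketched as follows. Replacing z by z/γ multiplies the k-th LPC coefficient by γ^k, so both the numerator and the denominator are obtained from the same coefficients of Â(z). The LPC coefficients, the values γ1 = 0.9 and γ2 = 0.6, and the function names are assumptions for illustration, not values taken from a specific coder.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [coeff * gamma ** k for k, coeff in enumerate(lpc)]


def pole_zero_filter(num, den, x):
    """Direct-form filter: den[0]*y[n] = sum(num[k]*x[n-k]) - sum(den[k]*y[n-k], k >= 1)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc / den[0])
    return y


def perceptual_weighting(lpc, x, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to the samples x."""
    numerator = bandwidth_expand(lpc, gamma1)
    denominator = bandwidth_expand(lpc, gamma2)
    return pole_zero_filter(numerator, denominator, x)


# Hypothetical second-order LPC polynomial A(z) = 1 - 1.2 z^-1 + 0.5 z^-2.
lpc = [1.0, -1.2, 0.5]
impulse = [1.0] + [0.0] * 7
print(perceptual_weighting(lpc, impulse))
```

In an analysis-by-synthesis search, the difference between the original and the synthesized signal would be passed through such a filter before the quadratic error is computed. When γ1 equals γ2 the numerator and denominator coincide and the filter leaves the signal unchanged, which makes a convenient sanity check.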
However, this technique as proposed in the context of G.729EV standardization has the drawback of using a full band perceptual weighting filter. The associated filtering is relatively complex in terms of calculation time.