The invention relates to devices for coding and decoding audio signals, intended in particular for applications involving the transmission or storage of digitized, compressed audio signals (speech and/or sounds).
More particularly, this invention pertains to audio coding systems capable of providing several bit rates, also referred to as multirate coding systems. Such systems differ from fixed-rate coders in that they can change the coding bit rate, possibly during processing. This makes them especially suited to transmission over heterogeneous access networks, whether IP networks mixing fixed and mobile access, high bit rate access (ADSL) and low bit rate access (PSTN or GPRS modems), or networks involving terminals with varying capabilities (mobiles, PCs, etc.).
Essentially, two categories of multirate coders can be distinguished: “switchable” multirate coders and “hierarchical” coders.
“Switchable” multirate coders rely on a coding architecture from a single technological family (temporal or frequency coding, for example CELP, sinusoidal, or transform coding), in which a bit rate indication is supplied simultaneously to the coder and to the decoder. The coder uses this information to select the parts of the algorithm and the tables relevant to the chosen bit rate; the decoder operates symmetrically. Numerous switchable multirate coding structures have been proposed for audio coding. Examples include the mobile coders standardized by the 3GPP (“3rd Generation Partnership Project”): NB-AMR (“Narrow Band Adaptive Multirate”, Technical Specification 3GPP TS 26.090, version 5.0.0, June 2002) in the telephone band, and WB-AMR (“Wide Band Adaptive Multirate”, Technical Specification 3GPP TS 26.190, version 5.1.0, December 2001) in wideband. These coders operate over fairly wide bit rate ranges (4.75 to 12.2 kbit/s for NB-AMR and 6.60 to 23.85 kbit/s for WB-AMR) with fairly fine granularity (8 bit rates for NB-AMR and 9 for WB-AMR). The price of this flexibility, however, is considerable structural complexity: to accommodate all these bit rates, the coders must support numerous different options, a variety of quantization tables, etc. Performance increases progressively with bit rate, but not linearly, and some bit rates are inherently better optimized than others.
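The defining trait of a switchable coder is that one explicit rate indication, known to both ends, selects the algorithm branch and tables. A minimal sketch of that selection step follows. The two rate lists are the NB-AMR and WB-AMR codec modes with their 20 ms frames; the policy "pick the highest mode that fits the available rate" and the names `select_mode` and `bits_per_frame` are illustrative assumptions, not taken from either specification.

```python
from typing import Sequence

# Codec mode bit rates in kbit/s (NB-AMR: 8 modes, WB-AMR: 9 modes).
NB_AMR_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
WB_AMR_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]

FRAME_MS = 20  # both AMR coders use 20 ms frames


def select_mode(available_kbps: float, modes: Sequence[float]) -> int:
    """Hypothetical policy: index of the highest mode not exceeding the rate.

    In a switchable coder this index must be signalled to BOTH coder and
    decoder, since it selects the algorithm parts and tables to use.
    """
    candidates = [i for i, rate in enumerate(modes) if rate <= available_kbps]
    if not candidates:
        raise ValueError("available rate is below the lowest mode")
    return candidates[-1]


def bits_per_frame(kbps: float) -> int:
    """Bits carried by one frame at the given rate (kbit/s x ms = bits)."""
    return round(kbps * FRAME_MS)
```

For instance, an 11 kbit/s channel would select the 10.2 kbit/s NB-AMR mode. Note that nothing here lets an intermediate network node lower the rate on its own: changing modes requires new signalling to both ends, which is the contrast with hierarchical coding drawn next.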
In so-called “hierarchical” coding systems, also referred to as “scalable” systems, the binary data produced by the coding operation are distributed into successive layers. A base layer, also called the “kernel”, consists of the binary elements that are strictly necessary for decoding the binary train; it determines a minimum decoding quality.
The subsequent layers progressively improve the quality of the decoded signal: each new layer carries additional information which, when used by the decoder, yields an output signal of higher quality.
One particular feature of hierarchical coding is the ability to intervene at any point in the transmission or storage chain and delete part of the binary train without supplying any particular indication to the coder or to the decoder. The decoder uses whatever binary information it receives and produces a signal of corresponding quality.
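The truncation property can be sketched as follows. This is a toy model under stated assumptions: the `Layer` type, byte-sized payloads (real coders count bits), and the "quality level = number of consecutive layers received" decoder are all hypothetical. It shows that a node can drop trailing layers to meet a budget without contacting the coder, and that the decoder simply uses what arrives, provided the kernel is intact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    index: int      # 0 = kernel (base layer), 1, 2, ... = enhancement layers
    payload: bytes  # hypothetical: real coders allocate bits, not bytes


def truncate(frame: List[Layer], budget_bytes: int) -> List[Layer]:
    """Drop trailing layers so the frame fits the budget.

    Any node in the chain can do this; it sends nothing to the coder
    and adds nothing to the frame for the decoder.
    """
    kept, used = [], 0
    for layer in frame:  # layers are ordered from the kernel upward
        if used + len(layer.payload) > budget_bytes:
            break
        kept.append(layer)
        used += len(layer.payload)
    return kept


def decode(frame: List[Layer]) -> int:
    """Toy decoder: returns the number of consecutive layers usable."""
    if not frame or frame[0].index != 0:
        raise ValueError("kernel missing: frame cannot be decoded")
    level = 0
    for expected, layer in enumerate(frame):
        if layer.index != expected:
            break
        level += 1
    return level
```

With a 20-byte kernel and three 5-byte enhancement layers, a 32-byte budget keeps the kernel plus two layers, while a budget below 20 bytes leaves nothing decodable, reflecting the minimum quality set by the kernel.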
Hierarchical coding structures have likewise been the subject of much work. Some operate with a single type of coder designed to deliver hierarchized coded information. When the additional layers improve the quality of the output signal without changing its bandwidth, one speaks rather of “embedded coders” (see for example R. D. Lacovo et al., “Embedded CELP Coding for Variable Bit-Rate Between 6.4 and 9.6 kbit/s”, Proc. ICASSP 1991, pp. 681-686). Coders of this type, however, do not allow a large gap between the lowest and the highest bit rates offered.
The hierarchy is often used to progressively widen the bandwidth of the signal: the kernel supplies a baseband signal, for example telephone band (300-3400 Hz), and the subsequent layers code additional frequency bands (for example wideband up to 7 kHz, HiFi band up to 20 kHz, or intermediate bands). Subband coders and coders using a time/frequency transform, such as those described in “Subband/transform coding using filter bank designs based on time domain aliasing cancellation” by J. P. Princen et al. (Proc. IEEE ICASSP-87, pp. 2161-2164) and “High Quality Audio Transform Coding at 64 kbit/s” by Y. Mahieux et al. (IEEE Trans. Commun., Vol. 42, No. 11, November 1994, pp. 3010-3019), are particularly well suited to such operation.
Moreover, a different coding technique is frequently used for the kernel and for the module or modules coding the additional layers; one then speaks of several coding stages, each consisting of a subcoder. The subcoder of a given stage can either code parts of the signal not coded by the previous stages, or code the coding residual of the previous stage, obtained by subtracting the decoded signal from the original signal.
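The second option, residual coding, can be illustrated with a minimal sketch. Here every stage is a uniform scalar quantizer with a progressively finer step, a deliberate simplification: in the structures described, the stages would typically be different technologies (for example CELP for the kernel and transform coding above it). The function names and step sizes are hypothetical. Each stage codes the residual left by the stages before it, and the decoder sums the contributions of the stages it receives.

```python
def quantize(x, step):
    """Uniform scalar quantizer, a stand-in for a real subcoder."""
    return [step * round(v / step) for v in x]


def multistage_encode(signal, steps):
    """Each stage codes the residual left by all previous stages."""
    residual = list(signal)
    stages = []
    for step in steps:
        coded = quantize(residual, step)   # what this stage transmits
        stages.append(coded)
        # residual = original signal minus everything decoded so far
        residual = [r - c for r, c in zip(residual, coded)]
    return stages


def multistage_decode(stages, n_received):
    """Reconstruct from the first n_received stages only."""
    out = [0.0] * len(stages[0])
    for coded in stages[:n_received]:
        out = [o + c for o, c in zip(out, coded)]
    return out
```

With steps of 0.5, 0.1 and 0.02, the worst-case reconstruction error falls to at most 0.25, 0.05 and 0.01 as one, two or three stages are decoded, which mirrors the layer-by-layer quality improvement of a hierarchical binary train.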
The advantage of such structures is that they can go down to relatively low bit rates with sufficient quality while still delivering good quality at high bit rates. This matters because techniques designed for low bit rates are generally not effective at high bit rates, and vice versa.
Such structures, which make it possible to combine two different technologies (for example CELP and time/frequency transform coding), are especially effective for covering wide bit rate ranges.
However, the hierarchical coding structures proposed in the prior art precisely define the bit rate allocated to each intermediate layer. Each layer corresponds to the encoding of certain parameters, and the granularity of the hierarchical binary train depends on the bit rate allocated to these parameters. Typically, a layer contains a few tens of bits per frame, a frame being a given number of signal samples covering a given duration; the example described later uses a frame of 960 samples, corresponding to 60 ms of signal.
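The link between bits per layer and bit rate granularity is simple arithmetic. A frame of 960 samples covering 60 ms implies a 16 kHz sampling rate, and each bit per 60 ms frame is worth 1000/60 ≈ 16.7 bit/s. In the sketch below, the layer sizes (a 480-bit kernel followed by enhancement layers of 60 and 120 bits) are purely hypothetical and serve only to show how fixed per-layer allocations fix the set of reachable bit rates.

```python
FRAME_SAMPLES = 960  # frame length used in the example described later
FRAME_MS = 60        # duration of that frame


def sampling_rate_hz(samples=FRAME_SAMPLES, frame_ms=FRAME_MS):
    """Sampling rate implied by a frame of `samples` lasting `frame_ms`."""
    return samples * 1000 / frame_ms


def available_rates_bps(layer_bits, frame_ms=FRAME_MS):
    """Cumulative bit rate reachable after each layer.

    layer_bits lists the bits per frame of each layer, kernel first
    (sizes are hypothetical, for illustration only).
    """
    rates, total = [], 0
    for bits in layer_bits:
        total += bits
        rates.append(total * 1000 / frame_ms)
    return rates
```

For the hypothetical allocation `[480, 60, 60, 120]`, the reachable rates are 8, 9, 10 and 12 kbit/s: the binary train can only be cut at those points, so its granularity is entirely determined by the per-layer allocation.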
Moreover, when the bandwidth of the decoded signal can vary with the number of layers of binary elements received, changes in the line bit rate may produce artifacts that are disturbing to the listener.