In the field of compression coding, many coders model a signal of L samples using a number of pulses far smaller than the total number of samples. This is the case of certain audio-frequency coders, for example, such as the “TDAC” audio coder described in particular in the published document US-2001/027393, in which the normalized modified discrete cosine transform (MDCT) coefficients in each band are quantized by vector quantizers using algebraic dictionaries of interleaved size, these algebraic codes generally including a few non-zero components, the other components being equal to zero. This is also the case with most speech coders using analysis by synthesis, in particular coders of the Algebraic Code Excited Linear Prediction (ACELP), Multi-Pulse Maximum Likelihood Quantization (MP-MLQ) and other types. To model the innovation signal, these coders use a dictionary composed of waveforms having very few non-zero components, whose positions and amplitudes additionally obey predetermined rules.
Coders of the above kind using analysis by synthesis are briefly described below.
In coders using analysis by synthesis, a synthesis model is used on coding to extract parameters modeling the signals to be coded, which may be sampled at the telephone frequency (Fe=8 kilohertz (kHz)) or at a higher frequency, for example at 16 kHz for wideband coding (passband from 50 hertz (Hz) to 7 kHz). Depending on the application and on the required quality, the compression rate varies from 1 to 16. These coders operate at bit rates from 2 kilobits per second (kbps) to 16 kbps in the telephone band and from 6 kbps to 32 kbps in the wideband.
There follows a brief description of the CELP digital codec, which uses analysis by synthesis and is the codec most widely used at present for coding/decoding speech signals. A speech signal is sampled and converted into a series of blocks of L′ samples called frames. As a general rule, each frame is divided into smaller blocks of L samples called subframes. Each block is synthesized by filtering a waveform extracted from a directory (also called a dictionary) multiplied by a gain via two filters varying in time. The excitation dictionary is a finite set of waveforms of L samples. The first filter is a long-term prediction (LTP) filter. An LTP analysis evaluates the parameters of this LTP filter, which exploits the periodic nature of voiced sounds (typically represented by the pitch, i.e. the vibration frequency of the vocal cords). The second filter is a short-term prediction filter. Linear prediction coding (LPC) analysis methods are used to obtain short-term prediction parameters representing the transfer function of the vocal tract and characteristic of the spectrum of the signal (typically representing the modulation resulting from the shape assumed by the lips, the positions of the tongue and of the larynx, etc.).
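The synthesis chain just described (gain-scaled waveform, then LTP filter, then LPC filter) can be sketched as follows. This is a minimal illustration only: the function name, the single-tap LTP filter 1/(1 − b·z^−T) and the sign convention A(z) = 1 + Σ a_k·z^−k for the short-term filter are assumptions made for the example, not details taken from any particular coder.

```python
def synthesize_subframe(codevector, gain, ltp_delay, ltp_gain, lpc_coeffs,
                        past_excitation, past_output):
    """Synthesize one subframe: gain * codevector -> LTP filter -> LPC filter.

    Assumes len(past_excitation) >= ltp_delay and
    len(past_output) >= len(lpc_coeffs) (filter memories).
    """
    # Long-term (pitch) synthesis: e[n] = g * c[n] + b * e[n - T]
    excitation = list(past_excitation)
    start = len(excitation)
    for n, c in enumerate(codevector):
        excitation.append(gain * c + ltp_gain * excitation[start + n - ltp_delay])
    excitation = excitation[start:]

    # Short-term (LPC) synthesis: s[n] = e[n] - sum_k a[k] * s[n - k]
    output = list(past_output)
    offset = len(output)
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs, start=1):
            acc -= a * output[offset + n - k]
        output.append(acc)
    return output[offset:]


# Toy usage: one pulse, pitch delay of 2 samples, first-order LPC filter.
subframe = synthesize_subframe(
    codevector=[1, 0, 0, 0], gain=2.0,
    ltp_delay=2, ltp_gain=0.5, lpc_coeffs=[-0.5],
    past_excitation=[0.0, 0.0], past_output=[0.0],
)
```

In a real coder both filters vary from subframe to subframe and their memories carry over; here the memories are simply passed in as lists.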
The method used to determine the innovation sequence is the method known as analysis by synthesis. In the coder, a large number of innovation sequences from the excitation dictionary are filtered by the LTP and LPC filters and the waveform producing the synthetic signal closest to the original signal according to a perceptual weighting criterion, generally known as the CELP criterion, is selected.
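A hedged sketch of this selection loop is given below. It assumes that the cascade of filters is summarized by a single impulse response of at least L samples, that the target already includes the perceptual weighting, and that the criterion takes its commonly used form (squared correlation between target and filtered waveform, divided by the energy of the filtered waveform). All names are illustrative.

```python
def filter_codevector(codevector, impulse_response):
    """Zero-state convolution of a codevector, truncated to the subframe length."""
    length = len(codevector)
    out = [0.0] * length
    for i, c in enumerate(codevector):
        if c == 0:
            continue  # dictionaries of this kind are mostly zeros
        for j in range(length - i):
            out[i + j] += c * impulse_response[j]
    return out


def search_dictionary(target, dictionary, impulse_response):
    """Return the index of the waveform maximizing correlation^2 / energy."""
    best_index, best_score = -1, -1.0
    for index, codevector in enumerate(dictionary):
        y = filter_codevector(codevector, impulse_response)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        correlation = sum(t * v for t, v in zip(target, y))
        score = correlation * correlation / energy
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy usage: the target is a scaled, filtered copy of the second waveform.
h = [1.0, 0.5, 0.25, 0.125]
toy_dictionary = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_target = [3.0 * v for v in filter_codevector(toy_dictionary[1], h)]
selected = search_dictionary(toy_target, toy_dictionary, h)
```

Because the score is independent of the gain applied to the waveform, the gain can be computed after the waveform has been selected.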
The use of multipulse dictionaries in these analysis by synthesis coders is described briefly below, on the understanding that CELP coders and CELP decoders are well known to the person skilled in the art.
The multiple bit rate coder of the ITU-T G.723.1 Standard is a good example of a coder using analysis by synthesis that employs multipulse dictionaries. Here, the pulse positions are all separate. The two bit rates of the coder (6.3 kbps and 5.3 kbps) model the innovation signal by means of waveforms extracted from the dictionary that include only a small number of non-zero pulses: six or five for the high bit rate, four for the low bit rate. These pulses are of amplitude +1 or −1. In its 6.3 kbps mode, the G.723.1 coder uses two dictionaries alternately:
    in the first dictionary, used for even subframes, the waveforms comprise six pulses, and
    in the second dictionary, used for odd subframes, they comprise five pulses.
In both dictionaries, a single restriction is imposed on the positions of the pulses of any code-vector, which must all have the same parity, i.e. they must all be even or they must all be odd. In the 5.3 kbps mode dictionary, the positions of the four pulses are more severely constrained. Apart from the same parity constraint as the dictionaries of the high bit rate mode, there is a limited choice of positions for each pulse.
The 5.3 kbps mode multipulse dictionary belongs to the well-known family of ACELP dictionaries. The structure of an ACELP dictionary is based on the interleaved single-pulse permutation (ISPP) technique, which consists in dividing a set of L positions into K interleaved tracks, the N pulses being located in certain predefined tracks. In some applications, the dimension L of the code words can be expanded to L+N. Accordingly, in the case of the low bit rate mode dictionary of an ITU-T G.723.1 coder, the dimension of the block of 60 samples is expanded to 64 samples and the 32 even (or odd as the case may be) positions are divided into four non-overlapping interleaved tracks of length 8. There are therefore two groups of four tracks, one for each parity. Table 1 below sets out the four tracks for the even positions for each pulse i0 to i3.
TABLE 1
Positions and amplitudes of the pulses of the ACELP dictionary of the 5.3 kbps mode G.723.1 coder

Pulse   Sign   Positions
i0      ±1     0, 8, 16, 24, 32, 40, 48, 56
i1      ±1     2, 10, 18, 26, 34, 42, 50, 58
i2      ±1     4, 12, 20, 28, 36, 44, 52, (60)
i3      ±1     6, 14, 22, 30, 38, 46, 54, (62)
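The ISPP partition can be illustrated by a few lines of Python. The helper name and its parameters are hypothetical; the sketch merely deals a regular grid of positions out to K interleaved tracks and reproduces the even-parity tracks listed in Table 1, including the positions 60 and 62 that lie in the expanded part of the 64-sample block.

```python
def ispp_tracks(num_positions, num_tracks, step=1, offset=0):
    """Split the grid offset, offset + step, ... into interleaved tracks."""
    grid = [offset + step * n for n in range(num_positions)]
    return [grid[k::num_tracks] for k in range(num_tracks)]


# G.723.1, 5.3 kbps mode: 32 positions of one parity, four tracks of length 8.
even_tracks = ispp_tracks(32, 4, step=2, offset=0)
odd_tracks = ispp_tracks(32, 4, step=2, offset=1)

# A 40-sample block split into five tracks of length 8 (no parity constraint).
tracks_40 = ispp_tracks(40, 5)

for pulse, track in enumerate(even_tracks):
    print(f"i{pulse}: {track}")
```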
The ACELP innovation dictionaries are used in many standardized coders employing analysis by synthesis (ITU-T G.723.1, ITU-T G.729, IS-641, 3GPP NB-AMR, 3GPP WB-AMR). Tables 2 to 4 below set out a few examples of these ACELP dictionaries for a block length of 40 samples. Note that the parity constraint is not used in these dictionaries. Table 2 covers the ACELP dictionary for 17 bits and four non-zero pulses of amplitude ±1, used in the 8 kbps mode ITU-T G.729 coder, the IS-641 7.4 kbps mode coder and the 7.4 and 7.95 kbps mode 3GPP NB-AMR coder.
TABLE 2
Positions and amplitudes of the pulses of the ACELP dictionary of the 8 kbps mode ITU-T G.729 coder, 7.4 kbps mode IS-641 coder and 7.4 and 7.95 kbps mode 3GPP NB-AMR coder

Pulse   Sign   Positions
i0      ±1     0, 5, 10, 15, 20, 25, 30, 35
i1      ±1     1, 6, 11, 16, 21, 26, 31, 36
i2      ±1     2, 7, 12, 17, 22, 27, 32, 37
i3      ±1     3, 8, 13, 18, 23, 28, 33, 38
               4, 9, 14, 19, 24, 29, 34, 39
Table 3 covers the ACELP dictionary for 35 bits used in the 12.2 kbps mode 3GPP NB-AMR coder, in which each code-vector contains 10 non-zero pulses of amplitude ±1. The block of 40 samples is divided into five tracks of length 8 each containing two pulses. Note that the two pulses of the same track can overlap and result in a single pulse of amplitude ±2.
TABLE 3
Positions and amplitudes of the pulses of the ACELP dictionary of the 12.2 kbps mode 3GPP NB-AMR coder

Pulse    Sign   Positions
i0, i5   ±1     0, 5, 10, 15, 20, 25, 30, 35
i1, i6   ±1     1, 6, 11, 16, 21, 26, 31, 36
i2, i7   ±1     2, 7, 12, 17, 22, 27, 32, 37
i3, i8   ±1     3, 8, 13, 18, 23, 28, 33, 38
i4, i9   ±1     4, 9, 14, 19, 24, 29, 34, 39
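By way of illustration, the following sketch builds a 40-sample code-vector from ten signed pulses arranged on the five tracks of Table 3, and shows two pulses of the same track coinciding to give a single pulse of amplitude 2. The function name and the (track, position, sign) representation are assumptions chosen for the example.

```python
NUM_TRACKS = 5
SUBFRAME_LENGTH = 40


def build_codevector(pulses):
    """Build a code-vector from (track, position, sign) triples, sign in {+1, -1}."""
    vector = [0] * SUBFRAME_LENGTH
    for track, position, sign in pulses:
        in_range = 0 <= position < SUBFRAME_LENGTH
        if not in_range or position % NUM_TRACKS != track:
            raise ValueError(f"position {position} is not on track {track}")
        vector[position] += sign  # coinciding pulses add up
    return vector


pulses = [
    (0, 5, +1), (0, 5, +1),    # i0 and i5 coincide: amplitude +2 at position 5
    (1, 1, -1), (1, 36, +1),   # i1, i6
    (2, 12, +1), (2, 27, -1),  # i2, i7
    (3, 3, -1), (3, 18, -1),   # i3, i8
    (4, 9, +1), (4, 39, +1),   # i4, i9
]
codevector = build_codevector(pulses)
```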
Finally, Table 4 covers the ACELP dictionary for 11 bits and two non-zero pulses of amplitude ±1 used in the low bit rate (6.4 kbps) extension of the ITU-T G.729 coder and in the 5.9 kbps mode 3GPP NB-AMR coder.
TABLE 4
Positions and amplitudes of the pulses of the ACELP dictionary of the 6.4 kbps mode ITU-T G.729 coder and the 5.9 kbps mode 3GPP NB-AMR coder

Pulse   Sign   Positions
i0      ±1     1, 3, 6, 8, 11, 13, 16, 18, 21, 23, 26, 28, 31, 33, 36, 38
i1      ±1     0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, 17, 19,
               20, 21, 22, 24, 25, 26, 27, 29, 30, 31, 32, 34, 35, 36, 37, 39
What is meant by “exploring” multipulse dictionaries is explained below.
As with any quantizing operation, seeking the optimum modeling of a vector to be coded consists in selecting from the set (or a subset) of the code-vectors of the dictionary that which “resembles” it most closely, i.e. the one that minimizes the measured distance between it and that input vector. A step referred to as “exploring” the dictionaries is carried out for this purpose.
In the case of multipulse dictionaries, this amounts to seeking the combination of pulses that optimizes the proximity of the signal to be modeled and the signal resulting from the choice of pulses. Depending on the size and/or the structure of the dictionary, this exploration may be exhaustive or non-exhaustive (and therefore more or less complex).
Since the dictionaries used in the TDAC coder referred to above are unions of permutation codes of type II, the algorithm for coding a vector of normalized transform coefficients exploits this property to determine its nearest neighbor from all the code-vectors, calculating only a limited number of distance criteria (using so-called “absolute leader” vectors).
In coders employing analysis by synthesis, the exploration of the multipulse dictionaries is not exhaustive except in the case of small dictionaries. For dictionaries of higher bit rate, only a small percentage of the combinations is explored. For example, multipulse ACELP dictionaries are generally explored in two stages. To simplify this search, a first stage preselects the amplitude (and therefore the sign, see above) of each possible pulse position by simply quantizing a signal depending on the input signal. Since the amplitudes of the pulses are fixed, it is the positions of the pulses that are then searched for using an analysis by synthesis technique (conforming to the CELP criterion). Despite using the ISPP structure, and despite the small number of pulses, an exhaustive search of the combinations of positions is effected only for the low bit rate dictionaries (typically less than or equal to 12 bits). This applies to the 11-bit ACELP dictionary used in the 6.4 kbps mode G.729 coder (see Table 4), for example, in which the 512 combinations of positions of two pulses are all tested to select the best one, which amounts to calculating the corresponding 512 CELP criteria.
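The two-stage exploration of the 11-bit dictionary can be sketched as follows. The 16 positions of i0 and the 32 positions of i1 give 16 × 32 = 512 position pairs (4 + 5 position bits, plus 2 sign bits, hence 11 bits). Several points are assumptions made for the example: the sign of each position is preselected from the sign of the target itself, the filters are summarized by one impulse response, and the criterion is the squared correlation divided by the energy. None of the names comes from a standard.

```python
from itertools import product

SUBFRAME_LENGTH = 40
TRACK_I0 = [p for p in range(SUBFRAME_LENGTH) if p % 5 in (1, 3)]        # 16 positions
TRACK_I1 = [p for p in range(SUBFRAME_LENGTH) if p % 5 in (0, 1, 2, 4)]  # 32 positions


def filtered_pulses(pulses, impulse_response):
    """Filter a sparse code-vector given as (position, sign) pairs."""
    y = [0.0] * SUBFRAME_LENGTH
    for position, sign in pulses:
        for j in range(SUBFRAME_LENGTH - position):
            y[position + j] += sign * impulse_response[j]
    return y


def exhaustive_two_pulse_search(target, impulse_response):
    # Stage 1 (assumed rule): fix the sign of every position in advance.
    signs = [1 if t >= 0 else -1 for t in target]
    # Stage 2: test every pair of positions with the signs held fixed.
    best_positions, best_score, tested = None, -1.0, 0
    for p0, p1 in product(TRACK_I0, TRACK_I1):
        tested += 1
        y = filtered_pulses([(p0, signs[p0]), (p1, signs[p1])], impulse_response)
        energy = sum(v * v for v in y)
        correlation = sum(t * v for t, v in zip(target, y))
        score = correlation * correlation / energy if energy > 0.0 else 0.0
        if score > best_score:
            best_positions, best_score = (p0, p1), score
    return best_positions, tested


# Toy usage: the target is itself a filtered two-pulse code-vector.
impulse_response = [0.5 ** j for j in range(SUBFRAME_LENGTH)]
target = filtered_pulses([(6, +1), (20, -1)], impulse_response)
best_positions, tested = exhaustive_two_pulse_search(target, impulse_response)
```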
Various focusing methods have been proposed for dictionaries of higher bit rate. The expression “focused search” is then used.
Some of those prior art methods are used in the standardized coders mentioned above. Their aim is to reduce the number of combinations of positions to be explored on the basis of the properties of the signal to be modeled. One example is the “depth-first tree” algorithm used by many standardized ACELP coders, in which preference is given to certain positions, such as the local maxima of the tracks of a target signal depending on the input signal, the past synthetic signal, and a filter composed of synthesis and perceptual weighting filters. There are several variants of this, depending on the size of the dictionary used. To explore the ACELP dictionary for 35 bits and 10 pulses (see Table 3), the first pulse is placed at the same position as the global maximum of the target-signal. This is followed by four iterations by circular permutation of the consecutive tracks. On each iteration, the position of the second pulse is fixed at the local maximum of one of the other four tracks, and the positions of the remaining eight pulses are searched for sequentially in pairs in interleaved loops. 256 (8×8×4 pairs) different combinations are tested on each iteration, which means that only 1024 combinations of positions of the 10 pulses among the 2^25 of the dictionary can be explored. A different variant is used in the IS-641 coder, in which a higher percentage of combinations of the dictionary for 17 bits and four pulses (see Table 2) is explored. 768 combinations of the 8192 (= 2^13) combinations of pulse positions are tested. In the 8 kbps G.729 coder, the same ACELP dictionary is explored by a different focusing method. The algorithm effects an iterative search by interleaving four pulse search loops (one per pulse).
The search is focused by making entry into the interior loop (search for the last pulse belonging to tracks 3 or 4) conditional on exceeding an adaptive threshold that also depends on the properties of the target-signal (local maximum values and mean values of the first three tracks). Moreover, the maximum number of explorations of combinations of four pulses is fixed at 1440 (which represents 17.6% of the 8192 combinations).
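The proportions quoted for these focused searches can be checked with a few lines of arithmetic. The figures are those given in the text (1024 out of 2^25, 768 out of 2^13 and at most 1440 out of 2^13); only the labels and variable names are introduced for the example.

```python
# (combinations tested, combinations of positions in the dictionary)
focused_searches = {
    "NB-AMR 12.2 kbps, 35 bits, 10 pulses": (4 * 8 * 8 * 4, 2 ** 25),
    "IS-641, 17 bits, 4 pulses": (768, 2 ** 13),
    "G.729 8 kbps, 17 bits, 4 pulses (upper bound)": (1440, 2 ** 13),
}

explored_percent = {
    name: 100.0 * tested / total
    for name, (tested, total) in focused_searches.items()
}

for name, percent in explored_percent.items():
    print(f"{name}: {percent:.4f} % of the combinations explored")
```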
In the 6.3 kbps mode G.723.1 coder, not all the 2 × 2^5 × C(30,5) (or 2 × 2^6 × C(30,6)) combinations of five (or six) pulses are explored, where C(n,k) denotes the number of ways of choosing k positions among n. For each grid (even or odd positions), the algorithm employs a known “multipulse” analysis to search sequentially for the positions and the amplitudes of the pulses. As with the ACELP dictionaries, there are variants that restrict the number of combinations tested.
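To give an order of magnitude, the dictionary sizes quoted above can be evaluated directly. The sketch reads the expression as 2 × 2^N × C(30, N): two parities, 2^N sign patterns and C(30, N) choices of N distinct positions among the 30 positions of one parity. That reading is an interpretation of the formula, and the function name is hypothetical.

```python
from math import comb


def multipulse_dictionary_size(num_pulses, positions_per_parity=30):
    """Evaluate 2 * 2**N * C(30, N) for N pulses."""
    parities = 2
    sign_patterns = 2 ** num_pulses
    position_choices = comb(positions_per_parity, num_pulses)
    return parities * sign_patterns * position_choices


size_five_pulses = multipulse_dictionary_size(5)  # odd subframes
size_six_pulses = multipulse_dictionary_size(6)   # even subframes
print(size_five_pulses, size_six_pulses)
```

Dictionaries of several million code-vectors per subframe make clear why an exhaustive exploration is ruled out in this mode.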
The above techniques suffer from the following problems, however.
The exploration of a multipulse dictionary, even a sub-optimum exploration thereof, constitutes in many coders a costly operation in terms of calculation time. For example, in the 6.3 kbps mode G.723.1 and 8 kbps mode G.729 coders, the search represents close to half the total complexity of the coder. For the NB-AMR coder, it represents one third of the total complexity. For the TDAC coder, it represents one quarter of the total complexity.
It is clear in particular that this complexity becomes critical if a plurality of coding operations have to be carried out by the same processor unit, such as a gateway managing many calls in parallel or a server distributing many multimedia contents. The complexity problem is accentuated by the multiplicity of compression formats circulating on the networks.
To offer mobility and continuity, modern and innovative multimedia communications services must be able to operate under a wide variety of conditions. The dynamism of the multimedia communications sector and the heterogeneous nature of the networks, access points and terminals have generated a plethora of compression formats whose presence in communications systems necessitates multiple coding either in cascade (transcoding) or in parallel (multiformat coding or multimode coding).
The meaning of the term “transcoding” is explained below. Transcoding becomes necessary if, in a transmission system, a compressed signal frame sent by a coder can no longer proceed in the same format. Transcoding converts the frame to another format compatible with the remainder of the transmission system. The most elementary solution (and therefore that in most widespread use at present) is to place a decoder and a coder back to back. The compressed frame arrives with a first format and is decompressed. The decompressed signal is then compressed with a second format accepted by the remainder of the communications system. Such a cascade of a decoder and a coder is referred to as “tandem”. That solution is very costly in terms of complexity (essentially because of the recoding) and degrades quality because the second coding is effected on a decoded signal, which is a degraded version of the original signal. Moreover, a frame may encounter several tandems before reaching its destination. The calculation cost and the loss of quality are not difficult to imagine. Moreover, the delays linked to each tandem operation are cumulative and can compromise the interactivity of calls.
What is more, complexity also causes problems in a multiformat compression system in which the same content is compressed to more than one format. This is the case of content servers that broadcast the same content in a plurality of formats adapted to the access conditions, networks and terminals of different customers. This multicoding operation becomes extremely complex as the number of formats required increases, which rapidly saturates the resources of the system.
Another case of multiple coding in parallel is a posteriori decision multimode compression. A plurality of compression modes are applied to each segment of the signal to be coded, and that which optimizes a given criterion or achieves the best bit rate/distortion trade-off is selected. Once again, the complexity of each of the compression modes limits the number thereof and/or leads to an a priori selection of a very small number of modes.
Prior art approaches to solving the above problems are described below.
New multimedia communications applications (such as audio and video applications) often necessitate a plurality of coding operations either in cascade (transcoding) or in parallel (multicoding and a posteriori decision multimode coding). The problem of the complexity barrier resulting from all these coding operations remains to be solved, despite the increase in current processing powers. Most prior art multiple coding operations do not take account of interactions between formats and between the format of the coder E and its content. Nevertheless, a few intelligent transcoding techniques have been proposed that are not satisfied merely by decoding and then recoding, but instead exploit the similarities between coding formats so that complexity can be reduced whilst limiting the resulting degradation.
So-called “intelligent” transcoding methods are described below.
All the coders in the same family of coders (CELP, parametric, transform, etc.) extract the same physical parameters from the signal. There is nevertheless great variety in terms of modeling and/or quantizing those parameters. Thus the same parameter may be coded in the same way or very differently from one coder to another.
Moreover, the coding may be strictly identical, or it may be identical in terms of modeling and calculation of the parameter, but differ simply in how the coding is translated into the form of bits. Finally, the coding may be completely different in terms of modeling and quantizing the parameter, or even in terms of its analysis or sampling frequency.
If modeling and parameter calculation are strictly identical, including translation to bit form, it suffices to copy the corresponding bit field from the bit stream generated by the first coder to that of the second. This highly favorable situation arises on transcoding from the G.729 standard to the IS-641 standard for adaptive excitation (LTP delays), for example.
If, for the same parameter, the two coders differ only in terms of the translation of the calculated parameter into bit form, it suffices to decode the bit field of the first format and then to return it to the binary domain using the coding method of the second format. This conversion may also be effected by means of one-to-one correspondence tables. This is the situation when transcoding fixed excitations from the G.729 standard to the AMR standard (7.4 kbps and 7.95 kbps modes), for example.
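A conversion of this kind, by one-to-one correspondence table, can be sketched as follows. The example is purely hypothetical: it supposes a 3-bit field that one format writes in natural binary and the other in Gray code. The actual bit mappings of the G.729 and AMR standards are not reproduced here.

```python
FIELD_BITS = 3  # hypothetical field width


def natural_to_gray(value):
    """Gray-code representation of a natural binary value."""
    return value ^ (value >> 1)


# One-to-one correspondence tables between the two (hypothetical) bit formats.
TABLE_A_TO_B = [natural_to_gray(v) for v in range(1 << FIELD_BITS)]
TABLE_B_TO_A = [0] * len(TABLE_A_TO_B)
for bits_a, bits_b in enumerate(TABLE_A_TO_B):
    TABLE_B_TO_A[bits_b] = bits_a


def transcode_field(bits_a):
    """Convert a bit field of format A into format B without decoding the signal."""
    return TABLE_A_TO_B[bits_a]


converted = transcode_field(0b101)
```

The parameter itself is never recalculated: the cost of the conversion is one table lookup per field.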
In the above two situations, transcoding the parameter remains at the bit level. Simple bit manipulation renders the parameter compatible with the second coding format. On the other hand, if a parameter extracted from the signal is modeled or quantized differently by two coding formats, passing from one to the other is not such a simple matter. Several methods have been proposed. They operate at the parameter level, the excitation level, or the decoded signal level.
For transcoding in the parameter domain, remaining at the parameter level is possible if the two coding formats calculate a parameter in the same way but quantize it differently. Quantizing differences may be related to the accuracy or the method selected (scalar, vector, predictive, etc.). It then suffices to decode the parameter and then to quantize it using the method of the second coding format. That prior art method is used at present for transcoding excitation gains in particular. The decoded parameter must often be modified before it is requantized. For example, if the coders have different parameter analysis frequencies or different frame/subframe lengths, it is standard practice to interpolate/decimate the parameters. Interpolation may be effected by the method described in the published document US2003/033142, for example. Another modification option is to round off the parameter to the accuracy imposed on it by the second coding format. This situation is encountered mostly for the fundamental frequency (“pitch”).
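The decode-then-requantize principle, with an optional interpolation step, can be sketched as follows. The uniform scalar quantizers, their step sizes and the linear interpolation are all assumptions chosen to keep the example short; real coders use their own quantizers, and the interpolation method of US2003/033142 is not reproduced here.

```python
def dequantize(index, step):
    """Decode a parameter from its index (hypothetical uniform quantizer)."""
    return index * step


def quantize(value, step, levels):
    """Quantize with the second format's step size, clipped to its index range."""
    return max(0, min(levels - 1, round(value / step)))


def transcode_parameter(index_a, step_a, step_b, levels_b):
    """Parameter-domain transcoding: decode with format A, requantize with format B."""
    return quantize(dequantize(index_a, step_a), step_b, levels_b)


def resample_parameters(values, new_count):
    """Linearly interpolate or decimate one parameter track to new_count values."""
    if new_count == 1 or len(values) == 1:
        return [values[0]] * new_count
    scale = (len(values) - 1) / (new_count - 1)
    out = []
    for n in range(new_count):
        x = n * scale
        i = min(int(x), len(values) - 2)
        frac = x - i
        out.append((1 - frac) * values[i] + frac * values[i + 1])
    return out


# Two gains per frame in format A, three per frame in format B.
gains_a = [dequantize(i, 0.5) for i in (0, 4)]
gain_indices_b = [quantize(g, 0.25, 32) for g in resample_parameters(gains_a, 3)]
```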
If it is not possible to transcode a parameter within the parameter domain, decoding can go to a higher level. This is the excitation domain, without going so far as the signal domain. This technique has been proposed for gains in the document “Improving transcoding capability of speech coders in clean and frame erasured channel environments”, Hong-Goo Kang, Hong Kook Kim, Cox, R. V., Speech Coding, 2000, Proceedings 2000, IEEE Workshop on Speech Coding, Pages 78-80.
Finally, a last solution (the most complex and the least “intelligent”) consists in recalculating the parameter explicitly, as the coder would, but based on a synthesized signal. This operation amounts to a kind of partial tandem, with only some parameters being entirely recalculated. This method has been applied to diverse parameters such as the fixed excitation, the gains in the IEEE reference cited above, or the pitch.
For transcoding pulses, although several techniques have been developed to calculate the parameters quickly and at lower cost, few solutions available today use an intelligent approach to calculating the pulses of one format from the equivalent parameter in another format. In coding using analysis by synthesis, intelligent transcoding of pulse codes is applied only if the modeling is identical (or close). In contrast, if the modeling is different, the partial tandem method is used. Note that to limit the complexity of this operation, focused approaches have been proposed that exploit the properties of the decoded signal or a derived signal such as a target-signal. In the document US-2001/027393 cited above, in an embodiment utilizing an MDCT transform coder, there is described a bit rate change procedure that may be considered a special case of intelligent transcoding. That procedure requantizes a vector from a first dictionary using a vector from a second dictionary. To this end it distinguishes between two situations depending on whether the vector to be requantized belongs to the second dictionary or not. If the quantized vector belongs to the new dictionary, the modeling is identical; if not, the partial decoding method is applied.
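The two situations distinguished by that bit rate change procedure can be sketched as follows. The first branch (the vector already belongs to the second dictionary) follows the text. For the second branch, a plain nearest-neighbor search by squared Euclidean distance is used here as a stand-in, which is an assumption of the example and not the method of the cited document.

```python
def requantize(code_vector, second_dictionary):
    """Requantize a vector of a first dictionary with a second dictionary.

    Returns (index in the second dictionary, "direct" or "searched").
    """
    lookup = {tuple(v): i for i, v in enumerate(second_dictionary)}
    key = tuple(code_vector)
    if key in lookup:
        # Identical modeling: the vector is simply re-indexed.
        return lookup[key], "direct"

    # Otherwise fall back to a search (stand-in: nearest neighbor).
    def squared_distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(code_vector, candidate))

    index = min(range(len(second_dictionary)),
                key=lambda i: squared_distance(second_dictionary[i]))
    return index, "searched"


second_dictionary = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 1]]
direct_case = requantize([0, 1, 0, 0], second_dictionary)
searched_case = requantize([1, 0, 1, -1], second_dictionary)
```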
Setting itself apart from all the above prior art techniques, the present invention proposes a method of multipulse transcoding based on selecting a subset of combinations of pulse positions of an ensemble of sets of pulses from a combination of pulse positions of another ensemble of sets of pulses, the two ensembles being distinguished by the numbers of pulses that they include and by rules governing their positions and/or their amplitudes. This form of transcoding is very beneficial for multiple coding in cascade (transcoding) or in parallel (multicoding and multimode coding) in particular.