To obtain a reduced bit rate in a transform-based coding scheme it is usual to attempt to reduce the accuracy devoted to the sample coding process, while ensuring minimum perceptible degradation. To this end, the reduction in quantization accuracy is controlled by using perceptual weightings. Based for example on known properties of the human eye (for video coding) or the human ear (for audio coding), this approach makes it possible to place the quantization noise in the frequency bands where it is least perceptible.
To use information from a psycho-visual or psycho-acoustic model, essentially in the frequency domain, it is standard practice to apply a time/frequency transform, quantization being effected in the frequency domain.
FIG. 1 is a diagram illustrating the structure of a transform-based coder, with:
a bank BA of analysis filters FA1 to FAn receiving an input signal X;
a quantization block Q (including band quantization modules Q1, . . . , Qn) followed by a coding block COD including coding modules COD1 to CODn; and
a bank BS of synthesis filters FS1, . . . , FSn delivering the coded signal X′.
To reduce the bit rate further before transmission, the quantized frequency-domain samples are coded, often by an entropy (lossless) coding process. The quantization is effected by a uniform or non-uniform scalar quantizer or a vector quantizer, in the standard manner.
The noise introduced in the quantization step is shaped by the bank of synthesis filters (this process is known as applying an inverse transform). The inverse transform, which is linked to the analysis transform, must therefore be chosen to concentrate the quantization noise in the frequency or time domain to prevent it from becoming perceptible.
The analysis transform must concentrate the energy of the signal optimally in order to facilitate coding of the samples in the transformed domain. This process is referred to as energy compaction. In particular, the coding gain of the analysis transform, which depends on the input signal, must be maximized. An equation of the following type is used for this, in which K is a constant that can advantageously have the value 6.02 and R is the number of bits in each selected sample:

$$\mathrm{SNR} = G_{TC} + K \cdot R \qquad (1)$$

Thus the signal-to-noise ratio (SNR) obtained increases linearly with R, with the component $G_{TC}$, which represents the coding gain of the transform, added as an offset.
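By way of illustration, equation (1) can be evaluated numerically. The sketch below assumes that the SNR and the coding gain are expressed in decibels (which is what K = 6.02 suggests); the function names and the figures used are hypothetical and serve only to show how a higher coding gain can be traded for fewer bits at a fixed SNR.

```python
def snr_db(coding_gain_db: float, bits_per_sample: float, k: float = 6.02) -> float:
    """Equation (1): SNR = G_TC + K * R (SNR and G_TC assumed to be in dB)."""
    return coding_gain_db + k * bits_per_sample


def bits_for_target(target_snr_db: float, coding_gain_db: float, k: float = 6.02) -> float:
    """Solve equation (1) for R at a given target SNR."""
    return (target_snr_db - coding_gain_db) / k


# Hypothetical figures, chosen only to illustrate the trade-off.
baseline = snr_db(0.0, 8)                        # 8 bits per sample, no coding gain
with_gain = bits_for_target(baseline, 12.04)     # same SNR with 12.04 dB of coding gain
print(baseline, with_gain)
```

With these assumed figures, each additional 6.02 dB of coding gain saves one bit per sample for the same SNR.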
The higher the coding gain, the better the quality of reconstruction. The importance of the transform used for the coding process is therefore clear. It facilitates coding the samples, by means of its ability to concentrate both the energy of the signal (by means of the analysis part) and the quantization noise (by means of the synthesis part).
Since audio and video signals are notoriously non-stationary, the time-frequency transform must be adapted over time as a function of the nature of the input signal of the bank of filters.
A few applications to the usual coding techniques are described below.
In the case of modulated transforms, standardized audio coding techniques use banks of cosine-modulated filters, which allow these coding techniques to be implemented using fast algorithms based on cosine transforms or fast Fourier transforms.
The transform of this type most commonly used (in particular in MP3, MPEG-2, and MPEG-4 AAC coding) is the Modified Discrete Cosine Transform (MDCT), an expression for which is as follows:
$$X_k^t = \sum_{n=0}^{2M-1} x_{n+tM}\, p_k(n), \qquad 0 \le k < M$$

in which:
M is the size of the transform;
$x_{n+tM}$ are the samples of the digitized signal with a period $1/F_e$ (the reciprocal of the sampling frequency) at time n+tM;
t is a frame index;
$X_k^t$ are the samples in the transformed domain for the frame t;

$$p_k(n) = h_a(n)\, C_{n,k} = \sqrt{\frac{2}{M}}\; h_a(n) \cos\left[\frac{\pi}{4M}\,(2n+1+M)\,(2k+1)\right]$$

is a basis function of the transform in which:
the term $h_a(n)$ is called the prototype filter or analysis weighting window and covers 2M samples; and
the term $C_{n,k}$ defines the modulation.
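The analysis expression above can be sketched as a direct (non-fast) computation. This is a minimal illustration only: it assumes the scale factor of the modulation term is √(2/M), the function names `basis` and `mdct_frame` are hypothetical, and the flat window in the usage lines is a placeholder rather than a window suited to reconstruction.

```python
import math


def basis(n: int, k: int, M: int) -> float:
    """Modulation term C_{n,k}, assuming the scale factor sqrt(2/M)."""
    return math.sqrt(2.0 / M) * math.cos(
        math.pi / (4 * M) * (2 * n + 1 + M) * (2 * k + 1)
    )


def mdct_frame(x, t: int, M: int, ha):
    """Return [X_0^t, ..., X_{M-1}^t] for frame t of signal x.

    Frame t reads the 2M samples x[t*M] .. x[t*M + 2M - 1], weights them
    with the analysis window ha (length 2M), and projects them onto the
    M basis functions p_k(n) = ha(n) * C_{n,k}.
    """
    if len(ha) != 2 * M:
        raise ValueError("the analysis window must cover 2M samples")
    return [
        sum(x[n + t * M] * ha[n] * basis(n, k, M) for n in range(2 * M))
        for k in range(M)
    ]


# Illustrative input: M = 4, a short ramp, and a flat placeholder window.
M = 4
signal = [float(i) for i in range(3 * M)]
window = [1.0] * (2 * M)   # placeholder only, not a perfect-reconstruction window
frame0 = mdct_frame(signal, 0, M, window)
frame1 = mdct_frame(signal, 1, M, window)
```

Each frame consumes 2M input samples but advances by only M samples, so consecutive frames overlap by half and each produces M coefficients.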
This transform is applied to audio processing. It is also applied to video processing, in particular in still image coding, where the transform is successively applied to the rows and the columns in the standard manner. This principle furthermore extends to signals with more than two dimensions.
To restore the initial time samples, the following inverse transform is applied on decoding in order to reconstruct the samples for $0 \le n < M$, which are situated in the area of overlap of two consecutive transforms. The decoded samples are then given by the following equation, in which $p_k^s(n) = h_s(n)\,C_{n,k}$ defines the synthesis transform, the synthesis weighting window being denoted $h_s(n)$ and also covering 2M samples:
$$\hat{x}_{n+tM+M} = \sum_{k=0}^{M-1} \left[ X_k^{t+1}\, p_k^s(n) + X_k^{t}\, p_k^s(n+M) \right]$$
The reconstruction equation yielding the decoded samples may also be written in the following form:
$$\hat{x}_{n+tM+M} = \sum_{k=0}^{M-1} \left[ X_k^{t+1}\, h_s(n)\, C_{n,k} + X_k^{t}\, h_s(n+M)\, C_{n+M,k} \right] = h_s(n) \sum_{k=0}^{M-1} X_k^{t+1}\, C_{n,k} + h_s(n+M) \sum_{k=0}^{M-1} X_k^{t}\, C_{n+M,k}$$
This other presentation of the reconstruction equation amounts to considering that two inverse cosine transforms may be applied successively to the samples $X_k^t$ and $X_k^{t+1}$ in the transform domain, their result then being combined by a weighting and addition operation. This reconstruction method is shown in FIG. 2 in which the samples in the transform domain are denoted $X_{t,k}$ and the reconstructed samples in the time domain are denoted $\hat{x}_n$.
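The two-step reconstruction (two inverse cosine transforms, then weighting and addition) can be sketched as follows. This is an illustrative direct computation, not a fast algorithm. It assumes the scale factor √(2/M) in the modulation term, and it uses a sine window for both analysis and synthesis as one admissible choice; the function names and the test signal are hypothetical.

```python
import math


def basis(n, k, M):
    """C_{n,k} with the sqrt(2/M) scale factor (assumed normalization)."""
    return math.sqrt(2.0 / M) * math.cos(
        math.pi / (4 * M) * (2 * n + 1 + M) * (2 * k + 1)
    )


def mdct_frame(x, t, M, ha):
    """Analysis: X_k^t = sum_n x[n + tM] * ha(n) * C_{n,k}."""
    return [
        sum(x[n + t * M] * ha[n] * basis(n, k, M) for n in range(2 * M))
        for k in range(M)
    ]


def inverse_cosine(X, M):
    """Unweighted inverse cosine transform: u[n] = sum_k X_k * C_{n,k}, 0 <= n < 2M."""
    return [sum(X[k] * basis(n, k, M) for k in range(M)) for n in range(2 * M)]


def overlap_add(X_t, X_t1, M, hs):
    """Reconstruct x_hat[n + tM + M] for 0 <= n < M from frames t and t+1."""
    u_t = inverse_cosine(X_t, M)      # frame t contributes its second half
    u_t1 = inverse_cosine(X_t1, M)    # frame t+1 contributes its first half
    return [hs[n] * u_t1[n] + hs[n + M] * u_t[n + M] for n in range(M)]


# Demo with a sine window used for both analysis and synthesis (M even).
M = 8
h = [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]
x = [math.sin(0.37 * i) + 0.25 * i for i in range(3 * M)]   # arbitrary test signal

x_hat = overlap_add(mdct_frame(x, 0, M, h), mdct_frame(x, 1, M, h), M, h)
max_error = max(abs(x_hat[n] - x[n + M]) for n in range(M))
print(max_error)   # expected to be at rounding-error level
```

Only the M samples lying in the overlap of the two frames are recovered; the first and last M samples of the signal would need a preceding and a following frame respectively.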
An MDCT usually employs identical windows for analysis and synthesis, and thus $h(n) = h_a(n) = h_s(n)$.
To ensure exact (referred to as perfect) reconstruction of the signal, that is, to satisfy the condition $\hat{x}_{n+tM} = x_{n+tM}$, it is necessary to choose a prototype window h(n) satisfying a few constraints.
The following equations are sufficient for obtaining perfect reconstruction. They are usually adopted for constructing windows suited to the MDCT:
$$\begin{cases} h(2M-1-n) = h(n) \\ h^2(n) + h^2(n+M) = 1 \end{cases} \qquad (2)$$
The windows are of even symmetry relative to a central sample, as shown in the examples in FIG. 3.
These constraints are relatively simple to satisfy. A standard prototype filter may, for example, consist of a sinusoidal window (shown in solid line in FIG. 3) that is written as follows:
$$h(n) = \sin\left[\frac{\pi}{2M}\,(n + 0.5)\right]$$
Of course, other forms of prototype filter exist, such as the Kaiser-Bessel-derived (KBD) windows defined in the MPEG-4 standard (corresponding to the dashed-line curves in FIG. 3) and low-overlap windows.
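The two constraints of equations (2), symmetry and power complementarity, can be checked numerically for a candidate window. The sketch below does so for the sinusoidal window; the function names and the tolerance are illustrative assumptions.

```python
import math


def sine_window(M: int):
    """h(n) = sin[pi / (2M) * (n + 0.5)] for 0 <= n < 2M."""
    return [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]


def satisfies_conditions_2(h, M: int, tol: float = 1e-12) -> bool:
    """Check h(2M-1-n) = h(n) and h^2(n) + h^2(n+M) = 1 within a tolerance."""
    symmetric = all(abs(h[2 * M - 1 - n] - h[n]) <= tol for n in range(2 * M))
    power_complementary = all(
        abs(h[n] ** 2 + h[n + M] ** 2 - 1.0) <= tol for n in range(M)
    )
    return symmetric and power_complementary


M = 8
sine_ok = satisfies_conditions_2(sine_window(M), M)
flat_ok = satisfies_conditions_2([1.0] * (2 * M), M)   # all-ones window, for contrast
print(sine_ok, flat_ok)
```

The all-ones window is symmetric but fails the second constraint, since h²(n) + h²(n+M) would equal 2.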
Given the necessity to adapt the transform to the signal to be coded, the prior art techniques enable the transform that is used to be changed over time, with this being referred to below as window switching. It is considered here that on changing the transform, the size of the windows used remains the same so that only the weighting coefficients of the windows change over time.
The expressions given above for a constant window are adapted below to the situation of a change of window. Since overlapping in the MDCT involves only two consecutive frames, the transition can be explained, without loss of generality, for two consecutive frames T1 and T2. The first frame T1 uses an analysis window $h_{a1}$ and the second frame T2 uses an analysis window $h_{a2}$. The synthesis windows used for reconstruction are chosen to be identical to the analysis windows in the overlapping parts of the two windows $h_{a1}$ and $h_{a2}$. Thus, for $0 \le n < M$:

$$h_{a1}(n+M) = h_{s1}(n+M), \qquad h_{a2}(n) = h_{s2}(n)$$

Differing from the previous situation in which the same window is used for a plurality of successive frames, there is no longer a direct relationship between the first and second halves of the analysis windows, which means that the weighting coefficient $h_{a1}(n+M)$ can be independent of the coefficient $h_{a1}(n)$. Similarly, the coefficient $h_{a2}(n)$ can be independent of the coefficient $h_{a2}(n+M)$. Thus it is possible for the shape of the analysis window to be made to evolve over time.
The conditions for perfect reconstruction become, for 0≦n<M:
$$\begin{cases} h_{a1}^2(n+M) + h_{a2}^2(n) = 1 \\ h_{a1}(2M-1-n) \cdot h_{a1}(n+M) - h_{a2}(M-1-n) \cdot h_{a2}(n) = 0 \end{cases}$$
The very simple standard solution for satisfying the above conditions consists in choosing, for $0 \le n < M$:

$$h_{a1}(n+M) = h_{a2}(M-1-n)$$
Accordingly, referring to FIG. 4, the analysis window used in the first half of the frame T2 (the dashed-line curve in FIG. 4) is a mirror version of the analysis window used in the second half of the frame T1 (the solid line curve in FIG. 4). In other words, to ensure perfect reconstruction, the prior art teaches progressive transitions via sections sharing the same analysis windows, apart from a mirror effect.
This mirror effect also applies to the synthesis windows by virtue of the imposed equality of the synthesis and analysis windows.
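The mirror rule can be checked numerically against the two perfect-reconstruction conditions for a change of window. In the sketch below, frame T1 moves from a sine shape to a second power-complementary shape in its second half, and frame T2 mirrors that half. The second shape is the window used by the Vorbis codec, chosen here only for brevity; it is an assumption of this example, as are all function names.

```python
import math


def sine_window(M):
    return [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]


def vorbis_window(M):
    """Power-complementary window borrowed from the Vorbis codec (not from the text)."""
    return [
        math.sin(math.pi / 2 * math.sin(math.pi / (2 * M) * (n + 0.5)) ** 2)
        for n in range(2 * M)
    ]


def mirrored_pair(M):
    """Build h_a1 and h_a2 so that h_a1(n + M) = h_a2(M - 1 - n)."""
    s, v = sine_window(M), vorbis_window(M)
    ha1 = s[:M] + v[M:]                                    # T1: new shape in second half
    ha2 = [ha1[2 * M - 1 - n] for n in range(M)] + s[M:]   # T2: mirrored first half
    return ha1, ha2


def transition_errors(ha1, ha2, M):
    """Largest violation of each of the two conditions over 0 <= n < M."""
    power = max(abs(ha1[n + M] ** 2 + ha2[n] ** 2 - 1.0) for n in range(M))
    alias = max(
        abs(ha1[2 * M - 1 - n] * ha1[n + M] - ha2[M - 1 - n] * ha2[n])
        for n in range(M)
    )
    return power, alias


M = 8
ha1, ha2 = mirrored_pair(M)
ok_power, ok_alias = transition_errors(ha1, ha2, M)

# Counter-example: T2 keeps a sine first half instead of the mirrored one.
bad_power, bad_alias = transition_errors(ha1, sine_window(M), M)
print(ok_power, ok_alias, bad_power, bad_alias)
```

The counter-example shows the constraint described above: once the second half of the T1 window changes shape, the first half of the T2 window is no longer free.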
Because of the mirror effect, any insertion of zeros (weighting coefficients with the value 0) after the coefficient $h_{a1}(n+M)$ in the window $h_{a1}$ has the effect of inserting the same number of zeros at the beginning of the window $h_{a2}$ (in the term $h_{a2}(n)$). Moreover, this insertion of zeros implies imposing the same number of coefficients with the value 1 for the inverse ranks M−n. To be more precise:

$$h_{a1}(n+M) = h_{a2}(M-1-n)$$
Because of this, the general appearance of such a window including many zeros is similar to that of a rectangular window, as shown in FIG. 5. A rectangular window has poor resolution in the frequency domain and a high level of discontinuity. This is a first problem inherent to prior art coders/decoders.
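The link between inserted zeros and imposed ones can be illustrated numerically. The sketch below builds the second half of a window with a given number of trailing zeros, applies the mirror rule to obtain the first half of the next window, and measures how far the pair departs from the power-complementarity condition. The cosine slope and the function names are assumptions made for the example.

```python
import math


def falling_half(M: int, zeros: int, ones: int):
    """h_a1(n + M) for 0 <= n < M: `ones` ones, a cosine slope, then `zeros` zeros."""
    L = M - zeros - ones                      # number of samples left for the slope
    slope = [math.cos(math.pi / (2 * L) * (i + 0.5)) for i in range(L)]
    return [1.0] * ones + slope + [0.0] * zeros


def power_error(fall):
    """Largest violation of h_a1(n+M)^2 + h_a2(n)^2 = 1 under the mirror rule."""
    M = len(fall)
    rise = fall[::-1]                         # h_a2(n) = h_a1(2M - 1 - n)
    return max(abs(fall[n] ** 2 + rise[n] ** 2 - 1.0) for n in range(M))


M, Z = 16, 6
with_ones = power_error(falling_half(M, zeros=Z, ones=Z))      # Z zeros and Z ones
without_ones = power_error(falling_half(M, zeros=Z, ones=0))   # Z zeros, no ones
print(with_ones, without_ones)
```

With Z zeros and Z ones, only M − 2Z samples remain for the slope, so the window approaches a rectangular shape as Z approaches M/2.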
In standardized known coders/decoders, the coder usually selects the transform to be used over time. Thus in the AAC standard, as described in the document “Information technology—Coding of audio-visual objects—Part 3: Audio”, ISO/IEC 14496-3 (2001), the coder selects and sends the window shape corresponding to the second half of the analysis window, the first half being induced by the selection effected for the preceding frame. In the AAC standard, a bit is sent to the decoder in order to enable windows of the same type to be used for synthesis.
The decoder is therefore slaved to the coder and faithfully applies the types of window decided on by the coder.
It is therefore clear that a drawback of the prior art is that, in order to ensure a transition of the type of window that is used over time, it is necessary to introduce an interleaved half-window in order to ensure perfect reconstruction. Thus the analysis windows $h_{a1}$ and $h_{a2}$ referred to above cannot be rendered independent of each other on their common support.
The present invention aims to improve on this situation.