A speech codec is designed specifically around the characteristics of a speech signal [NPL 1]. It codes speech signals efficiently: the sound quality is high even when a speech signal is coded at a low bitrate, and the delay is low. However, when a speech codec codes an audio signal, which is wideband compared to a speech signal, the sound quality is not as good as that of transform codecs such as the AAC scheme. Conversely, a transform codec represented by the AAC scheme is suitable for coding an audio signal, but it requires a higher bitrate than a speech codec to code a speech signal at the same sound quality. A hybrid codec combines the merits of these two different codecs, and can therefore code both speech signals and audio signals with high sound quality at a low bitrate.
A low delay hybrid codec is desired for real-time communication applications such as teleconference systems. One low delay hybrid codec combines the AAC-LD (low-delay AAC) coding technology with speech coding technology. The AAC-LD provides a mode with an algorithm delay not exceeding 20 ms. The AAC-LD is derived from the normal AAC coding technology and modifies it in several ways to reduce the algorithm delay. Firstly, the frame size of the AAC-LD is reduced to 1024 or 960 time domain samples, so that the MDCT filter bank outputs 512 or 480 spectral values, respectively. Secondly, look-ahead is disabled, and as a result block switching is not used. Thirdly, a low-overlap window replaces the Kaiser-Bessel window used in the window function processing of the normal delay AAC; the low-overlap window allows transient signals to be coded efficiently in the AAC-LD. Fourthly, the bit reservoir is minimized or not used at all. Fifthly, the temporal noise shaping and long-term prediction functions are adapted to the low delay frame size.
Generally, a speech codec is based on linear prediction coding such as algebraic code-excited linear prediction (ACELP) [NPL 1]. In ACELP coding, a linear prediction analysis is applied to a speech signal, and an algebraic codebook is used to code the excitation signal obtained from the linear prediction analysis. To further improve on the sound quality of ACELP coding, recent speech codecs additionally use transform coded excitation (TCX) coding. In TCX coding, transform coding is applied to the excitation signal after the linear prediction analysis, and the Fourier transformed weighted signal is quantized using algebraic vector quantization. Different frame sizes are available for the speech codec, for example, 1024, 512, and 256 time domain samples. The coding mode is selected using a closed-loop analysis-by-synthesis method.
A low delay hybrid codec has three different coding modes, namely, the AAC-LD coding mode, the ACELP mode, and the TCX mode. Since each mode codes a signal in a different domain and has a different frame size, the hybrid codec needs block switching methods for transition frames, that is, frames in which the coding mode switches. An example of a transition frame is illustrated in FIG. 2. For example, when a previous frame is coded in the AAC-ELD mode and the current frame is to be coded in the ACELP mode, the current frame is defined as a transition frame. In the prior art, to switch between different coding modes, the aliasing portion of the previous windowed frame is processed differently compared to the current portion of the current block in the transition frame (PTL 1: International Patent Application Publication WO2010/003532 by Fraunhofer Gesellschaft).
To facilitate the explanation of the present invention in the following sections, the transform and the inverse transform of the AAC-ELD are described in this background section.
The transform processes of the AAC-ELD mode in the encoder are described as follows:
The AAC-ELD mode processes four frames at a time. A frame i-1 is concatenated with the three previous frames to form an extended frame with a length of 4N. Here, N is the size of the input frame. That is to say, to code the current frame, the AAC-ELD mode requires not only the samples of the current frame but also the samples of the three frames preceding it.
Firstly, a window is applied on the extended frame in the AAC-ELD mode. FIG. 3 illustrates the encoder window shape in the AAC-ELD mode of the encoder. The window in the encoder is denoted as $w_{enc}$. For the convenience of illustration, the encoder window is divided into eight parts, denoted as $[w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8]$. The length of the encoder window is 4N. The encoder window in the AAC-ELD mode is designed to match the low delay filter banks used in the AAC-ELD mode. For the convenience of explanation, each frame is divided into two parts as shown in FIG. 3. For example, the frame i-1 is divided into two vectors $[a_{i-1}, b_{i-1}]$, where $a_{i-1}$ has N/2 samples and $b_{i-1}$ has N/2 samples. The encoder window is therefore applied on the vectors $[a_{i-4}, b_{i-4}, a_{i-3}, b_{i-3}, a_{i-2}, b_{i-2}, a_{i-1}, b_{i-1}]$ to obtain the windowed signal $[a_{i-4}w_1, b_{i-4}w_2, a_{i-3}w_3, b_{i-3}w_4, a_{i-2}w_5, b_{i-2}w_6, a_{i-1}w_7, b_{i-1}w_8]$.
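The windowing step above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the window values are placeholders, not the actual AAC-ELD encoder window, which is not given in this section.

```python
def apply_encoder_window(frames, w_enc):
    """Concatenate four N-sample frames (oldest first) and window them.

    frames: [frame_{i-4}, frame_{i-3}, frame_{i-2}, frame_{i-1}], each of length N.
    w_enc:  encoder window of length 4N.
    """
    extended = [s for frame in frames for s in frame]
    if len(w_enc) != len(extended):
        raise ValueError("window length must equal 4N")
    return [s * w for s, w in zip(extended, w_enc)]


def split_into_eight(vec):
    """Split a 4N-long vector into eight parts of N/2 samples each."""
    half = len(vec) // 8
    return [vec[j * half:(j + 1) * half] for j in range(8)]


N = 4
# Four dummy frames: frame f holds the values 10*f + n.
frames = [[float(10 * f + n) for n in range(N)] for f in range(4)]
w_enc = [0.5] * (4 * N)  # placeholder values, NOT the AAC-ELD window
windowed = apply_encoder_window(frames, w_enc)
# parts corresponds to [a_{i-4}w_1, b_{i-4}w_2, ..., a_{i-1}w_7, b_{i-1}w_8]
parts = split_into_eight(windowed)
```

Each of the eight parts has N/2 samples, matching the split of every frame into an a half and a b half.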
Next, the low delay filter banks are used to transform the windowed signals. The low delay filter banks are defined as follows:
$$x_k = -2 \sum_{n=-2N}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} - \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] \qquad \text{[Math. 1]}$$
where $x_n = [a_{i-4}w_1, b_{i-4}w_2, a_{i-3}w_3, b_{i-3}w_4, a_{i-2}w_5, b_{i-2}w_6, a_{i-1}w_7, b_{i-1}w_8]$.
According to the above low delay filter banks, the length of the output coefficients is N while the processing frame length is 4N.
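A direct, unoptimized evaluation of [Math. 1] may make the index ranges concrete. This is a sketch under the assumption that the first stored sample corresponds to n = -2N; the function name and the test input are hypothetical.

```python
import math


def low_delay_analysis(x):
    """Evaluate [Math. 1] directly: 4N windowed samples in, N coefficients out.

    x[0] is taken to correspond to n = -2N (an indexing assumption).
    """
    N = len(x) // 4
    coeffs = []
    for k in range(N):
        acc = 0.0
        for n in range(-2 * N, 2 * N):
            angle = math.pi / N * (n + 0.5 - N / 2) * (k + 0.5)
            acc += x[n + 2 * N] * math.cos(angle)
        coeffs.append(-2.0 * acc)
    return coeffs


N = 8
x = [math.sin(0.3 * j) for j in range(4 * N)]  # arbitrary test input
coeffs = low_delay_analysis(x)
```

The output has N entries for 4N input samples, as stated above.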
The low delay filter bank can be expressed in terms of DCT-IV. The DCT-IV definition is shown as follows:
$$x_k = \text{DCT-IV}(x_n) = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \qquad \text{[Math. 2]}$$
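As an illustration, the unnormalized DCT-IV of [Math. 2] can be written in a few lines. The sketch also checks a standard property of this transform: applying it twice and scaling by 2/N returns the input. The function name `dct_iv` is chosen here for illustration.

```python
import math


def dct_iv(x):
    """Unnormalized DCT-IV as defined in [Math. 2]."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5)) for n in range(N))
        for k in range(N)
    ]


x = [1.0, -2.0, 0.5, 3.0]
X = dct_iv(x)
# The DCT-IV is its own inverse up to the factor 2/N.
x_back = [2.0 / len(x) * v for v in dct_iv(X)]
```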
According to the following identities:
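$$\cos\left[\frac{\pi}{N}\left(-n - 1 + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \qquad \text{[Math. 3]}$$
$$\cos\left[\frac{\pi}{N}\left(2N - n - 1 + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \qquad \text{[Math. 4]}$$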
the signal of the frame i-1 transformed by the low delay filter banks can be expressed in terms of DCT-IV as follows:
$$\left[\text{DCT-IV}\left(-(a_{i-4}w_1)_R - b_{i-4}w_2 + (a_{i-2}w_5)_R + b_{i-2}w_6\right),\ \text{DCT-IV}\left(-a_{i-3}w_3 + (b_{i-3}w_4)_R + a_{i-1}w_7 - (b_{i-1}w_8)_R\right)\right],$$
where $(a_{i-4}w_1)_R$, $(a_{i-2}w_5)_R$, $(b_{i-3}w_4)_R$, and $(b_{i-1}w_8)_R$ denote the vectors $a_{i-4}w_1$, $a_{i-2}w_5$, $b_{i-3}w_4$, and $b_{i-1}w_8$ in reverse order, respectively.
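The folding relation can be checked numerically. The sketch below makes two assumptions: the two folded halves are concatenated into one N-sample vector before a single N-point DCT-IV, and the leading factor -2 of [Math. 1], which the expression above does not show, is carried along explicitly. All function names are hypothetical.

```python
import math


def dct_iv(x):
    """Unnormalized DCT-IV as defined in [Math. 2]."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5)) for n in range(N))
        for k in range(N)
    ]


def low_delay_analysis(x):
    """Direct evaluation of [Math. 1]; x[0] corresponds to n = -2N."""
    N = len(x) // 4
    return [
        -2.0 * sum(
            x[n + 2 * N] * math.cos(math.pi / N * (n + 0.5 - N / 2) * (k + 0.5))
            for n in range(-2 * N, 2 * N)
        )
        for k in range(N)
    ]


def fold(x):
    """Fold 4N windowed samples into the N-sample DCT-IV input."""
    h = len(x) // 8
    s = [x[j * h:(j + 1) * h] for j in range(8)]  # s[0] = a_{i-4}w_1, ..., s[7] = b_{i-1}w_8
    first = [-p - q + r + t for p, q, r, t in zip(s[0][::-1], s[1], s[4][::-1], s[5])]
    second = [-p + q + r - t for p, q, r, t in zip(s[2], s[3][::-1], s[6], s[7][::-1])]
    return first + second


N = 8
x = [math.sin(0.7 * j) + 0.1 * j for j in range(4 * N)]  # arbitrary windowed signal
direct = low_delay_analysis(x)
via_dct = [-2.0 * v for v in dct_iv(fold(x))]
max_err = max(abs(p - q) for p, q in zip(direct, via_dct))
```

For this input the two results agree to within floating-point rounding, which is what the identities [Math. 3] and [Math. 4] predict.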
The inverse transform processes in the AAC-ELD mode of the decoder are described below.
The following describes the case where the decoder decodes the frame i-1 in the AAC-ELD mode. FIG. 7 illustrates the inverse transform processes in the AAC-ELD mode. The inverse low delay filter banks of the AAC-ELD mode in the decoder are shown below.
$$y_n = -\frac{1}{N} \sum_{k=0}^{N-1} x_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} - \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad 0 \le n < 4N \qquad \text{[Math. 5]}$$
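The length of the inverse transform signals of the low delay filter banks is 4N. As explained in Embodiment 1, the inverse transform signals for the frame i-1 are as follows:
$$\begin{aligned} y_{i-1} = [\,& -a_{i-4}w_1 - (b_{i-4}w_2)_R + a_{i-2}w_5 + (b_{i-2}w_6)_R, \\ & -(a_{i-4}w_1)_R - b_{i-4}w_2 + (a_{i-2}w_5)_R + b_{i-2}w_6, \\ & -a_{i-3}w_3 + (b_{i-3}w_4)_R + a_{i-1}w_7 - (b_{i-1}w_8)_R, \\ & (a_{i-3}w_3)_R - b_{i-3}w_4 - (a_{i-1}w_7)_R + b_{i-1}w_8, \\ & a_{i-4}w_1 + (b_{i-4}w_2)_R - a_{i-2}w_5 - (b_{i-2}w_6)_R, \\ & (a_{i-4}w_1)_R + b_{i-4}w_2 - (a_{i-2}w_5)_R - b_{i-2}w_6, \\ & a_{i-3}w_3 - (b_{i-3}w_4)_R - a_{i-1}w_7 + (b_{i-1}w_8)_R, \\ & -(a_{i-3}w_3)_R + b_{i-3}w_4 + (a_{i-1}w_7)_R - b_{i-1}w_8 \,] \end{aligned} \qquad \text{[Math. 6]}$$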
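The structure of [Math. 6] can be reproduced numerically. Writing $u_1$ and $u_2$ for the second and third components of [Math. 6], the eight components read $[(u_1)_R, u_1, u_2, -(u_2)_R, -(u_1)_R, -u_1, -u_2, (u_2)_R]$. The sketch below applies [Math. 1] followed by [Math. 5] to an arbitrary input and compares the result against that pattern; the function names and the indexing convention (first sample at n = -2N in the encoder) are assumptions made for illustration.

```python
import math


def low_delay_analysis(x):
    """Direct evaluation of [Math. 1]; x[0] corresponds to n = -2N."""
    N = len(x) // 4
    return [
        -2.0 * sum(
            x[n + 2 * N] * math.cos(math.pi / N * (n + 0.5 - N / 2) * (k + 0.5))
            for n in range(-2 * N, 2 * N)
        )
        for k in range(N)
    ]


def low_delay_synthesis(X):
    """Direct evaluation of [Math. 5]: N coefficients in, 4N samples out."""
    N = len(X)
    return [
        -1.0 / N * sum(
            X[k] * math.cos(math.pi / N * (n + 0.5 - N / 2) * (k + 0.5))
            for k in range(N)
        )
        for n in range(4 * N)
    ]


def rev(v):
    return v[::-1]


def neg(v):
    return [-t for t in v]


N = 8
h = N // 2
x = [math.sin(0.7 * j) + 0.1 * j for j in range(4 * N)]  # arbitrary windowed signal
s = [x[j * h:(j + 1) * h] for j in range(8)]  # s[0] = a_{i-4}w_1, ..., s[7] = b_{i-1}w_8

# Second and third components of [Math. 6].
u1 = [-p - q + r + t for p, q, r, t in zip(rev(s[0]), s[1], rev(s[4]), s[5])]
u2 = [-p + q + r - t for p, q, r, t in zip(s[2], rev(s[3]), s[6], rev(s[7]))]

y = low_delay_synthesis(low_delay_analysis(x))
expected = rev(u1) + u1 + u2 + neg(rev(u2)) + neg(rev(u1)) + neg(u1) + neg(u2) + rev(u2)
max_err = max(abs(p - q) for p, q in zip(y, expected))
```

The last four components are the negatives of the first four, which is the redundancy that the subsequent windowing and overlap-add exploit.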
After the inverse low delay filter banks are applied, a window is applied on $y_{i-1}$ to obtain the windowed signal $\bar{y}_{i-1}$ [Math. 7]. FIG. 6 illustrates the decoder window shape in the AAC-ELD mode. The length of the window in the AAC-ELD mode is 4N. It is the encoder window of the AAC-ELD mode in reverse order. The window in the decoder is denoted as $w_{dec}$. For the convenience of illustration, the decoder window is divided into eight parts $[w_{R,8}, w_{R,7}, w_{R,6}, w_{R,5}, w_{R,4}, w_{R,3}, w_{R,2}, w_{R,1}]$ as shown in FIG. 6.
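The relation between the encoder and decoder window parts can be illustrated with a short sketch. It assumes, following the notation above, that each decoder part $w_{R,k}$ is the encoder part $w_k$ in reverse order; the window values are placeholders and the function names are hypothetical.

```python
def decoder_window(w_enc):
    """Decoder window taken as the encoder window in reverse order."""
    return w_enc[::-1]


def split_into_eight(vec):
    """Split a 4N-long vector into eight parts of N/2 samples each."""
    half = len(vec) // 8
    return [vec[j * half:(j + 1) * half] for j in range(8)]


N = 4
w_enc = [float(j) for j in range(4 * N)]  # placeholder values, NOT the AAC-ELD window
w_dec = decoder_window(w_enc)

enc_parts = split_into_eight(w_enc)  # [w_1, ..., w_8]
dec_parts = split_into_eight(w_dec)  # [w_{R,8}, ..., w_{R,1}]

# Part j of the decoder window is the reversed encoder part 8 - j (1-based).
parts_match = all(dec_parts[j] == enc_parts[7 - j][::-1] for j in range(8))
```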
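The windowed inverse transform signals $\bar{y}_{i-1}$ [Math. 8] are as follows:
$$\begin{aligned} \bar{y}_{i-1} = [\,& (-a_{i-4}w_1 - (b_{i-4}w_2)_R + a_{i-2}w_5 + (b_{i-2}w_6)_R)\,w_{R,8}, \\ & (-(a_{i-4}w_1)_R - b_{i-4}w_2 + (a_{i-2}w_5)_R + b_{i-2}w_6)\,w_{R,7}, \\ & (-a_{i-3}w_3 + (b_{i-3}w_4)_R + a_{i-1}w_7 - (b_{i-1}w_8)_R)\,w_{R,6}, \\ & ((a_{i-3}w_3)_R - b_{i-3}w_4 - (a_{i-1}w_7)_R + b_{i-1}w_8)\,w_{R,5}, \\ & (a_{i-4}w_1 + (b_{i-4}w_2)_R - a_{i-2}w_5 - (b_{i-2}w_6)_R)\,w_{R,4}, \\ & ((a_{i-4}w_1)_R + b_{i-4}w_2 - (a_{i-2}w_5)_R - b_{i-2}w_6)\,w_{R,3}, \\ & (a_{i-3}w_3 - (b_{i-3}w_4)_R - a_{i-1}w_7 + (b_{i-1}w_8)_R)\,w_{R,2}, \\ & (-(a_{i-3}w_3)_R + b_{i-3}w_4 + (a_{i-1}w_7)_R - b_{i-1}w_8)\,w_{R,1} \,] \end{aligned} \qquad \text{[Math. 9]}$$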
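For the next frame i coded in the AAC-ELD mode, the windowed inverse transform signals $\bar{y}_i$ [Math. 10] are as follows:
$$\begin{aligned} \bar{y}_i = [\,& (-a_{i-3}w_1 - (b_{i-3}w_2)_R + a_{i-1}w_5 + (b_{i-1}w_6)_R)\,w_{R,8}, \\ & (-(a_{i-3}w_1)_R - b_{i-3}w_2 + (a_{i-1}w_5)_R + b_{i-1}w_6)\,w_{R,7}, \\ & (-a_{i-2}w_3 + (b_{i-2}w_4)_R + a_i w_7 - (b_i w_8)_R)\,w_{R,6}, \\ & ((a_{i-2}w_3)_R - b_{i-2}w_4 - (a_i w_7)_R + b_i w_8)\,w_{R,5}, \\ & (a_{i-3}w_1 + (b_{i-3}w_2)_R - a_{i-1}w_5 - (b_{i-1}w_6)_R)\,w_{R,4}, \\ & ((a_{i-3}w_1)_R + b_{i-3}w_2 - (a_{i-1}w_5)_R - b_{i-1}w_6)\,w_{R,3}, \\ & (a_{i-2}w_3 - (b_{i-2}w_4)_R - a_i w_7 + (b_i w_8)_R)\,w_{R,2}, \\ & (-(a_{i-2}w_3)_R + b_{i-2}w_4 + (a_i w_7)_R - b_i w_8)\,w_{R,1} \,] \end{aligned} \qquad \text{[Math. 11]}$$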
In order to reconstruct the signal $[a_{i-1}, b_{i-1}]$ of the frame i, the overlapping and adding process requires three previous frames. FIG. 7 illustrates the overlapping and adding process in the AAC-ELD mode. The length of the reconstructed signal $out_i$ is N.
The overlapping and adding processes can be expressed as the following equation:
$$out_{i,n} = \bar{y}_{i,n} + \bar{y}_{i-1,n+N} + \bar{y}_{i-2,n+2N} + \bar{y}_{i-3,n+3N}, \quad 0 \le n < N \qquad \text{[Math. 12]}$$
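The overlap-add of [Math. 12] reduces to a four-term sum per output sample. The following sketch uses a hypothetical function name and dummy input values chosen only to make the indexing visible.

```python
def overlap_add(yb_i, yb_i1, yb_i2, yb_i3):
    """Overlap-add per [Math. 12].

    yb_i, yb_i1, yb_i2, yb_i3: windowed inverse transform signals of the
    frames i, i-1, i-2, and i-3, each of length 4N. Returns N output samples.
    """
    N = len(yb_i) // 4
    return [
        yb_i[n] + yb_i1[n + N] + yb_i2[n + 2 * N] + yb_i3[n + 3 * N]
        for n in range(N)
    ]


N = 2
# Dummy signals: each frame uses a different decimal digit position so the
# contribution of every frame can be read off the result.
yb_i = [float(v) for v in range(4 * N)]
yb_i1 = [10.0 * v for v in range(4 * N)]
yb_i2 = [100.0 * v for v in range(4 * N)]
yb_i3 = [1000.0 * v for v in range(4 * N)]
out_i = overlap_add(yb_i, yb_i1, yb_i2, yb_i3)  # [6420.0, 7531.0]
```

The first N samples of the newest frame are combined with successively later N-sample segments of the three older frames.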
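The aliasing cancellation mechanism of the AAC-ELD is illustrated in FIG. 22. The windowed inverse transform signals of the frame i, the frame i-1, the frame i-2, and the frame i-3 are shown in FIG. 22. For the purpose of visualization, the graphs show an example of a special case where
$$a_i = 1,\ b_i = 1 \quad \forall i. \qquad \text{[Math. 13]}$$
Adding the four overlapping terms that contribute to the first half of the output gives:
$$\begin{aligned} & (-a_{i-3}w_1 - (b_{i-3}w_2)_R + a_{i-1}w_5 + (b_{i-1}w_6)_R)\,w_{R,8} + (-a_{i-3}w_3 + (b_{i-3}w_4)_R + a_{i-1}w_7 - (b_{i-1}w_8)_R)\,w_{R,6} \\ & \quad + (a_{i-5}w_1 + (b_{i-5}w_2)_R - a_{i-3}w_5 - (b_{i-3}w_6)_R)\,w_{R,4} + (a_{i-5}w_3 - (b_{i-5}w_4)_R - a_{i-3}w_7 + (b_{i-3}w_8)_R)\,w_{R,2} \\ & = a_{i-5}(w_3 w_{R,2} + w_1 w_{R,4}) + a_{i-3}(-w_7 w_{R,2} - w_5 w_{R,4} - w_3 w_{R,6} - w_1 w_{R,8}) + a_{i-1}(w_7 w_{R,6} + w_5 w_{R,8}) \end{aligned} \qquad \text{[Math. 14]}$$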
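The window is designed to possess the following properties:
$$\begin{aligned} (w_3 w_{R,2} + w_1 w_{R,4})_R &\approx 0 \\ (-w_7 w_{R,2} - w_5 w_{R,4} - w_3 w_{R,6} - w_1 w_{R,8})_R &\approx 0 \\ (w_7 w_{R,6} + w_5 w_{R,8})_R &\approx 1 \end{aligned} \qquad \text{[Math. 15]}$$
The signal $a_{i-1}$ is thus reconstructed after the overlapping and adding.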
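The same analysis method is used to reconstruct the signal $b_{i-1}$:
$$\begin{aligned} & (-(a_{i-3}w_1)_R - b_{i-3}w_2 + (a_{i-1}w_5)_R + b_{i-1}w_6)\,w_{R,7} + ((a_{i-3}w_3)_R - b_{i-3}w_4 - (a_{i-1}w_7)_R + b_{i-1}w_8)\,w_{R,5} \\ & \quad + ((a_{i-5}w_1)_R + b_{i-5}w_2 - (a_{i-3}w_5)_R - b_{i-3}w_6)\,w_{R,3} + (-(a_{i-5}w_3)_R + b_{i-5}w_4 + (a_{i-3}w_7)_R - b_{i-3}w_8)\,w_{R,1} \\ & = b_{i-5}(w_2 w_{R,3} + w_4 w_{R,1}) + b_{i-3}(-w_2 w_{R,7} - w_4 w_{R,5} - w_6 w_{R,3} - w_8 w_{R,1}) + b_{i-1}(w_6 w_{R,7} + w_8 w_{R,5}) \end{aligned} \qquad \text{[Math. 16]}$$
$$\begin{aligned} (w_3 w_{R,2} + w_1 w_{R,4})_R &\approx 0 \\ (-w_7 w_{R,2} - w_5 w_{R,4} - w_3 w_{R,6} - w_1 w_{R,8})_R &\approx 0 \\ (w_7 w_{R,6} + w_5 w_{R,8})_R &\approx 1 \end{aligned} \qquad \text{[Math. 17]}$$
The signal $b_{i-1}$ is thus reconstructed after the overlapping and adding.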