CELP is a widely used technique that encodes a speech signal by exploiting specific characteristics of the human voice, or a model of human voice production. When CELP is used in the core layer of a scalable codec, it is quite possible that CELP will also be used to code music signals. Examples of CELP implementations with scalable transform coding can be found in the ITU-T G.729.1 and G.718 standards, the relevant contents of which are summarized hereinbelow. A detailed description can be found in the ITU-T standard documents.
General Description of ITU-T G.729.1
ITU-T G.729.1, also called the G.729EV coder, is an 8-32 kbit/s scalable wideband (50-7,000 Hz) extension of ITU-T Rec. G.729. By default, the encoder input and decoder output are sampled at 16,000 Hz. The bitstream produced by the encoder is scalable and has 12 embedded layers, which will be referred to as Layers 1 to 12. Layer 1 is the core layer corresponding to a bit rate of 8 kbit/s. This layer is compliant with the G.729 bitstream, which makes G.729EV interoperable with G.729. Layer 2 is a narrowband enhancement layer adding 4 kbit/s, while Layers 3 to 12 are wideband enhancement layers adding 20 kbit/s in steps of 2 kbit/s.
This coder is designed to operate with a digital signal sampled at 16,000 Hz followed by conversion to 16-bit linear PCM for the input to the encoder. However, the 8,000 Hz input sampling frequency is also supported. Similarly, the format of the decoder output is 16-bit linear PCM with a sampling frequency of 8,000 or 16,000 Hz. Other input/output characteristics are converted to 16-bit linear PCM with 8,000 or 16,000 Hz sampling before encoding, or from 16-bit linear PCM to the appropriate format after decoding.
The G.729EV coder is built upon a three-stage structure: embedded Code-Excited Linear-Prediction (CELP) coding, Time-Domain Bandwidth Extension (TDBWE) and predictive transform coding that will be referred to as Time-Domain Aliasing Cancellation (TDAC). The embedded CELP stage generates Layers 1 and 2 which yield a narrowband synthesis (50-4,000 Hz) at 8 kbit/s and 12 kbit/s. The TDBWE stage generates Layer 3 and allows producing a wideband output (50-7,000 Hz) at 14 kbit/s. The TDAC stage operates in the Modified Discrete Cosine Transform (MDCT) domain and generates Layers 4 to 12 to improve quality from 14 to 32 kbit/s. TDAC coding represents jointly the weighted CELP coding error signal in the 50-4,000 Hz band and the input signal in the 4,000-7,000 Hz band.
The G.729EV coder operates on 20 ms frames. However, the embedded CELP coding stage operates on 10 ms frames, like G.729. As a result, two 10 ms CELP frames are processed per 20 ms frame. In the following, to be consistent with the text of ITU-T Rec. G.729, the 20 ms frames used by G.729EV will be referred to as superframes, whereas the 10 ms frames and the 5 ms subframes involved in the CELP processing will be respectively called frames and subframes.
G.729.1 Encoder
A functional diagram of the G.729.1 encoder is presented in FIG. 1. The encoder operates on 20 ms input superframes. By default, input signal 101, sWB(n), is sampled at 16,000 Hz; therefore, the input superframes are 320 samples long. Input signal sWB(n) is first split into two sub-bands using a quadrature mirror filterbank (QMF) defined by the filters H1(z) and H2(z). Lower-band input signal 102, sLBqmf(n), obtained after decimation, is pre-processed by a high-pass filter Hh1(z) with a 50 Hz cut-off frequency. The resulting signal 103, sLB(n), is coded by the 8-12 kbit/s narrowband embedded CELP encoder. To be consistent with ITU-T Rec. G.729, the signal sLB(n) will also be denoted s(n). The difference 104, dLB(n), between s(n) and the local synthesis 105, ŝenh(n), of the CELP encoder at 12 kbit/s is processed by the perceptual weighting filter WLB(z). The parameters of WLB(z) are derived from the quantized LP coefficients of the CELP encoder. Furthermore, the filter WLB(z) includes a gain compensation that guarantees the spectral continuity between the output 106, dLBw(n), of WLB(z) and the higher-band input signal 107, sHB(n). The weighted difference dLBw(n) is then transformed into the frequency domain by MDCT. The higher-band input signal 108, sHBfold(n), obtained after decimation and spectral folding by (−1)^n, is pre-processed by a low-pass filter Hh2(z) with a 3,000 Hz cut-off frequency. The resulting signal sHB(n) is coded by the TDBWE encoder. The signal sHB(n) is also transformed into the frequency domain by MDCT. The two sets of MDCT coefficients, 109, DLBw(k), and 110, SHB(k), are finally coded by the TDAC encoder. In addition, some parameters are transmitted by the frame erasure concealment (FEC) encoder in order to introduce parameter-level redundancy in the bitstream. This redundancy allows improved quality in the presence of erased superframes.
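The spectral folding step mentioned above can be illustrated with a minimal Python sketch. The function name `spectral_fold` is hypothetical and is not taken from the standard; the sketch only shows that multiplying sample n by (−1)^n mirrors the spectrum, so that a component at the Nyquist frequency of the decimated signal moves to 0 Hz and vice versa.

```python
def spectral_fold(x):
    """Multiply sample n by (-1)^n.

    This shifts the spectrum by half the sampling rate, which mirrors
    it about one quarter of the sampling rate (hypothetical helper,
    for illustration only).
    """
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]


# An alternating sequence sits at the Nyquist frequency of the
# decimated signal; after folding it becomes a constant (0 Hz).
nyquist_tone = [1.0, -1.0, 1.0, -1.0]
folded = spectral_fold(nyquist_tone)
```

Applying the same operation twice restores the original sequence, which is why the decoder can use the same folding to undo it.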
G.729.1 Decoder
A functional diagram of the G.729.1 decoder is presented in FIG. 2a; however, the specific case of frame erasure concealment is not considered in this figure. The decoding depends on the actual number of received layers or, equivalently, on the received bit rate. If the received bit rate is:
    8 kbit/s (Layer 1): The core layer is decoded by the embedded CELP decoder to obtain 201, ŝLB(n)=ŝ(n). Then, ŝLB(n) is postfiltered into 202, ŝLBpost(n), and post-processed by a high-pass filter (HPF) into 203, ŝLBqmf(n)=ŝLBhpf(n). The QMF synthesis filterbank defined by the filters G1(z) and G2(z) generates the output with a high-frequency synthesis 204, ŝHBqmf(n), set to zero.
    12 kbit/s (Layers 1 and 2): The core layer and narrowband enhancement layer are decoded by the embedded CELP decoder to obtain 201, ŝLB(n)=ŝenh(n), and ŝLB(n) is then postfiltered into 202, ŝLBpost(n), and high-pass filtered to obtain 203, ŝLBqmf(n)=ŝLBhpf(n). The QMF synthesis filterbank generates the output with a high-frequency synthesis 204, ŝHBqmf(n), set to zero.
    14 kbit/s (Layers 1 to 3): In addition to the narrowband CELP decoding and lower-band adaptive postfiltering, the TDBWE decoder produces a high-frequency synthesis 205, ŝHBbwe(n), which is then transformed into the frequency domain by MDCT so as to zero the frequency band above 3,000 Hz in the higher-band spectrum 206, ŜHBbwe(k). The resulting spectrum 207, ŜHB(k), is transformed into the time domain by inverse MDCT and overlap-add before spectral folding by (−1)^n. In the QMF synthesis filterbank, the reconstructed higher-band signal 204, ŝHBqmf(n), is combined with the respective lower-band signal 202, ŝLBqmf(n)=ŝLBpost(n), reconstructed at 12 kbit/s without high-pass filtering.
    Above 14 kbit/s (Layers 1 to 4+): In addition to the narrowband CELP and TDBWE decoding, the TDAC decoder reconstructs MDCT coefficients 208, D̂LBw(k), and 207, ŜHB(k), which correspond to the reconstructed weighted difference in the lower band (0-4,000 Hz) and the reconstructed signal in the higher band (4,000-7,000 Hz). Note that in the higher band, the non-received sub-bands and the sub-bands with zero bit allocation in TDAC decoding are replaced by the level-adjusted sub-bands of ŜHBbwe(k). Both D̂LBw(k) and ŜHB(k) are transformed into the time domain by inverse MDCT and overlap-add. Lower-band signal 209, d̂LBw(n), is then processed by the inverse perceptual weighting filter WLB(z)^−1. To attenuate transform coding artefacts, pre/post-echoes are detected and reduced in both the lower- and higher-band signals 210, d̂LB(n), and 211, ŝHB(n). The lower-band synthesis ŝLB(n) is postfiltered, while the higher-band synthesis 212, ŝHBfold(n), is spectrally folded by (−1)^n. The signals ŝLBqmf(n)=ŝLBpost(n) and ŝHBqmf(n) are then combined and upsampled in the QMF synthesis filterbank.
Coder Modes
The G.729.1 coder, also known as the G.729EV coder, is based on a split-band coding approach that naturally yields a very flexible architecture. This coder can easily deal with input and output signals sampled not only at 16,000 Hz, but also at 8,000 Hz, by taking advantage of the QMF analysis and synthesis filterbanks. Table 1 lists the available modes in G.729EV. The DEFAULT mode corresponds to the default operation of G.729EV, in which input and output signals are sampled at 16,000 Hz.
TABLE 1
G.729.1 Encoder/Decoder Modes

Mode        Encoder Operation                         Decoder Operation
DEFAULT     16,000 Hz input                           16,000 Hz output
NB_INPUT    8,000 Hz input                            N/A
G729_BST    bit rate limited to 8 kbit/s,             N/A
            output G.729 bitstream
NB_OUTPUT   N/A                                       8,000 Hz output
G729B_BST   N/A                                       read and decode G729B bitstream
LOW_DELAY   N/A                                       bit rate limited to 8-12 kbit/s,
                                                      low delay
Two additional encoder modes are provided:
    The NB_INPUT mode specifies that the encoder input is sampled at 8,000 Hz, which allows the bypassing of the QMF analysis filterbank; and
    In G729_BST mode, the encoder runs at 8 kbit/s and generates a bitstream with G.729 format using 10 ms frames. The encoder input is sampled at 16,000 Hz by default. If the NB_INPUT mode is also set, this input is sampled at 8,000 Hz.
On the other hand, three decoder modes are also available:
    The NB_OUTPUT mode specifies that the decoder output is sampled at 8,000 Hz, which allows the bypassing of the QMF synthesis filterbank;
    In G729B_BST mode, the decoder reads and decodes G729B frames; and
    The LOW_DELAY mode is provided for narrowband use cases. In this case, the decoder bit rate is limited to 8-12 kbit/s, which allows the reduction of the overall algorithmic delay by skipping the inverse MDCT and overlap-add.
In G729B_BST or LOW_DELAY modes, the decoder output is sampled at 16,000 Hz by default. If the NB_OUTPUT mode is also set, the decoder output is sampled at 8,000 Hz. Note that the LOW_DELAY decoder mode has not been formally tested in the presence of frame erasures.
Bit Allocation to Coder Parameters and Bitstream Layer Format
The bit allocation of the coder is presented in Table 2. This table is structured according to the different layers. For a given bit rate, the bitstream is obtained by concatenating the contributing layers. For example, at 24 kbit/s, which corresponds to 480 bits per superframe, the bitstream comprises Layer 1 (160 bits)+Layer 2 (80 bits)+Layer 3 (40 bits)+Layers 4 to 8 (200 bits).
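The layer concatenation described above can be checked with a short Python sketch. The function name `layers_for_bit_rate` and the constants are hypothetical names introduced here for illustration; the layer sizes are the ones quoted in the text (160, 80 and 40 bits per 20 ms superframe).

```python
# Layer sizes in bits per 20 ms superframe, as quoted in the text.
CORE_BITS = 160      # Layer 1, 8 kbit/s
NB_ENH_BITS = 80     # Layer 2, adds 4 kbit/s
WB_LAYER_BITS = 40   # each of Layers 3 to 12, adds 2 kbit/s

# The 12 bit rates of the embedded bitstream, in kbit/s.
VALID_RATES = [8, 12] + list(range(14, 33, 2))


def layers_for_bit_rate(kbps):
    """Return (number of layers, bits per superframe) for a bit rate."""
    if kbps not in VALID_RATES:
        raise ValueError("not a G.729.1 bit rate: %r" % (kbps,))
    n_layers = VALID_RATES.index(kbps) + 1
    bits = CORE_BITS
    if n_layers >= 2:
        bits += NB_ENH_BITS
    if n_layers >= 3:
        bits += (n_layers - 2) * WB_LAYER_BITS
    return n_layers, bits


# 24 kbit/s uses Layers 1 to 8: 160 + 80 + 40 + 5 * 40 = 480 bits.
n_layers, bits = layers_for_bit_rate(24)
```

The result always equals the bit rate in kbit/s multiplied by the 20 ms superframe duration, which is a convenient sanity check on the layer sizes.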
The G.729EV bitstream format is illustrated in FIG. 2b. Since the TDAC coder employs spectral envelope entropy coding and adaptive sub-band bit allocation, the TDAC parameters are encoded with a variable number of bits. However, the bitstream above 14 kbit/s can still be formatted into layers of 2 kbit/s, because the TDAC encoder always performs a bit allocation on the basis of the maximum encoder bit rate (32 kbit/s), and the TDAC decoder can handle bitstream truncations at arbitrary positions.
TABLE 2
G.729.1 Bit Allocation (per 20 ms superframe)

Layer 1 - Core layer (narrowband embedded CELP)
                                                     10 ms frame 1            10 ms frame 2        Total per
Parameter                      Codeword          subframe 1  subframe 2   subframe 1  subframe 2   superframe
Line spectrum pairs            L0, L1, L2, L3        18 (per frame)           18 (per frame)           36
Adaptive-codebook delay        P1, P2                8           5            8           5            26
Pitch-delay parity             P0                     1 (per frame)            1 (per frame)            2
Fixed-codebook index           C1, C2                13          13           13          13           52
Fixed-codebook sign            S1, S2                4           4            4           4            16
Codebook gains (stage 1)       GA1, GA2              3           3            3           3            12
Codebook gains (stage 2)       GB1, GB2              4           4            4           4            16
8 kbit/s core total                                                                                    160

Layer 2 - Narrowband Enhancement Layer (embedded CELP)
                                                     10 ms frame 1            10 ms frame 2        Total per
Parameter                      Codeword          subframe 1  subframe 2   subframe 1  subframe 2   superframe
2nd Fixed-codebook index       C′1, C′2              13          13           13          13           52
2nd Fixed-codebook sign        S′1, S′2              4           4            4           4            16
2nd Fixed-codebook gain        G′1, G′2              3           2            3           2            10
FEC bits (class information)   CL1, CL2               1 (per frame)            1 (per frame)            2
12 kbit/s layer total                                                                                  80

Layer 3 - Wideband Enhancement Layer (TDBWE)
Parameter                      Codeword          Number of bits                                    Total per superframe
Time envelope mean             MU                5                                                 5
Time envelope VQ               T1, T2            7 + 7                                             14
Frequency envelope split VQ    F1, F2, F3        5 + 5 + 4                                         14
FEC bits (class information)   PH                7                                                 7
14 kbit/s layer total                                                                              40

Layers 4-12 - Wideband Enhancement Layers (TDAC)
Parameter                      Codeword          Number of bits                                    Total per superframe
FEC bits (energy information)  E                 5                                                 5
MDCT norm                      N                 4                                                 4
HB spectral envelope           RMS2              variable number nbits_HB                          nbits_HB
LB spectral envelope           RMS1              variable number nbits_LB                          nbits_LB
Fine structure (VQ of          VQ1 to VQ18       nbits_VQ = 351 − nbits_HB − nbits_LB              nbits_VQ
sub-band coefficients)
16-32 kbit/s layer total                                                                           360

TOTAL                                                                                              640

Post-Filtering of the Lower Band
As described in 4.2/G.729, the G.729 decoder includes a post-processing split into adaptive postfiltering, high-pass filtering and signal upscaling. Similarly, the G.729EV decoder includes lower-band post-processing. However, this procedure is limited to adaptive postfiltering and high-pass filtering. In the G.729EV decoder, signal upscaling is handled by the QMF synthesis filterbank. The adaptive postfilter in G.729EV is directly derived from the G.729 postfilter. It is also a cascade of three filters: a long-term postfilter Hp(z), a short-term postfilter Hf(z) and a tilt compensation filter Ht(z), followed by an adaptive gain control procedure.
The postfilter coefficients are updated every 5 ms subframe. The postfiltering process is organized as follows. First, the reconstructed speech ŝ(n) is inverse filtered through Â(z/γn) to produce the residual signal r̂(n). This signal is used to compute the delay T and gain gl of the long-term postfilter Hp(z). The signal r̂(n) is then filtered through the long-term postfilter Hp(z) and the synthesis filter 1/[gfÂ(z/γd)]. Finally, the output signal of the synthesis filter 1/[gfÂ(z/γd)] is passed through the tilt compensation filter Ht(z) to generate the postfiltered reconstructed speech signal sf(n). Adaptive gain control is then applied to sf(n) to match the energy of ŝ(n). The resulting signal sf′(n) is high-pass filtered and scaled to produce the output signal of the decoder. In the G.729EV decoder, the signal upscaling is handled by the QMF synthesis filterbank.
The long-term postfilter is given by:
$$H_p(z) = \frac{1}{1+\gamma_p g_l}\left(1+\gamma_p g_l\, z^{-T}\right) \tag{1}$$
where T is the pitch delay, the integer pitch range of T defined in G.729 is from PIT_MIN=20 to PIT_MAX=143, and gl is the gain coefficient. Note that gl is bounded by 1 and is set to zero if the long-term prediction gain is less than 3 dB. The factor γp controls the amount of long-term postfiltering and has the value of γp=0.5. The long-term delay and gain are computed from the residual signal r̂(n) obtained by filtering the speech ŝ(n) through Â(z/γn), which is the numerator of the short-term postfilter:
$$\hat{r}(n) = \hat{s}(n) + \sum_{i=1}^{10} \gamma_n^{i}\,\hat{a}_i\,\hat{s}(n-i) \tag{2}$$
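Equation (2) can be sketched in a few lines of Python. The function name `postfilter_residual` is hypothetical, and the sketch assumes that samples before the start of the buffer are zero, whereas a real decoder would carry filter memory across subframes. The default γn=0.55 is the G.729 value quoted in this section.

```python
def postfilter_residual(s_hat, a_hat, gamma_n=0.55):
    """Residual of equation (2): inverse filtering through A(z/gamma_n).

    s_hat : reconstructed speech samples
    a_hat : the 10 quantized LP coefficients a_1 .. a_10
    Samples before s_hat[0] are assumed to be zero (sketch only).
    """
    # Bandwidth-expanded coefficients gamma_n^i * a_i for i = 1..10.
    weights = [(gamma_n ** i) * a for i, a in enumerate(a_hat, start=1)]
    residual = []
    for n in range(len(s_hat)):
        acc = s_hat[n]
        for i, w in enumerate(weights, start=1):
            if n - i >= 0:
                acc += w * s_hat[n - i]
        residual.append(acc)
    return residual


# A unit impulse returns the weighted coefficients themselves.
r = postfilter_residual([1.0, 0.0, 0.0], [0.5] + [0.0] * 9)
```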
The long-term delay is computed using a two-pass procedure. The first pass selects the best integer T0 in the range [int(T1)−1, int(T1)+1], where int(T1) is the integer part of the (transmitted) pitch delay T1 in the first subframe. The best integer delay is the one that maximizes the correlation:
$$R(k) = \sum_{n=0}^{39} \hat{r}(n)\,\hat{r}(n-k) \tag{3}$$
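The first pass can be sketched as follows. The function name `best_integer_delay` and its buffer layout are assumptions made for illustration: `r_hist` is taken to hold past residual samples followed by the current subframe, with `start` marking the sample n=0 of the current 40-sample subframe.

```python
def best_integer_delay(r_hist, start, t1_int, n_sub=40):
    """First pass: pick the integer delay maximizing R(k) of equation (3).

    r_hist : residual buffer containing past samples and the current
             subframe (assumed layout, for illustration)
    start  : index in r_hist of sample n = 0 of the current subframe
    t1_int : integer part of the transmitted pitch delay T1
    """
    if start < t1_int + 1:
        raise ValueError("not enough past samples in r_hist")

    def correlation(k):
        return sum(r_hist[start + n] * r_hist[start + n - k]
                   for n in range(n_sub))

    # Search the three candidates int(T1) - 1, int(T1), int(T1) + 1.
    return max(range(t1_int - 1, t1_int + 2), key=correlation)


# A pulse train with period 50 is matched best at delay 50.
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(200)]
t0 = best_integer_delay(pulses, start=100, t1_int=51)
```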
The second pass chooses the best fractional delay T with resolution ⅛ around T0. This is done by finding the delay with the highest pseudo-normalized correlation:
$$R'(k) = \frac{\sum_{n=0}^{39} \hat{r}(n)\,\hat{r}_k(n)}{\sqrt{\sum_{n=0}^{39} \hat{r}_k(n)\,\hat{r}_k(n)}} \tag{4}$$
where r̂k(n) is the residual signal at delay k. Once the optimal delay T is found, the corresponding correlation R′(T) is normalized with the square-root of the energy of r̂(n).
The squared value of this normalized correlation is used to determine if the long-term postfilter should be disabled. This is done by setting gl=0 if:
$$\frac{R'(T)^2}{\sum_{n=0}^{39} \hat{r}(n)\,\hat{r}(n)} < 0.5, \tag{5}$$
Otherwise, the value of gl is computed from:
$$g_l = \frac{\sum_{n=0}^{39} \hat{r}(n)\,\hat{r}_k(n)}{\sum_{n=0}^{39} \hat{r}_k(n)\,\hat{r}_k(n)}, \quad \text{bounded by } 0 \le g_l \le 1.0. \tag{6}$$
The non-integer delayed signal r̂k(n) is first computed using an interpolation filter of length 33. After the selection of T, r̂k(n) is recomputed with a longer interpolation filter of length 129. The new signal replaces the previous signal only if the longer filter increases the value of R′(T).
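The gain decision of equations (5) and (6) can be sketched as below, given the residual r̂(n) and an already interpolated delayed residual r̂k(n) at the selected delay. The function name `long_term_gain` is hypothetical, the interpolation itself is not shown, and the sketch assumes the pseudo-normalized correlation R′ divides by the square root of the energy of the delayed residual, as in G.729.

```python
import math


def long_term_gain(r, r_k):
    """Return the long-term postfilter gain g_l for one subframe.

    r   : residual samples of the current subframe
    r_k : delayed residual at the selected delay T (interpolation
          is assumed to have been done elsewhere)
    """
    cross = sum(a * b for a, b in zip(r, r_k))
    energy_k = sum(b * b for b in r_k)
    energy_r = sum(a * a for a in r)
    if energy_k == 0.0 or energy_r == 0.0:
        return 0.0  # nothing to predict from; disable the filter

    # Pseudo-normalized correlation R'(T), assumed form of equation (4).
    r_prime = cross / math.sqrt(energy_k)

    # Equation (5): disable the long-term postfilter when the squared
    # normalized correlation is below 0.5.
    if r_prime * r_prime / energy_r < 0.5:
        return 0.0

    # Equation (6), bounded to the range [0, 1].
    return min(max(cross / energy_k, 0.0), 1.0)


g_same = long_term_gain([1.0] * 40, [1.0] * 40)
g_half = long_term_gain([1.0] * 40, [2.0] * 40)
g_off = long_term_gain([1.0, 0.0] * 20, [0.0, 1.0] * 20)
```

The threshold of 0.5 on the squared normalized correlation corresponds to the 3 dB prediction-gain limit mentioned in the text.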
The short-term postfilter is given by:
$$H_f(z) = \frac{1}{g_f}\,\frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)} = \frac{1}{g_f}\,\frac{1+\sum_{i=1}^{10} \gamma_n^{i}\,\hat{a}_i\,z^{-i}}{1+\sum_{i=1}^{10} \gamma_d^{i}\,\hat{a}_i\,z^{-i}}, \tag{7}$$
where Â(z) is the received quantized LP inverse filter (LP analysis is not done at the decoder) and the factors γn and γd control the amount of short-term postfiltering, and are set to γn=0.55 and γd=0.7. The gain term gf is calculated on the truncated impulse response hf(n) of the filter Â(z/γn)/Â(z/γd) and is given by:
$$g_f = \sum_{n=0}^{19} \left|h_f(n)\right|. \tag{8}$$
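As a rough illustration of how gf might be obtained, the following Python sketch computes the first 20 samples of the impulse response of Â(z/γn)/Â(z/γd) by direct recursion and sums their absolute values. The function name `short_term_gain` is hypothetical, and the use of absolute values follows the G.729 definition of gf.

```python
def short_term_gain(a_hat, gamma_n=0.55, gamma_d=0.7, length=20):
    """Return (g_f, h_f) for the short-term postfilter.

    a_hat  : the 10 quantized LP coefficients a_1 .. a_10
    h_f    : truncated impulse response of A(z/gamma_n) / A(z/gamma_d)
    g_f    : sum of |h_f(n)| over the truncated response
    """
    num = [1.0] + [(gamma_n ** i) * a for i, a in enumerate(a_hat, 1)]
    den = [1.0] + [(gamma_d ** i) * a for i, a in enumerate(a_hat, 1)]

    h = []
    for n in range(length):
        # Feed-forward part: the numerator tap excited by the impulse.
        acc = num[n] if n < len(num) else 0.0
        # Feedback part: all-pole recursion on earlier outputs.
        for i in range(1, min(n, len(den) - 1) + 1):
            acc -= den[i] * h[n - i]
        h.append(acc)
    return sum(abs(v) for v in h), h


# With all-zero LP coefficients the filter is the identity, so g_f = 1.
g_flat, h_flat = short_term_gain([0.0] * 10)
g_tilt, h_tilt = short_term_gain([-0.9] + [0.0] * 9)
```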
The filter Ht(z) compensates for the tilt in the short-term postfilter Hf(z) and is given by:
$$H_t(z) = \frac{1}{g_t}\left(1+\gamma_t k_1'\, z^{-1}\right), \tag{9}$$
where γtk1′ is a tilt factor, k1′ being the first reflection coefficient calculated from hf(n) with:
$$k_1' = \frac{r_h(1)}{r_h(0)}, \qquad r_h(i) = \sum_{j=0}^{19-i} h_f(j)\,h_f(j+i) \tag{10}$$
The gain term gt=1−|γtk1′| compensates for the decreasing effect of gf in Hf(z). Furthermore, it has been shown that the product filter Hf(z)Ht(z) has generally no gain. Two values for γt are used depending on the sign of k1′. If k1′ is negative, γt=0.9, and if k1′ is positive, γt=0.2.
Adaptive gain control is used to compensate for gain differences between the reconstructed speech signal ŝ(n) and the postfiltered signal sf(n). The gain scaling factor G for the present subframe is computed by:
$$G = \frac{\sum_{n=0}^{39} \left|\hat{s}(n)\right|}{\sum_{n=0}^{39} \left|\mathit{sf}(n)\right|}. \tag{11}$$
The gain-scaled postfiltered signal sf′(n) is given by:
$$\mathit{sf}'(n) = g(n)\,\mathit{sf}(n), \quad n = 0, \ldots, 39 \tag{12}$$
where g(n) is updated on a sample-by-sample basis and given by:
$$g(n) = 0.85\,g(n-1) + 0.15\,G, \quad n = 0, \ldots, 39. \tag{13}$$
The initial value of g(−1)=1.0 is used. Then for each new subframe, g(−1) is set equal to g(39) of the previous subframe.
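The adaptive gain control of equations (11) to (13) can be sketched for one subframe as follows. The function name `adaptive_gain_control` is hypothetical; the handling of an all-zero postfiltered subframe (keeping the previous gain as the target) is an assumption of this sketch and is not taken from the standard.

```python
def adaptive_gain_control(s_hat, sf, g_prev=1.0):
    """Apply equations (11)-(13) to one subframe.

    s_hat  : reconstructed speech before postfiltering
    sf     : postfiltered signal for the same subframe
    g_prev : g(-1), i.e. the last gain of the previous subframe
    Returns (scaled signal sf', last gain to carry over).
    """
    num = sum(abs(v) for v in s_hat)
    den = sum(abs(v) for v in sf)
    # Assumption: when sf is all zeros, hold the previous gain.
    target = num / den if den > 0.0 else g_prev

    out = []
    g = g_prev
    for v in sf:
        g = 0.85 * g + 0.15 * target   # equation (13)
        out.append(g * v)              # equation (12)
    return out, g


# The postfilter halved the level, so the target gain G is 2 and the
# smoothed gain moves from 1.0 towards 2.0 across the subframe.
scaled, g_last = adaptive_gain_control([2.0] * 40, [1.0] * 40)
```

Returning the last gain lets the caller pass it as `g_prev` for the next subframe, which mirrors setting g(−1) to g(39) of the previous subframe.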
A high-pass filter with a cut-off frequency of 100 Hz is applied to the reconstructed postfiltered speech sf′(n). The filter is given by:
$$H_{h2}(z) = \frac{0.93980581 - 1.8795834\, z^{-1} + 0.93980581\, z^{-2}}{1 - 1.9330735\, z^{-1} + 0.93589199\, z^{-2}}. \tag{14}$$
The filtered signal is multiplied by a factor of 2 to restore the input signal level. G.729 postprocessing is described above.
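As an illustration, the second-order high-pass filter of equation (14) and the subsequent scaling by 2 can be written as a direct-form difference equation. The function name `highpass_and_scale` and the layout of the state tuple are assumptions made for this sketch; only the coefficients come from the text.

```python
# Coefficients of H_h2(z) from equation (14).
B = (0.93980581, -1.8795834, 0.93980581)   # numerator
A = (1.0, -1.9330735, 0.93589199)          # denominator

def highpass_and_scale(x, state=(0.0, 0.0, 0.0, 0.0)):
    """Filter samples x with H_h2(z), then multiply the result by 2.

    state holds (x[n-1], x[n-2], y[n-1], y[n-2]) carried over from the
    previous call; y denotes the filter output before the scaling.
    Returns the scaled output samples and the updated state.
    """
    x1, x2, y1, y2 = state
    out = []
    for xn in x:
        yn = B[0] * xn + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(2.0 * yn)   # restore the input signal level
    return out, (x1, x2, y1, y2)
```

Carrying the state between calls lets the filter run continuously across subframes instead of restarting from zero at each boundary.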
Modifications in G.729.1 corresponding to the G.729 adaptive postfilter are:
- The parameters γp, γn, γd of G.729 long-term and short-term postfilters depend on the decoder bit rate (8 or 12 kbit/s, or above);
- The G.729 adaptive gain control is modified to attenuate the quantization errors in silence segments (only at 8 and 12 kbit/s).
The values of γp, γn and γd of the long-term and short-term postfilters are given in Table 3. At 12 kbit/s, the values of γn and γd depend on a factor 0≦Th≦1, which is based on the 10 ms frame energy and smoothed by a 5-tap median filter.
TABLE 3
G.729.1 Parameters of the Adaptive Postfilter Depending on Bit Rate

| Bit rate (kbit/s) | γp | γn | γd |
|---|---|---|---|
| 8 | 0.5 | 0.55 | |
| 12 | | Th × 0.7 + (1 − Th) × 0.55 | Th × 0.75 + (1 − Th) × 0.7 |
| 14 and above | | 0.7 | 0.75 |

Post-Processing of the Decoded Higher Band
The post-processing of MDCT coefficients is only applied to the higher band because the lower band is post-processed with a conventional time-domain approach. For the higher band, no LPC coefficients are transmitted to the decoder. The TDAC post-processing is performed on the available MDCT coefficients at the decoder side. There are 160 higher-band MDCT coefficients, denoted Ŷ(k), k=160, . . . , 319. For this specific post-processing, the higher band is divided into 10 sub-bands of 16 MDCT coefficients. The average magnitude in each sub-band is defined as the envelope:
$$\mathrm{env}(j) = \sum_{k=0}^{15} \left|\hat{Y}(160 + 16j + k)\right|, \qquad j = 0, 1, \ldots, 9. \tag{15}$$
The post-processing consists of two steps. The first step is an envelope post-processing (corresponding to short-term post-processing), which modifies the envelope. The second step is a fine structure post-processing (corresponding to long-term post-processing), which enhances the magnitude of each coefficient within each sub-band. The basic concept is to further reduce the lower magnitudes relative to the higher ones, because the coding error is relatively larger at the lower magnitudes than at the higher magnitudes. The algorithm to modify the envelope is described as follows. The maximum envelope value is:
$$\mathrm{env}_{\max} = \max_{j=0,\ldots,9} \mathrm{env}(j). \tag{16}$$
Gain factors, which will be applied to the envelope, are calculated with the equation:
$$\mathrm{fac}_1(j) = \alpha_{ENV}\, \frac{\mathrm{env}(j)}{\mathrm{env}_{\max}} + (1 - \alpha_{ENV}), \qquad j = 0, \ldots, 9, \tag{17}$$
where αENV (0<αENV<1) depends on the bit rate. The higher the bit rate, the smaller the constant αENV. After determining the factors fac1(j), the modified envelope is expressed as:
$$\mathrm{env}'(j) = g_{norm}\, \mathrm{fac}_1(j)\, \mathrm{env}(j), \qquad j = 0, \ldots, 9, \tag{18}$$
where gnorm is a gain to maintain the overall energy:
$$g_{norm} = \frac{\sum_{j=0}^{9} \mathrm{env}(j)}{\sum_{j=0}^{9} \mathrm{fac}_1(j)\, \mathrm{env}(j)}. \tag{19}$$
The fine structure modification within each sub-band will be similar to the above envelope post-processing. Gain factors for the magnitudes are calculated as:
$$\mathrm{fac}_2(j,k) = \beta_{ENV}\, \frac{\left|\hat{Y}(160 + 16j + k)\right|}{Y_{\max}(j)} + (1 - \beta_{ENV}), \qquad k = 0, \ldots, 15, \tag{20}$$
where the maximum magnitude Ymax(j) within a sub-band is:
$$Y_{\max}(j) = \max_{k=0,\ldots,15} \left|\hat{Y}(160 + 16j + k)\right|, \tag{21}$$
and βENV (0<βENV<1) depends on the bit rate. Generally, the higher the bit rate, the smaller βENV. By combining both the envelope post-processing and the fine structure post-processing, the final post-processed higher-band MDCT coefficients are:
$$\hat{Y}^{post}(160 + 16j + k) = g_{norm}\, \mathrm{fac}_1(j)\, \mathrm{fac}_2(j,k)\, \hat{Y}(160 + 16j + k), \qquad j = 0, \ldots, 9, \quad k = 0, \ldots, 15 \tag{22}$$
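The two-step higher-band post-processing of equations (15) to (22) can be sketched as below. The function name `postprocess_high_band` is hypothetical, and the handling of all-zero sub-bands is an assumption added for this sketch. The text only states that αENV and βENV lie between 0 and 1 and depend on the bit rate, so any values passed in are illustrative rather than standardized.

```python
def postprocess_high_band(Y, alpha_env, beta_env):
    """Post-process the higher-band MDCT coefficients Y[160..319].

    Y         : sequence of at least 320 decoded MDCT coefficients
    alpha_env : envelope constant, 0 < alpha_env < 1 (illustrative value)
    beta_env  : fine-structure constant, 0 < beta_env < 1 (illustrative)
    Returns a list of the 160 post-processed higher-band coefficients.
    """
    # Equation (15): envelope of each of the 10 sub-bands.
    env = [sum(abs(Y[160 + 16 * j + k]) for k in range(16)) for j in range(10)]
    env_max = max(env)                                    # equation (16)
    if env_max == 0.0:
        # Assumption (not from the text): an all-zero band is left unchanged.
        return list(Y[160:320])

    # Equation (17): envelope gain factors.
    fac1 = [alpha_env * e / env_max + (1.0 - alpha_env) for e in env]
    # Equation (19): normalization gain.
    g_norm = sum(env) / sum(f * e for f, e in zip(fac1, env))

    out = []
    for j in range(10):
        band = Y[160 + 16 * j : 160 + 16 * (j + 1)]
        y_max = max(abs(v) for v in band)                 # equation (21)
        for v in band:
            # Equation (20); a zero sub-band gets a neutral factor (assumption).
            fac2 = beta_env * abs(v) / y_max + (1.0 - beta_env) if y_max > 0.0 else 1.0
            out.append(g_norm * fac1[j] * fac2 * v)       # equation (22)
    return out
```

With a flat spectrum every factor equals 1 and the coefficients pass through unchanged. Otherwise, weaker sub-bands and weaker coefficients are attenuated relative to the stronger ones.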