In many types of codecs the input waveform is split into a spectrum envelope and an excitation signal (also called residual), which are coded and transmitted independently. At the decoder the waveform is synthesized from the received envelope and excitation information.
An efficient way to parameterize the spectrum envelope is through linear predictive (LP) coefficients a(j). The process of separation into spectrum envelope and excitation signal e(k) consists of two major steps: 1) estimation of LP coefficients, and 2) filtering the waveform x(k) through an all-zero filter
$$A(z) = 1 - \sum_{j=1}^{J} a(j)\, z^{-j} \qquad (1)$$
to generate an excitation signal e(k), where the model order J is typically set to 10 for input signals sampled at 8 kHz, and to 16 for input signals sampled at 16 kHz. This process is illustrated in FIG. 1.
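The two steps above can be sketched as follows. This is a minimal illustration, not a codec implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion for step 1 (the text does not say which estimator is used), omits the windowing and bandwidth expansion that practical codecs apply, and uses hypothetical function names.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r(0..max_lag) of one frame x (no windowing applied)."""
    n = len(x)
    return [sum(x[k] * x[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Estimate LP coefficients from autocorrelation values r(0..order).

    Returns a list `a` with a[j - 1] = a(j), using the sign convention of
    A(z) = 1 - sum_j a(j) z^-j, i.e. x(k) is predicted by sum_j a(j) x(k - j).
    """
    a = [0.0] * order
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(order):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): leave the rest at 0
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Filter x(k) through the all-zero filter A(z) to obtain e(k).

    Samples before the start of the frame are assumed to be zero.
    """
    e = []
    for k in range(len(x)):
        pred = sum(a[j] * x[k - 1 - j] for j in range(min(len(a), k)))
        e.append(x[k] - pred)
    return e
```

For a frame `x` sampled at 8 kHz, `a = levinson_durbin(autocorrelation(x, 10), 10)` followed by `e = lp_residual(x, a)` would correspond to the model order J = 10 mentioned above.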
To minimize transmission load, the audio signal is often lowpass filtered and only the low band (LB) is encoded and transmitted. At the receiver end the high band (HB) may be recovered from the available LB signal characteristics. This reconstruction of HB signal characteristics from certain LB signal characteristics is performed by a bandwidth extension (BWE) scheme.
A straightforward reconstruction method is based on spectral folding, where the spectrum of the LB part of the excitation signal is folded (mirrored) around the upper frequency limit of the LB. A problem with such straightforward spectral folding is that the mirrored discrete frequency components may not be positioned at integer multiples of the fundamental frequency of the audio signal. This results in “metallic” sounds and perceptual degradation when reconstructing the HB part of the excitation signal e(k) from the available LB excitation.
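The misalignment can be made concrete with a small sketch. It assumes one common time-domain realization of spectral folding, namely upsampling by a factor of two with zero insertion and no anti-imaging filter, which mirrors the LB spectrum around the LB upper limit; the text does not prescribe this realization, and the function names are hypothetical.

```python
import math


def fold_by_zero_insertion(e_lb):
    """Upsample by 2 by inserting a zero after every sample.

    Without an anti-imaging filter, the LB spectrum reappears mirrored
    around the LB upper frequency limit (half the LB sampling rate).
    """
    out = []
    for v in e_lb:
        out.append(v)
        out.append(0.0)
    return out


def tone_magnitude(x, freq_hz, fs_hz):
    """Normalized magnitude of the correlation of x with a tone at freq_hz."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / fs_hz)
             for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
             for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)
```

As an example, take the third harmonic (900 Hz) of a 300 Hz fundamental in an LB excitation sampled at 8 kHz. After folding to 16 kHz, the component appears with equal strength at 900 Hz and at its mirror image 8000 - 900 = 7100 Hz. Since 7100 / 300 is not an integer, the mirrored component falls between the harmonics at 6900 Hz and 7200 Hz.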
One way to avoid this problem is to reconstruct the HB excitation as a white noise sequence [1-2]. However, replacing the actual residual (HB excitation) with white noise also leads to perceptual degradation, because in certain parts of a speech signal the periodicity continues into the HB.
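A white-noise HB excitation of this kind might be generated as in the sketch below. Scaling the noise to the RMS level of the LB excitation is an assumption made here for illustration only; the cited schemes may set the HB gain differently, and the function name is hypothetical.

```python
import math
import random


def white_noise_excitation(e_lb, seed=0):
    """Return a Gaussian white noise sequence with the same length and RMS
    level as the LB excitation e_lb (the gain rule is an assumption)."""
    if not e_lb:
        return []
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in e_lb]
    target_rms = math.sqrt(sum(v * v for v in e_lb) / len(e_lb))
    noise_rms = math.sqrt(sum(v * v for v in noise) / len(noise))
    gain = target_rms / noise_rms if noise_rms > 0.0 else 0.0
    return [gain * v for v in noise]
```

Such a sequence has no harmonic structure at all, which is why it cannot reproduce the periodicity that voiced speech segments may carry in the HB.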
Reference [3] describes a reconstruction method based on a complex speech production model for generating the HB extension of the excitation signal.