Compression of digital speech and audio signals is well known. Compression is generally required to efficiently transmit signals over a communications channel, or to store compressed signals on a digital media device, such as a solid-state memory device or computer hard disk. Although there exist many compression (or “coding”) techniques, one method that has remained very popular for digital speech coding is known as Code Excited Linear Prediction (CELP), which is one of a family of “analysis-by-synthesis” coding algorithms. Analysis-by-synthesis generally refers to a coding process by which parameters of a digital model are used to synthesize a set of candidate signals that are compared to an input signal and analyzed for distortion. The set of parameters that yields the lowest distortion, or error component, is then either transmitted or stored, and is eventually used to reconstruct an estimate of the original input signal. CELP is a particular analysis-by-synthesis method that uses one or more excitation codebooks, each of which essentially comprises a set of code-vectors that are retrieved from the codebook in response to a codebook index. These code-vectors are used as stimuli to the speech synthesizer in a “trial and error” process in which an error criterion is evaluated for each of the candidate code-vectors, and the candidates resulting in the lowest error are selected.
For example, FIG. 1 is a block diagram of prior-art CELP encoder 100. In CELP encoder 100, an input signal comprising speech sample n (s(n)) is applied to a Linear Predictive Coding (LPC) analysis block 101, where linear predictive coding is used to estimate a short-term spectral envelope. The resulting spectral parameters (or LP parameters) are denoted by the transfer function A(z). The spectral parameters are applied to LPC Quantization block 102 that quantizes the spectral parameters to produce quantized spectral parameters Aq that are suitable for use in a multiplexer 108. The quantized spectral parameters Aq are then conveyed to multiplexer 108, and the multiplexer produces a coded bit stream based on the quantized spectral parameters and a set of parameters, τ, β, k, and γ, that are determined by a squared error minimization/parameter quantization block 107. As one of ordinary skill in the art will recognize, τ, β, k, and γ are defined as the closed loop pitch delay, adaptive codebook gain, fixed codebook vector index, and fixed codebook gain, respectively.
The quantized spectral, or LP, parameters are also conveyed locally to LPC synthesis filter 105 that has a corresponding transfer function 1/Aq(z). LPC synthesis filter 105 also receives combined excitation signal u(n) from first combiner 110 and produces an estimate of the input signal ŝ(n) based on the quantized spectral parameters Aq and the combined excitation signal u(n). Combined excitation signal u(n) is produced as follows. An adaptive codebook code-vector cτ is selected from adaptive codebook (ACB) 103 based on the index parameter τ. The adaptive codebook code-vector cτ is then weighted based on the gain parameter β and the weighted adaptive codebook code-vector is conveyed to first combiner 110. A fixed codebook code-vector ck is selected from fixed codebook (FCB) 104 based on the index parameter k. The fixed codebook code-vector ck is then weighted based on the gain parameter γ and is also conveyed to first combiner 110. First combiner 110 then produces combined excitation signal u(n) by combining the weighted version of adaptive codebook code-vector cτ with the weighted version of fixed codebook code-vector ck. (For the convenience of the reader, the variables are also given in terms of their z-transforms. The z-transform of a variable is represented by a corresponding capital letter; for example, the z-transform of e(n) is represented as E(z).)
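The excitation path described above can be illustrated with a minimal Python sketch. It assumes the combiner forms u(n) = β·cτ(n) + γ·ck(n) and that Aq(z) follows the sign convention of equation (3), so that 1/Aq(z) is the recursion ŝ(n) = u(n) − Σ aq,i·ŝ(n−i). The function names `combined_excitation` and `lpc_synthesis` and the example values are hypothetical and are not taken from the encoder of FIG. 1.

```python
def combined_excitation(c_tau, c_k, beta, gamma):
    """Form u(n) = beta * c_tau(n) + gamma * c_k(n) (assumed combiner behavior)."""
    if len(c_tau) != len(c_k):
        raise ValueError("code-vectors must have the same length")
    return [beta * a + gamma * f for a, f in zip(c_tau, c_k)]


def lpc_synthesis(u, a_q, state=None):
    """Filter u(n) through 1/Aq(z), where Aq(z) = 1 + sum_i a_q[i-1] * z^-i.

    Implements s_hat(n) = u(n) - sum_{i=1..p} a_q[i-1] * s_hat(n-i).
    `state` optionally holds [s_hat(-1), s_hat(-2), ...]; zeros by default.
    """
    p = len(a_q)
    history = list(state) if state is not None else [0.0] * p
    out = []
    for x in u:
        y = x - sum(a * h for a, h in zip(a_q, history))
        out.append(y)
        history = ([y] + history)[:p]  # newest output sample first
    return out


# Illustrative values only: a 4-sample sub-frame and a first-order predictor.
u = combined_excitation([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], beta=0.5, gamma=2.0)
s_hat = lpc_synthesis(u, a_q=[-0.5])
```

In a practical coder the filter state would normally be carried from one sub-frame to the next; the optional `state` argument is included only to hint at that.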
LPC synthesis filter 105 conveys the input signal estimate ŝ(n) to second combiner 112. Second combiner 112 also receives input signal s(n) and subtracts the estimate of the input signal ŝ(n) from the input signal s(n). The difference between input signal s(n) and input signal estimate ŝ(n) is applied to a perceptual error weighting filter 106, which produces a perceptually weighted error signal e(n) based on the difference between ŝ(n) and s(n) and a weighting function w(n), such that

$$E(z) = W(z)\left(S(z) - \hat{S}(z)\right). \tag{1}$$
Perceptually weighted error signal e(n) is then conveyed to squared error minimization/parameter quantization block 107. Squared error minimization/parameter quantization block 107 uses the error signal e(n) to determine an optimal set of parameters τ, β, k, and γ that produce the best estimate ŝ(n) of the input signal s(n).
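As a rough illustration of the “trial and error” minimization performed by block 107, the following Python sketch exhaustively searches a toy fixed codebook and a small set of candidate gains for the pair that minimizes the squared error. It makes several simplifying assumptions that are not part of the encoder described above: the adaptive codebook contribution is omitted, the perceptual weighting is taken as W(z) = 1, and the filter starts from zero state. The names `synthesize` and `search_fixed_codebook` are hypothetical.

```python
def synthesize(excitation, a_q):
    """Zero-state filtering through 1/Aq(z), Aq(z) = 1 + sum_i a_q[i-1] * z^-i."""
    p = len(a_q)
    history = [0.0] * p
    out = []
    for x in excitation:
        y = x - sum(a * h for a, h in zip(a_q, history))
        out.append(y)
        history = ([y] + history)[:p]
    return out


def search_fixed_codebook(target, codebook, gains, a_q):
    """Return (k, gamma, error) minimizing sum_n (target(n) - gamma * y_k(n))**2."""
    best = None
    for k, c_k in enumerate(codebook):
        y_k = synthesize(c_k, a_q)  # synthesize each candidate code-vector once
        for gamma in gains:
            err = sum((t - gamma * y) ** 2 for t, y in zip(target, y_k))
            if best is None or err < best[2]:
                best = (k, gamma, err)
    return best


# Toy example: the target is exactly candidate 1 scaled by a gain of 2.0.
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [2.0 * y for y in synthesize(codebook[1], [-0.5])]
best_k, best_gamma, best_err = search_fixed_codebook(target, codebook, [0.5, 1.0, 2.0], [-0.5])
```

Practical CELP coders generally avoid a brute-force joint search of this kind; the sketch is meant only to make the error-minimization criterion concrete.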
FIG. 2 is a block diagram of prior-art decoder 200 that receives transmissions from encoder 100. As one of ordinary skill in the art realizes, the coded bit stream produced by encoder 100 is used by a de-multiplexer in decoder 200 to decode the optimal set of parameters, that is, τ, β, k, and γ, which decoder 200 then uses in a synthesis process that is identical to the synthesis process performed by encoder 100. Thus, if the coded bit stream produced by encoder 100 is received by decoder 200 without errors, the speech ŝ(n) output by decoder 200 can be reconstructed as an exact duplicate of the input speech estimate ŝ(n) produced by encoder 100.
Returning to FIG. 1, weighting filter W(z) utilizes the frequency masking property of the human ear, such that simultaneously occurring noise is masked by the stronger signal provided the frequencies of the signal and the noise are close. As described in Salami R., Laflamme C., Adoul J-P, Massaloux D., “A toll quality 8 Kb/s speech coder for personal communications system,” IEEE Trans. On Vehicular Technology, pp. 808–816, August 1994, W(z) is derived from the LPC coefficients ai, and is given by

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad 0 < \gamma_2 < \gamma_1 \le 1, \tag{2}$$

where

$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}, \tag{3}$$

and p is the order of the LPC. Since the weighting filter is derived from the LPC spectrum, it is also referred to as “spectral weighting”.
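Equations (2) and (3) can be made concrete with a short Python sketch. It relies on the fact that substituting z/γ for z in A(z) scales the i-th coefficient by γ^i, so W(z) becomes a pole-zero filter whose numerator and denominator are both derived from the same LPC coefficients. The function names and the values of γ1 and γ2 used in the example are illustrative assumptions, not values taken from the cited reference.

```python
def bandwidth_expand(a, g):
    """Coefficients of A(z/g): the i-th LPC coefficient a_i becomes a_i * g**i."""
    return [a_i * g ** (i + 1) for i, a_i in enumerate(a)]


def perceptual_weight(d, a, gamma1, gamma2):
    """Filter d(n) = s(n) - s_hat(n) through W(z) = A(z/gamma1) / A(z/gamma2).

    Zero initial state is assumed. The difference equation is
    e(n) = d(n) + sum_i num_i * d(n-i) - sum_i den_i * e(n-i).
    """
    if not 0.0 < gamma2 < gamma1 <= 1.0:
        raise ValueError("need 0 < gamma2 < gamma1 <= 1")
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    p = len(a)
    x_hist = [0.0] * p  # past inputs, newest first
    y_hist = [0.0] * p  # past outputs, newest first
    out = []
    for x in d:
        y = (x
             + sum(b * h for b, h in zip(num, x_hist))
             - sum(c * h for c, h in zip(den, y_hist)))
        out.append(y)
        x_hist = ([x] + x_hist)[:p]
        y_hist = ([y] + y_hist)[:p]
    return out


# Impulse response of W(z) for a first-order A(z) = 1 - 0.9 z^-1 (illustrative).
w = perceptual_weight([1.0, 0.0, 0.0], a=[-0.9], gamma1=1.0, gamma2=0.5)
```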
The above-described procedure does not take into account the fact that the signal periodicity also contributes to the spectral peaks at the fundamental frequencies and at the multiples of the fundamental frequencies. Various techniques have been proposed to utilize noise masking of these fundamental frequency harmonics. For example, in “Digital speech coder and method utilizing harmonic noise weighting,” U.S. Pat. No. 5,528,723: Gerson and Jasiuk, and in Gerson I. A., Jasiuk M. A., “Techniques for improving the performance of CELP type speech coders,” Proc. IEEE ICASSP, pp. 205–208, 1993, a method was proposed which includes harmonic noise masking in the weighting filter. As the above references show, harmonic noise weighting is incorporated by modifying the spectral weighting filter by a harmonic noise weighting filter C(z), which is given by:

$$C(z) = 1 - \varepsilon_p \sum_{i=-M_1}^{M_2} b_i z^{-(D+i)}, \tag{4}$$

where D corresponds to the pitch period or the pitch lag or delay, bi are the filter coefficients, and 0 ≤ εp < 1 is the harmonic noise weighting coefficient. The weighting filter incorporating harmonic noise weighting is given by:

$$W_H(z) = W(z)\,C(z). \tag{5}$$
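The effect of the harmonic noise weighting filter of equation (4) can be sketched in Python as a direct FIR implementation, y(n) = x(n) − εp Σ bi·x(n − (D + i)). The sketch assumes zero initial state, an integer delay D with D ≥ M1 so that the filter is causal, and coefficients stored in the order b(−M1), ..., b(M2). The function name `harmonic_noise_weight` and the example coefficients are hypothetical.

```python
def harmonic_noise_weight(x, delay, b, m1, eps_p):
    """Apply C(z) = 1 - eps_p * sum_{i=-M1..M2} b_i * z^-(D+i) to x(n).

    `b` lists b_{-M1}, ..., b_{M2}; entry j of `b` corresponds to i = j - m1.
    Samples before n = 0 are taken as zero.
    """
    if not 0.0 <= eps_p < 1.0:
        raise ValueError("need 0 <= eps_p < 1")
    if delay < m1:
        raise ValueError("need D >= M1 so that every tap has a non-negative lag")
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, b_i in enumerate(b):
            idx = n - (delay + j - m1)  # index of x(n - (D + i))
            if idx >= 0:
                acc += b_i * x[idx]
        out.append(x[n] - eps_p * acc)
    return out


# Impulse response for D = 3 with a three-tap interpolation around the lag.
c = harmonic_noise_weight([1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                          delay=3, b=[0.25, 0.5, 0.25], m1=1, eps_p=0.5)
```

Cascading this filter with the spectral weighting filter W(z) would give the combined response of equation (5); with εp = 0 the output equals the input, which corresponds to spectral weighting alone.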
The amount of harmonic noise weighting is typically dependent on the product εpbi. Since bi is dependent on the delay, the amount of harmonic noise weighting is a function of the delay. While the prior-art references noted above have suggested that different values of the harmonic noise weighting coefficient (εp) can be used at different times (e.g., εp may be a time-varying parameter that is allowed to change from sub-frame to sub-frame), the prior art does not provide a method for choosing or varying εp, or suggest when or how such a method may be beneficial. Therefore, a need exists for a method and apparatus for performing harmonic noise weighting in digital speech coders that optimally and dynamically determines appropriate values of εp so that the amount of harmonic noise weighting can be optimized and the overall perceptual weighting can be improved.