1. Field of the Invention
The present invention relates generally to speech coding. More particularly, the present invention relates to open-loop pitch analysis.
2. Related Art
Speech compression may be used to reduce the number of bits that represent the speech signal thereby reducing the bandwidth needed for transmission. However, speech compression may result in degradation of the quality of decompressed speech. In general, a higher bit rate will result in higher quality, while a lower bit rate will result in lower quality. However, modern speech compression techniques, such as coding techniques, can produce decompressed speech of relatively high quality at relatively low bit rates. In general, modern coding techniques attempt to represent the perceptually important features of the speech signal, without preserving the actual speech waveform. Speech compression systems, commonly called codecs, include an encoder and a decoder and may be used to reduce the bit rate of digital speech signals. Numerous algorithms have been developed for speech codecs that reduce the number of bits required to digitally encode the original speech while attempting to maintain high quality reconstructed speech.
In 1996, the Telecommunication Sector of the International Telecommunication Union (ITU-T) adopted a toll quality speech coding algorithm known as the G.729 Recommendation, entitled “Coding of Speech Signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP),” which is hereby incorporated by reference in its entirety into the present application.
FIG. 1 illustrates the speech signal flow in CS-ACELP (Conjugate Structure Algebraic-Code-Excited-Linear-Prediction) encoder 100 of the G.729 Recommendation, as explained therein. The reference numerals adjacent to each block in FIG. 1 indicate section numbers within the G.729 Recommendation that describe the operation and functionality of each block. As shown, the speech signal or input samples 105 enter the high pass & down scale block (described in Section 3.1 of the G.729 Recommendation), where pre-processing 110 is applied to input samples 105 on a frame-by-frame basis. Next, LP analysis 115 and open-loop pitch search 120 are applied to the pre-processed speech signal on a frame-by-frame basis. Following the open-loop pitch search 120, closed-loop pitch search 125 and algebraic search 130 are applied to the speech signal on a subframe-by-subframe basis, as shown in FIG. 1, which results in generating code index output 135.
As illustrated in FIG. 1, open-loop pitch search 120 includes find open-loop pitch delay 124, which is described at Section 3.4 of the G.729 Recommendation. As explained therein, to reduce the complexity of the search for the best adaptive-codebook delay, the search range is limited around a candidate delay Top, obtained from an open-loop pitch analysis. This open-loop pitch analysis is done once per frame (10 ms). The open-loop pitch estimation uses the weighted speech signal sw(n) from compute weighted speech 122, and is implemented as follows.
In the first step, three maxima of the correlation

$$R(k) = \sum_{n=0}^{79} s_w(n)\, s_w(n-k),$$

where the weighted speech signal is

$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \qquad n = 0, \ldots, 39,$$

are found in the following three ranges:
        i=1: 80, . . . , 143
        i=2: 40, . . . , 79
        i=3: 20, . . . , 39
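The first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation of the G.729 Recommendation: the function names, the `RANGES` constant, and the buffer layout (a list `sw` in which index `start` holds sample n = 0, preceded by at least 143 past samples) are hypothetical choices made for this example.

```python
# Delay ranges i = 1, 2, 3 as listed above (inclusive bounds).
RANGES = ((80, 143), (40, 79), (20, 39))


def correlation(sw, start, k, frame_len=80):
    """R(k) = sum over n = 0..frame_len-1 of sw(n) * sw(n - k).

    sw[start] is assumed to hold sample n = 0 of the current frame, so
    sw[start - k] is the sample delayed by k.
    """
    return sum(sw[start + n] * sw[start + n - k] for n in range(frame_len))


def find_candidates(sw, start):
    """Return [t1, t2, t3]: the delay with the largest R(k) in each range."""
    candidates = []
    for lo, hi in RANGES:
        # On ties, max() keeps the first (smallest) delay it encounters.
        best = max(range(lo, hi + 1), key=lambda k: correlation(sw, start, k))
        candidates.append(best)
    return candidates
```

For an impulse train with a period of 50 samples, the search would be expected to pick the delays 100 and 50 in the first two ranges, which illustrates why pitch multiples appear as candidates.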
The retained maxima R(ti), i=1, . . . , 3, are normalized through:
$$R'(t_i) = \frac{R(t_i)}{\sqrt{\sum_{n} s_w^2(n - t_i)}}, \qquad i = 1, \ldots, 3$$
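The normalization can be illustrated with a short Python sketch. It assumes that the energy sum runs over the same 80 samples as the correlation and that the denominator is the square root of that energy, as in Section 3.4 of the G.729 Recommendation. The function name, the buffer layout (index `start` holds sample n = 0), and the zero-energy guard are assumptions added for this example.

```python
import math


def normalized_correlation(sw, start, t, frame_len=80):
    """R'(t) = R(t) / sqrt(sum over n of sw(n - t)^2).

    sw[start] is assumed to hold sample n = 0 of the current frame.
    """
    r = sum(sw[start + n] * sw[start + n - t] for n in range(frame_len))
    energy = sum(sw[start + n - t] ** 2 for n in range(frame_len))
    # Guard against an all-zero delayed segment (an addition for this sketch).
    if energy <= 0.0:
        return 0.0
    return r / math.sqrt(energy)
```

Dividing by the energy of the delayed segment keeps a loud stretch of past signal from winning the comparison merely because of its amplitude.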
Next, the winner among the three normalized correlations is selected by favoring the delays with the values in the lower range. This is done by weighting the normalized correlations corresponding to the longer delays. The best open-loop delay Top is determined as follows:
    Top = t1
    R′(Top) = R′(t1)
    if R′(t2) ≧ 0.85 R′(Top)
        R′(Top) = R′(t2)
        Top = t2
    end
    if R′(t3) ≧ 0.85 R′(Top)
        R′(Top) = R′(t3)
        Top = t3
    end
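The selection rule above can be written as a small Python function. This is an illustrative sketch only: the function name and argument layout are hypothetical, and the inputs are assumed to be ordered from the longest-delay range to the shortest, that is, (t1, t2, t3) with their normalized correlations.

```python
def select_open_loop_delay(delays, norm_corrs, threshold=0.85):
    """Pick Top from candidate delays ordered longest range first.

    A shorter delay replaces the current winner whenever its normalized
    correlation reaches `threshold` times the winner's, which favors
    the lower delay ranges.
    """
    t_op, r_op = delays[0], norm_corrs[0]
    for t, r in zip(delays[1:], norm_corrs[1:]):
        if r >= threshold * r_op:
            t_op, r_op = t, r
    return t_op


# A shorter delay wins even with a somewhat lower normalized correlation.
print(select_open_loop_delay([100, 50, 25], [1.0, 0.9, 0.5]))
```

In this usage example the delay 50 is chosen over 100 because 0.9 is at least 0.85 times 1.0, while the delay 25 is rejected because 0.5 is below 0.85 times 0.9.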
The above-described procedure of dividing the delay range into three sections and favoring the smaller values is used to avoid choosing pitch multiples. A smoothed open-loop pitch track can help stabilize the perceptual quality of the speech. More specifically, a smoothed pitch track can make pitch prediction (pitch estimation for lost frames) easier when a frame erasure concealment algorithm is applied at the decoder side. The above-described conventional algorithm of the G.729 Recommendation, however, does not provide an optimum result and can be further improved. For example, disadvantageously, the conventional algorithm of the G.729 Recommendation uses only the current frame information to smooth the open-loop pitch track in order to avoid pitch multiples.
Accordingly, there is a need in the art to improve conventional open-loop pitch analysis to obtain a smoother open-loop pitch track for stabilizing the perceptual quality of the speech.