1. Field of the Invention
The present invention relates to an energy feature extraction method for speech recognition, and more particularly to an energy feature extraction method for noisy speech recognition.
2. Description of Related Art
In practical applications of automatic speech recognition, background noise is a significant source of interference because it usually reduces recognition accuracy. The noise combines with the speech by superposition of their waveforms. Statistically, background noise is uncorrelated with the speech signal. Therefore, as disclosed in related research documents, a second-order statistic of noisy speech can be expressed as the sum of the corresponding second-order statistics of the noise and of the speech, and signal energy is one such second-order statistic. Further, in known speech recognition techniques, the delta coefficient of the speech energy waveform is a pattern recognition feature that is more important than the spectral coefficients.
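The additivity of energy for uncorrelated signals can be checked numerically. The following is a minimal sketch, assuming two independent zero-mean Gaussian sequences as stand-ins for speech and noise; the variable names, variances, sample count, and seed are illustrative choices and do not come from the related research documents.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 200_000

# Stand-ins for speech and noise: two independent zero-mean signals (assumption).
x = rng.normal(0.0, 1.0, n)   # "speech", variance about 1.0
w = rng.normal(0.0, 0.5, n)   # "noise", variance about 0.25
y = x + w                     # the waveforms superpose

def mean_energy(s):
    """Mean of the squared samples, a second-order statistic."""
    return float(np.mean(s ** 2))

e_x, e_w, e_y = mean_energy(x), mean_energy(w), mean_energy(y)

# The exact identity is e_y = e_x + e_w + cross; the cross term is
# close to zero when the two signals are uncorrelated.
cross = 2.0 * float(np.mean(x * w))

print("noisy energy      :", e_y)
print("sum of energies   :", e_x + e_w)
print("cross term        :", cross)
```

The cross term is the only difference between the noisy energy and the sum of the two energies, so the approximation is as good as the correlation between speech and noise is small.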
As known in the prior art, the speech energy waveform is expressed as the average of the squared waveform samples in each speech frame as follows:
$$E_x[t] = \frac{1}{N} \sum_{i=1}^{N} x_t^2[i], \tag{1}$$

where $N$ is the number of waveform samples in the $t$-th speech frame. The frequently used first-order and second-order delta coefficients can be obtained as follows:
$$\frac{dE_x^l(t)}{dt} \cong \Delta E_x^l[t] = \frac{1}{K} \sum_{i=-D}^{D} i \, \log E_x[t+i] \tag{2}$$

$$\frac{d^2 E_x^l(t)}{dt^2} \cong \Delta^2 E_x^l[t] = \Delta E_x^l[t+1] - \Delta E_x^l[t-1], \tag{3}$$

where $E_x^l(t) = \log(E_x(t))$, $D$ is the number of speech frames spanned on each side of frame $t$, and

$$K = \sum_{i=-D}^{D} i^2.$$

Generally, combining the delta coefficients of the energy waveform with coefficient vectors consisting of other spectral coefficients can increase speech recognition accuracy. However, in a noisy environment, if the background noise and the speech signal are statistically uncorrelated, the energy of the noisy speech can be expressed as follows:

$$E_y(t) \cong E_x(t) + E_w(t) \tag{4}$$
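As a rough illustration of equations (1) to (3), the frame energy and its first-order and second-order delta coefficients can be sketched in Python as follows. The function names, the use of non-overlapping frames, the edge-replication padding at the utterance boundaries, and the small floor applied before taking the logarithm are all assumptions made for this sketch; the prior art described above does not specify them.

```python
import numpy as np

def frame_energy(x, frame_len):
    """Equation (1): mean of squared samples in each non-overlapping frame."""
    n_frames = len(x) // frame_len
    frames = np.asarray(x[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def delta_log_energy(energy, D=2, floor=1e-12):
    """Equation (2): weighted sum of log energy over frames t-D..t+D, divided by K."""
    log_e = np.log(np.maximum(energy, floor))   # floor avoids log(0) (assumption)
    padded = np.pad(log_e, D, mode="edge")      # boundary handling is an assumption
    offsets = np.arange(-D, D + 1)
    K = np.sum(offsets ** 2)
    # padded[t : t + 2D + 1] holds log_e[t-D] .. log_e[t+D]
    return np.array([np.dot(offsets, padded[t : t + 2 * D + 1])
                     for t in range(len(log_e))]) / K

def delta2_log_energy(delta):
    """Equation (3): delta[t+1] - delta[t-1], with edge replication."""
    padded = np.pad(delta, 1, mode="edge")
    return padded[2:] - padded[:-2]

# Synthetic check: a log energy that rises linearly with slope 0.3 per frame.
energy = np.exp(0.3 * np.arange(20))
d1 = delta_log_energy(energy, D=2)
d2 = delta2_log_energy(d1)
print(d1[2:-2])   # interior frames, unaffected by the padding
print(d2[3:-3])
```

For a log energy that is exactly linear in the frame index, the interior first-order delta equals the slope and the interior second-order delta is zero, which is a convenient sanity check for any implementation of these formulas.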
If the noise energy changes more slowly than the speech energy, the noise energy $E_w(t)$ can be approximated by a constant $e_w$ that does not vary with time, and thus $E_y(t) \approx E_x(t) + e_w$. By the chain rule, the first-order time derivative of the log energy can be expressed as follows:
$$\frac{dE_x^l(t)}{dt} \cong \frac{1}{E_x(t)} \frac{dE_x(t)}{dt}. \tag{5}$$

Therefore, the first-order time derivative of the log energy of the noisy speech can be expressed as follows:
$$\frac{dE_y^l(t)}{dt} \cong \frac{1}{E_x(t) + e_w} \frac{dE_x(t)}{dt}. \tag{6}$$

Because the noise energy $e_w > 0$, we have:
$$\frac{dE_y^l(t)}{dt} < \frac{dE_x^l(t)}{dt}. \tag{7}$$
The foregoing shows how additive noise distorts the differential feature of the log energy, which in turn degrades pattern recognition. Therefore, in a noisy environment, the superposed noise distorts the speech energy waveform and consequently causes errors in the speech recognition result.
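The flattening effect described by equations (5) to (7) can be reproduced with a small numerical experiment. The sketch below is illustrative only: it assumes a synthetic, exponentially rising clean energy contour and a hypothetical constant noise energy of 5.0, and it evaluates the delta coefficient of equation (2) on interior frames. A rising contour is used because that is the case in which inequality (7) holds as written; for a falling contour the same reasoning shrinks the magnitude of the derivative.

```python
import numpy as np

def delta_log(energy, D=2):
    """Equation (2) evaluated on interior frames only (no edge padding)."""
    log_e = np.log(energy)
    offsets = np.arange(-D, D + 1)
    K = np.sum(offsets ** 2)
    return np.array([np.dot(offsets, log_e[t - D : t + D + 1])
                     for t in range(D, len(log_e) - D)]) / K

t = np.arange(30)
e_x = np.exp(0.2 * t)   # synthetic clean speech energy, rising (assumption)
e_w = 5.0               # constant noise energy (hypothetical value)
e_y = e_x + e_w         # equation (4) with time-invariant noise

d_clean = delta_log(e_x)   # equals 0.2 on every interior frame
d_noisy = delta_log(e_y)   # smaller, most strongly where e_x is small next to e_w

print("clean delta, first frames:", d_clean[:3])
print("noisy delta, first frames:", d_noisy[:3])
print("noisy delta, last frames :", d_noisy[-3:])
```

The noisy delta approaches the clean value only where the speech energy dominates the noise energy, which matches the factor $1/(E_x(t) + e_w)$ in equation (6).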