In the speech recognition field, methods have been proposed that improve recognition accuracy by correcting fluctuations in the resonant frequencies of the spectrum caused by differences in speakers' vocal tract lengths. Such a technology is called vocal tract normalization. An example of a vocal tract normalization method is disclosed in Patent Document 1 (Japanese patent No. 3632529), in which the degree of difference is efficiently estimated by modeling the difference in resonant frequency as a linear transformation in the cepstrum space.
The vocal tract normalization proposed in Patent Document 1 comprises an analysis unit that analyzes a voice and outputs a cepstrum, a warp estimation unit that estimates the value of the warping factor, which indicates a warping degree, from the cepstrum, and a transformation unit that linearly transforms the cepstrum using the value of the warping factor.
The linear transformation used in the transformation unit represents the transformation on the frequency axis as a linear transformation of the cepstrum by using the inverse transformation of an all-pass filter. This transformation uses a single parameter.
In Patent Document 1, the HMM (Hidden Markov Model) used for recognition is also used to estimate the warping factor. As described in Patent Document 1, this HMM is a model in which the output probability of phonological information is modeled on a word or phoneme basis.
For example, “hai” is divided into the phonemes “h a i” and the occurrence probability is modeled for each of h, a, and i. The occurrence probability is most often modeled as a normal distribution. In this case, the average and variance of the feature value, such as a cepstrum, are calculated for each phoneme in advance for use in recognition. In Patent Document 1, the following expression (1) is used for estimating the warping factor.
$$
\alpha = \frac{\displaystyle\sum_{j=1}^{J}\sum_{t=1}^{T}\gamma_t(j)\left[\sum_{m=1}^{M}\frac{(c_{mt}-\mu_{jm})\left\{(m-1)\,c_{(m-1)t}-(m+1)\,c_{(m+1)t}\right\}}{\sigma_{mj}^{2}}\right]}{\displaystyle\sum_{j=1}^{J}\sum_{t=1}^{T}\gamma_t(j)\left[\sum_{m=1}^{M}\frac{\left\{(m-1)\,c_{(m-1)t}-(m+1)\,c_{(m+1)t}\right\}^{2}}{\sigma_{mj}^{2}}\right]} \tag{1}
$$
where J is the number of phonemes and states and j is the ID that identifies a phoneme and state, t indicates the time, M is the number of cepstrum dimensions and m is the dimension, cmt indicates the mth-dimensional cepstrum coefficient at time t, and μjm and σmj indicate the mth-dimensional value of the average vector and the mth-dimensional standard deviation of the phoneme j in the HMM.
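One way to read expression (1), offered here only as a sketch: it has the same form as the closed-form minimizer of a weighted squared error between a linearly corrected cepstrum and the HMM averages. The shorthand d_{mt}, the error function E, and the assumption that the corrected cepstrum is c_{mt} - \alpha d_{mt} are introduced for illustration and are not stated in the text above.

```latex
% Shorthand (illustrative): the bracketed difference term of expression (1)
d_{mt} = (m-1)\,c_{(m-1)t} - (m+1)\,c_{(m+1)t}

% Assumed weighted squared error between the corrected cepstrum and the averages
E(\alpha) = \sum_{j=1}^{J}\sum_{t=1}^{T}\gamma_t(j)
            \sum_{m=1}^{M}\frac{\left(c_{mt} - \alpha\,d_{mt} - \mu_{jm}\right)^{2}}{\sigma_{mj}^{2}}

% Setting the derivative with respect to alpha to zero
\frac{\partial E}{\partial \alpha}
  = -2\sum_{j=1}^{J}\sum_{t=1}^{T}\gamma_t(j)
     \sum_{m=1}^{M}\frac{d_{mt}\left(c_{mt} - \alpha\,d_{mt} - \mu_{jm}\right)}{\sigma_{mj}^{2}} = 0

% Solving for alpha reproduces the form of expression (1)
\alpha = \frac{\sum_{j}\sum_{t}\gamma_t(j)\sum_{m}(c_{mt}-\mu_{jm})\,d_{mt}/\sigma_{mj}^{2}}
              {\sum_{j}\sum_{t}\gamma_t(j)\sum_{m}d_{mt}^{2}/\sigma_{mj}^{2}}
```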
Estimating the warping factor value with this expression requires ID information that identifies which average vector and variance are to be used.
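The dependence of expression (1) on per-state averages and variances can be made concrete with a minimal Python sketch. The function names, the argument layout (`gamma[t][j]` as the weight γt(j), `means[j][m]`, `variances[j][m]`), and the treatment of the coefficient beyond dimension M as zero are assumptions made for this illustration, not details taken from Patent Document 1.

```python
def warp_direction(frame):
    """Return d_m = (m-1)*c_(m-1) - (m+1)*c_(m+1) for m = 1..M.

    frame[m-1] holds the m-th cepstrum coefficient c_m of one time frame.
    """
    M = len(frame)
    d = []
    for m in range(1, M + 1):
        # For m = 1 the factor (m-1) is zero, so c_0 is never needed.
        lower = (m - 1) * frame[m - 2] if m > 1 else 0.0
        # Assumption: coefficients beyond dimension M are treated as zero.
        upper = (m + 1) * frame[m] if m < M else 0.0
        d.append(lower - upper)
    return d


def estimate_warping_factor(cepstra, gamma, means, variances):
    """Evaluate the ratio in expression (1).

    cepstra[t][m]   : cepstrum coefficient of dimension m+1 at time t
    gamma[t][j]     : weight gamma_t(j) of state j at time t
    means[j][m]     : average of dimension m+1 for state j
    variances[j][m] : variance (sigma squared) of dimension m+1 for state j
    """
    numerator = 0.0
    denominator = 0.0
    for t, frame in enumerate(cepstra):
        d = warp_direction(frame)
        for j, weight in enumerate(gamma[t]):
            if weight == 0.0:
                continue  # state j contributes nothing at time t
            for m in range(len(frame)):
                numerator += weight * (frame[m] - means[j][m]) * d[m] / variances[j][m]
                denominator += weight * d[m] ** 2 / variances[j][m]
    if denominator == 0.0:
        raise ValueError("denominator of expression (1) is zero")
    return numerator / denominator
```

Every term in both sums is indexed by j, so the sketch cannot run without knowing which states are active at each time, which is exactly the ID information the text refers to.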
This ID information may be obtained by giving word information that describes the content of the voice. That is, when “hai” is given as in the example above, the phoneme string “h a i” can be identified, the phoneme string can be further expanded into the state sequence of each of h, a, and i, and the probability distribution belonging to each state can then be identified.
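The expansion from word to phoneme string to state sequence described above can be sketched as follows. The lexicon contents, the choice of three states per phoneme, and the `(phoneme, state)` key format are hypothetical values chosen for this illustration. Aligning individual time frames to these states is a separate step and is not shown.

```python
# Hypothetical pronunciation lexicon: word -> phoneme string.
LEXICON = {"hai": ["h", "a", "i"]}

# Assumption: every phoneme HMM has the same number of emitting states.
STATES_PER_PHONEME = 3


def word_to_state_sequence(word, lexicon=LEXICON, n_states=STATES_PER_PHONEME):
    """Expand a word into its ordered list of (phoneme, state index) IDs."""
    phonemes = lexicon[word]
    return [(ph, s) for ph in phonemes for s in range(1, n_states + 1)]


def distributions_for(word, table, lexicon=LEXICON, n_states=STATES_PER_PHONEME):
    """Look up the (average, variance) pair stored for each state of the word.

    table maps a (phoneme, state index) ID to an (average, variance) pair.
    """
    return [table[state_id] for state_id in word_to_state_sequence(word, lexicon, n_states)]
```

For “hai” with three states per phoneme, the state sequence has nine entries, beginning with the three states of h and ending with the three states of i.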
Patent Document 1:
Japanese patent No. 3632529
Non-Patent Document 1:
HTK Book Ver. 3.3, pp. 35-40, pp. 54-64, pp. 127-131