An HMM is well suited to representing a time-series signal and is widely used in pattern recognition fields that process time-varying signals, such as speech recognition and moving image recognition.
An HMM is generally formed from a plurality of states and is expressed by an output probability with which a signal is output in each state and a transition probability between states. FIG. 2 shows an example of an HMM formed from three states. Referring to FIG. 2, s is a state index; a(i,j) is the probability of a transition from the ith state to the jth state; and b(i,o) is the output probability with which a signal o is output in state i. In pattern recognition, recognition target events are modeled by a plurality of HMMs. When a time-series signal o(t) (t=1 to T) is observed, the output probability of each event is obtained from the transition and output probabilities of the corresponding HMM. The event having the highest probability is taken as the recognition result.
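As an illustration of how an observed sequence can be scored against several HMMs, the following is a minimal sketch that assumes discrete observation symbols and the forward algorithm. The text above does not name the scoring algorithm, and the initial-state probabilities `initial`, the function names, and the absence of a designated final state are assumptions made only for this example.

```python
def forward_probability(initial, a, b, observations):
    """Return the probability of the observation sequence under one HMM.

    initial[i] : probability of starting in state i (assumed; not in the text)
    a[i][j]    : transition probability from state i to state j
    b[i][o]    : output probability of symbol o in state i
    """
    n = len(initial)
    # alpha[i] is the joint probability of the signals seen so far
    # and of being in state i at the current time step.
    alpha = [initial[i] * b[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
            for j in range(n)
        ]
    # Sum over all states at time T (no designated final state is assumed).
    return sum(alpha)


def recognize(models, observations):
    """Return the name of the event whose HMM scores highest.

    models maps an event name to a tuple (initial, a, b).
    """
    return max(
        models,
        key=lambda name: forward_probability(*models[name], observations),
    )
```

Here `recognize` simply evaluates every model and keeps the best one, which mirrors the selection of the event having the highest probability.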
The details of HMMs and of pattern recognition methods using HMMs are described in many references, and a detailed description thereof will be omitted. For example, speech recognition using an HMM is described in detail in Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition”, Englewood Cliffs N.J.: PTR Prentice Hall (Signal Processing Series), 1993.
Speech recognition often uses a continuous mixture distribution HMM, which represents the output probability distribution as the sum of a plurality of continuous distributions. An example is shown in FIG. 3. The output probability distribution (301) of an HMM state is represented by the sum of distribution 1 (302) and distribution 2 (303). The output probability value b(O) with which a signal O is observed is obtained as b(O)=b′(1,O)+b′(2,O), using the output probability b′(1,O) obtained from distribution 1 (302) and the output probability b′(2,O) obtained from distribution 2 (303).
FIG. 3 shows a one-dimensional observation signal for simplicity of description. In pattern recognition such as speech recognition, a multidimensional feature vector is generally used as the observation signal, and the output probability distribution is defined as a multidimensional continuous mixture distribution. A Gaussian distribution is often used because it is simple to calculate. The output probability of the mixture distribution is calculated as the weighted sum of the output probabilities of a plurality of Gaussian distributions.
The number of mixture components (number of mixtures) in FIG. 3 is two. To build a high-performance model, the number of mixtures must be increased so that the output probability distribution is expressed precisely.
The actual output probability of a mixture distribution using, e.g., a diagonal-covariance Gaussian distribution is calculated by:
$$b'(m,O)=\prod_{k=1}^{K}\left(\frac{1}{2\pi\sigma^{2}(m,k)}\right)^{\frac{1}{2}}\exp\left(-\frac{1}{2}\sum_{k=1}^{K}\frac{\left(o(k)-\mu(m,k)\right)^{2}}{\sigma^{2}(m,k)}\right)\tag{1}$$

$$b(O)=\sum_{m=1}^{M}w(m)\cdot b'(m,O)\tag{2}$$

where
K: the number of dimensions of feature vector (observation signal) used
O={o(1), o(2), . . . , o(K)}: observation signal (K-dimensional vector)
b(O): the output probability of the mixture distribution
b′(m,O): the output probability of the distribution m
M: the number of mixtures
w(m): the weight of the distribution m
σ²(m,k): the variance of the distribution m in the kth dimension
μ(m,k): the mean of the distribution m in the kth dimension
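Equations (1) and (2) can be sketched directly in code. The following is a minimal illustration using only the Python standard library. The function and argument names are hypothetical, and each distribution is assumed to be given as a list of per-dimension means and variances.

```python
import math


def component_density(o, mean, var):
    """Equation (1): diagonal-covariance Gaussian density b'(m, O).

    o, mean and var are K-element sequences holding o(k), mu(m,k)
    and sigma^2(m,k) for one distribution m.
    """
    density = 1.0
    for o_k, mu_k, var_k in zip(o, mean, var):
        # One factor of the product over the K dimensions.
        norm = 1.0 / math.sqrt(2.0 * math.pi * var_k)
        density *= norm * math.exp(-0.5 * (o_k - mu_k) ** 2 / var_k)
    return density


def mixture_density(o, weights, means, variances):
    """Equation (2): weighted sum over the M distributions."""
    return sum(
        w * component_density(o, mu, var)
        for w, mu, var in zip(weights, means, variances)
    )
```

Because the density is a product of K factors that are often much smaller than one, this direct form can underflow to zero for high-dimensional feature vectors.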
When an output probability is actually calculated on a computer, the logarithmic value B′(m,O) of the weighted output probability is generally calculated by:
$$\begin{aligned}
B'(m,O) &= \ln\left(w(m)\cdot b'(m,O)\right)\\
&= \ln\left(w(m)\right)+\frac{1}{2}\sum_{k=1}^{K}\ln\left(\frac{1}{2\pi\sigma^{2}(m,k)}\right)-\frac{1}{2}\sum_{k=1}^{K}\frac{\left(o(k)-\mu(m,k)\right)^{2}}{\sigma^{2}(m,k)}\\
&= C(m)-\frac{1}{2}\sum_{k=1}^{K}\frac{\left(o(k)-\mu(m,k)\right)^{2}}{\sigma^{2}(m,k)}\\
C(m) &= \ln\left(w(m)\right)+\frac{1}{2}\sum_{k=1}^{K}\ln\left(\frac{1}{2\pi\sigma^{2}(m,k)}\right)
\end{aligned}\tag{3}$$
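A minimal sketch of equation (3) follows. It separates the model-dependent constant C(m) from the part that depends on the observation signal. The function names are hypothetical, and the per-dimension list layout of the means and variances is an assumption of this example.

```python
import math


def log_constant(weight, var):
    """C(m) of equation (3).

    Depends only on w(m) and sigma^2(m,k), so it can be computed
    once per distribution before any signal is observed.
    """
    return math.log(weight) + 0.5 * sum(
        math.log(1.0 / (2.0 * math.pi * var_k)) for var_k in var
    )


def weighted_log_density(o, mean, var, c_m):
    """B'(m, O) of equation (3), given the precomputed C(m)."""
    distance = sum(
        (o_k - mu_k) ** 2 / var_k for o_k, mu_k, var_k in zip(o, mean, var)
    )
    return c_m - 0.5 * distance
```

With the constant cached, each evaluation needs only subtractions, multiplications, divisions and additions per dimension, and the result stays finite even when the corresponding probability would be too small to represent in floating point.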
The logarithmic value is used to prevent underflow. It also reduces the computational load, because taking the logarithm of equation (1) turns the product and the exponential into sums, so that no exponentiation is needed. The constant part C(m), which is independent of the observation signal, can be calculated in advance. The logarithmic value B(O) of the output probability of the mixture distribution, which is the final result, is given by:
$$\begin{aligned}
B(O) &= \ln\left(b(O)\right)\\
&= \ln\left(\sum_{m=1}^{M}w(m)\cdot b'(m,O)\right)\\
&= \ln\left(\sum_{m=1}^{M}\exp\left\{\ln\left(w(m)\cdot b'(m,O)\right)\right\}\right)\\
&= \ln\left(\sum_{m=1}^{M}\exp\left(B'(m,O)\right)\right)
\end{aligned}\tag{4}$$
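Equation (4) can be sketched as follows. The function name is hypothetical, and the input is assumed to be the list of weighted logarithmic output probabilities B′(m,O) of the M distributions. Subtracting the largest term before exponentiating is a common numerical safeguard that is not part of the text above; it leaves the result of equation (4) mathematically unchanged.

```python
import math


def log_mixture(weighted_logs):
    """B(O) of equation (4) from the values B'(m, O), m = 1..M."""
    peak = max(weighted_logs)
    # exp(B' - peak) is at most 1, so the sum cannot overflow and the
    # largest term never underflows to zero (an added safeguard).
    total = sum(math.exp(b - peak) for b in weighted_logs)
    return peak + math.log(total)
```

Each call still performs M exponentiations and one natural logarithm, which is the cost that remains after the per-distribution values have been obtained.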
In equation (4), even after the weighted logarithmic output probability of each distribution has been obtained, an exponentiation and a natural logarithm are still required to calculate the output probability of the mixture distribution. Approximation methods that simplify this calculation are disclosed in H. Ney et al., “Phoneme modeling using continuous mixture densities”, Proc. ICASSP88, pp. 437-440, 1988 (to be referred to as Ney hereinafter) and Japanese Patent No. 2983364.
In Ney, instead of summing the weighted output probabilities of the respective distributions, the output probability of the mixture distribution is approximated by the largest of them, thereby reducing the amount of calculation. That is, in place of equation (2), the output probability of the mixture distribution is calculated by:
$$b(O)=\max_{m=1,\dots,M}\;w(m)\cdot b'(m,O)\tag{5}$$
Japanese Patent No. 2983364 discloses an example in which a technique as in Ney is applied to an arc-emission HMM (Mealy machine).
When the approximation of equation (5) is used for equation (4), the logarithmic output probability of the mixture distribution can be simplified as:
$$\begin{aligned}
B(O) &= \ln\left(b(O)\right)\\
&= \ln\left(\max_{m=1,\dots,M}\;w(m)\cdot b'(m,O)\right)\\
&= \max_{m=1,\dots,M}\;\ln\left(w(m)\cdot b'(m,O)\right)\\
&= \max_{m=1,\dots,M}\;B'(m,O)
\end{aligned}\tag{6}$$
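The difference between the exact form of equation (4) and the approximation of equation (6) can be illustrated with a short sketch. The function names and the values in `scores` are hypothetical and chosen only for this example.

```python
import math


def log_mixture_exact(weighted_logs):
    """Equation (4): logarithm of the sum of exp(B'(m, O))."""
    return math.log(sum(math.exp(b) for b in weighted_logs))


def log_mixture_max(weighted_logs):
    """Equation (6): keep only the largest B'(m, O)."""
    return max(weighted_logs)


# Hypothetical values of B'(m, O) for M = 3 distributions.
scores = [-4.0, -6.5, -9.0]
gap = log_mixture_exact(scores) - log_mixture_max(scores)
```

Since the largest term is never more than the sum and the sum is never more than M times the largest term, `gap` always lies between 0 and ln(M). It is close to 0 when one distribution dominates, as in this example.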
The above conventional technique is effective because the approximation reduces the computation cost while causing little degradation in recognition accuracy. However, calculating the output probability of a mixture distribution HMM still requires a large amount of computation.