The present invention generally relates to the technology of speech recognition and, more particularly, to a parametric family of multivariate density functions formed by mixture models from univariate functions for modeling acoustic feature vectors used in automatic recognition of speech.
Most pattern recognition problems require modeling the probability density of feature vectors in feature space. Specifically, in the problem of speech recognition, it is necessary to model the probability density of acoustic feature vectors in the space of phonetic units. Purely Gaussian densities are known to be inadequate for this purpose because of the heavy-tailed distributions exhibited by speech feature vectors. See, for example, Frederick Jelinek, Statistical Methods for Speech Recognition, MIT Press (1997). As an intended remedy to this problem, practically all speech recognition systems attempt modeling by using a mixture model with Gaussian densities for mixture components. Variants of the standard K-means clustering algorithm are used for this purpose. The classical version of the K-means algorithm, as described by John Hartigan in Clustering Algorithms, John Wiley and Sons (1975), and Anil Jain and Richard Dubes in Algorithms for Clustering Data, Prentice Hall (1988), can also be viewed as a special case of the expectation-maximization (EM) algorithm (see A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of Royal Statistical Soc., Ser. B, vol. 39, pp. 1-38, 1977) for mixtures of Gaussians with variances tending to zero. See also Christopher M. Bishop, Neural Networks for Pattern Recognition, Cambridge University Press (1997), and F. Marroquin and J. Girosi, "Some extensions of the K-means algorithm for image segmentation and pattern classification", MIT Artificial Intelligence Lab. A. I. Memorandum no. 1390, January 1993. The only attempt to model the phonetic units in speech with non-Gaussian mixture densities is described by H. Ney and A. Noll in "Phoneme modeling using continuous mixture densities", Proceedings of IEEE Int. Conf. on Acoustics Speech and Signal Processing, pp. 437-440, 1998, where Laplacian densities were used in a heuristic-based estimation algorithm.
S. Basu and C. A. Micchelli in "Parametric density estimation for the classification of acoustic feature vectors in speech recognition", Nonlinear Modeling: Advanced Black-Box Techniques (Eds. J. A. K. Suykens and J. Vandewalle), pp. 87-118, Kluwer Academic Publishers, Boston (1998), attempted to model speech data by building probability densities from a given univariate function h(t) for t ≥ 0. Specifically, Basu and Micchelli considered mixture models from component densities of the form

$$p(x \mid \mu, \Sigma) = \rho_d \, \frac{1}{\sqrt{\det \Sigma}} \, \exp\bigl(-h(Q(x))\bigr), \quad x \in R^d, \tag{1}$$

where

$$Q(x) = \gamma_d \, (x - \mu)^t \, \Sigma^{-1} (x - \mu), \quad x \in R^d, \tag{2}$$

$$m_\beta = \int_{R_+} t^{\beta} f(t) \, dt, \tag{3}$$

(when the integral is finite and R+ denotes the positive real axis)

$$\rho_d = \frac{\Gamma\!\left(\frac{d}{2}\right) \left(m_{\frac{d}{2}}\right)^{\frac{d}{2}}}{\pi^{\frac{d}{2}} \left(m_{\frac{d}{2}-1}\right)^{\frac{d}{2}+1}}, \tag{4}$$

and

$$\gamma_d = \frac{m_{\frac{d}{2}}}{d \, m_{\frac{d}{2}-1}}. \tag{5}$$
If the constants ρ_d and γ_d are positive and finite, then the vector μ ∈ R^d and the positive definite symmetric d×d matrix Σ are the mean and the covariance of this density. Particular attention was given to the choice h(t) = t^(α/2), t > 0, α > 0; the case α = 2 corresponds to the Gaussian density, whereas the Laplacian case considered by H. Ney and A. Noll, supra, corresponds to α = 1. Smaller values of α correspond to more peaked distributions (α → 0 yields the δ function), whereas larger values of α correspond to distributions with flat tops (α → ∞ yields the uniform distribution over elliptical regions). For more details about these issues see S. Basu and C. Micchelli, supra. This particular choice of densities has been studied in the literature and referred to in various ways; e.g., α-stable densities as well as power exponential distributions. See, for example, E. Gómez, M. A. Gómez-Villegas, and J. M. Marin, "A multivariate generalization of the power exponential family of distributions", Comm. Stat. - Theory Meth. 17(3), pp. 589-600, 1998, and Owen Kenny, Douglas Nelson, John Bodenschatz and Heather A. McMonagle, "Separation of nonspontaneous and spontaneous speech", Proc. ICASSP, 1998.
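The role of the constants in equations (1) through (5) can be checked numerically in the simplest setting. The following is a minimal sketch, not the method of the invention: it assumes d = 1, a scalar variance σ in place of Σ, the choice h(t) = t^(α/2), and that f(t) in equation (3) stands for exp(−h(t)), which the text does not state explicitly. The function names are hypothetical. Under these assumptions the density should integrate to approximately 1 and have variance approximately σ.

```python
import math

def moment(beta, alpha):
    """m_beta = integral over t > 0 of t**beta * exp(-t**(alpha/2)) dt.

    Closed form obtained by substituting u = t**(alpha/2); assumes
    f(t) = exp(-h(t)) with h(t) = t**(alpha/2).
    """
    return (2.0 / alpha) * math.gamma(2.0 * (beta + 1.0) / alpha)

def rho_gamma_1d(alpha):
    """Constants rho_1 and gamma_1 from equations (4) and (5) with d = 1."""
    d = 1
    m_hi = moment(d / 2.0, alpha)        # m_{d/2}
    m_lo = moment(d / 2.0 - 1.0, alpha)  # m_{d/2 - 1}
    rho = (math.gamma(d / 2.0) * m_hi ** (d / 2.0)
           / (math.pi ** (d / 2.0) * m_lo ** (d / 2.0 + 1.0)))
    gam = m_hi / (d * m_lo)
    return rho, gam

def density_1d(x, mu, sigma, alpha, rho, gam):
    """Equation (1) for d = 1, where sigma is the variance."""
    q = gam * (x - mu) ** 2 / sigma
    return rho / math.sqrt(sigma) * math.exp(-(q ** (alpha / 2.0)))

def mass_and_variance(mu, sigma, alpha, half_width=60.0, steps=120000):
    """Trapezoid-rule estimates of the total mass and of the variance."""
    rho, gam = rho_gamma_1d(alpha)
    h = 2.0 * half_width / steps
    mass = 0.0
    var = 0.0
    for i in range(steps + 1):
        x = mu - half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        p = density_1d(x, mu, sigma, alpha, rho, gam)
        mass += w * p * h
        var += w * p * (x - mu) ** 2 * h
    return mass, var

# alpha = 1 is the Laplacian case, alpha = 2 the Gaussian case.
for alpha in (1.0, 2.0):
    mass, var = mass_and_variance(mu=0.5, sigma=2.0, alpha=alpha)
    print(alpha, mass, var)
```

The check is restricted to d = 1 on purpose, so that it exercises the stated constants without relying on any multivariate integration scheme.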
In S. Basu and C. Micchelli, supra, an iterative algorithm having the expectation-maximization (EM) flavor for estimating the parameters was obtained and used for a range of fixed values of α (as opposed to the choice of α = 1 in H. Ney and A. Noll, supra, and α = 2 in standard speech recognition systems). A preliminary conclusion from the study in S. Basu and C. Micchelli was that the distribution of speech feature vectors in the acoustic space is better modeled by mixture models with non-Gaussian mixture components corresponding to α < 1. As a consequence of these encouraging results, we became interested in automatically finding the "best" value of α directly from the data. It is this issue that is the subject of the present invention.
It is therefore an object of the present invention to provide a parametric family of multivariate density functions formed by mixture models from univariate functions of the type exp(−|x|^β) for modeling acoustic feature vectors used in automatic recognition of speech.
According to the invention, the parameter β is used to measure the non-Gaussian nature of the data. In the practice of the invention, β is estimated from the data using a maximum likelihood criterion. Among other things, there is a balance between β and the number of data points N that must be satisfied for efficient estimation. The computer implemented method for automatic machine recognition of speech iteratively refines parameter estimates of densities comprising mixtures of power exponential distributions whose parameters are means (μ), variances (σ), impulsivity numbers (α) and weights (w). The iterative refining process begins by predetermining initial values of the parameters μ, σ and w. Then μ̂^l and σ̂^l are derived from the following equations

$$\mu_i^l = \frac{\displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk} \, x_i^k}{\displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk}}$$

and

$$\sigma_i^l = \frac{\hat{\alpha}_l \, \gamma_d(\hat{\alpha}_l)^{\hat{\alpha}_l/2} \displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk} \, (x_i^k - \hat{\mu}_i^l)^2}{A_l}$$
for i = 1, ..., d and l = 1, ..., m. Then σ is updated by assuming that θ = (μ, σ, α), θ̂ = (μ̂, σ̂, α̂) and letting H(μ, σ) = E_θ̂(log f(·|θ)), in which case H has a unique global maximum at μ = μ̂, σ = β(α, α̂) σ̂, where

$$\beta(\alpha, \hat{\alpha}) = \left\{ \frac{\alpha \, \Gamma\!\left( \frac{\alpha + 1}{\hat{\alpha}} \right)}{\Gamma\!\left( \frac{1}{\hat{\alpha}} \right)} \right\}^{\frac{2}{\alpha}} \frac{\Gamma\!\left( \frac{3}{\alpha} \right) \Gamma\!\left( \frac{1}{\hat{\alpha}} \right)}{\Gamma\!\left( \frac{3}{\hat{\alpha}} \right) \Gamma\!\left( \frac{1}{\alpha} \right)}.$$
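The factor β(α, α̂) defined above involves only gamma functions, so it is straightforward to evaluate. The following is a minimal sketch with a hypothetical function name `beta_factor`; it simply transcribes the expression. As a sanity check, the expression reduces to 1 when α = α̂, since αΓ(1 + 1/α)/Γ(1/α) = 1.

```python
import math

def beta_factor(alpha, alpha_hat):
    """Evaluate beta(alpha, alpha_hat) from its gamma-function expression."""
    g = math.gamma
    # Braced term raised to the power 2 / alpha.
    first = (alpha * g((alpha + 1.0) / alpha_hat) / g(1.0 / alpha_hat)) ** (2.0 / alpha)
    # Ratio of gamma functions that multiplies the braced term.
    second = (g(3.0 / alpha) * g(1.0 / alpha_hat)) / (g(3.0 / alpha_hat) * g(1.0 / alpha))
    return first * second

# Equal arguments give a factor of 1 (up to rounding).
print(beta_factor(0.7, 0.7))
```

Values of α or α̂ very close to zero make the gamma-function arguments large and can overflow; a production implementation would likely work with `math.lgamma` instead.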
The parameters of the l-th mixture component are then set by μ^l = μ̂^l, σ^l = σ̂^l, and α_l = α̂_l. Finally, the convergence of a log likelihood function B(α) of the parameters is determined in order to get final values of μ, σ and α. The function B(α) is

$$B(\Lambda, \hat{w}, \hat{\Lambda}) = \sum_{l=1}^{m} B_l(\Lambda, \hat{w}, \hat{\Lambda})$$

where

$$B_l(\Lambda, \hat{w}, \hat{\Lambda}) = \sum_{k=1}^{N} A_{lk} \left( -\frac{1}{2} \left( \sum_{i=1}^{d} \log \sigma_i^l \right) + \log \rho_d(\alpha_l) - \bigl( \gamma_d(\alpha_l) \bigr)^{\alpha_l/2} \left( \sum_{i=1}^{d} \frac{(x_i^k - \mu_i^l)^2}{\sigma_i^l} \right)^{\alpha_l/2} \right).$$
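One pass of the refinement equations for μ_i^l and σ_i^l given earlier in this summary can be sketched for a single mixture component l as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the weights A_lk are taken as given inputs, A_l is assumed to be the sum of A_lk over k, and γ_d(α) is computed from equation (5) assuming f(t) = exp(−t^(α/2)) in equation (3). None of these is spelled out in this section. All function and variable names are hypothetical.

```python
import math

def gamma_d(alpha, d):
    """gamma_d(alpha) from equation (5), assuming f(t) = exp(-t**(alpha/2))."""
    return math.gamma((d + 2.0) / alpha) / (d * math.gamma(d / alpha))

def refine_component(xs, mu_hat, sigma_hat, alpha_hat, a_lk):
    """One refinement of the mean and variance vectors of component l.

    xs        : list of N data vectors, each of length d
    mu_hat    : current mean estimate, length d
    sigma_hat : current variance estimate, length d
    alpha_hat : current impulsivity estimate
    a_lk      : list of N nonnegative weights A_lk (assumed given)
    """
    d = len(mu_hat)
    power = alpha_hat / 2.0 - 1.0
    # s[k] = (sum_j (x_j^k - mu_hat_j)^2 / sigma_hat_j) ** (alpha_hat/2 - 1) * A_lk
    # Note: for alpha_hat < 2 this fails if a data point equals mu_hat exactly.
    s = []
    for x, a in zip(xs, a_lk):
        q = sum((x[j] - mu_hat[j]) ** 2 / sigma_hat[j] for j in range(d))
        s.append((q ** power) * a)
    total = sum(s)
    mu_new = [sum(sk * x[i] for sk, x in zip(s, xs)) / total for i in range(d)]
    a_l = sum(a_lk)  # assumption: A_l is the sum of A_lk over k
    scale = alpha_hat * gamma_d(alpha_hat, d) ** (alpha_hat / 2.0)
    sigma_new = [
        scale * sum(sk * (x[i] - mu_hat[i]) ** 2 for sk, x in zip(s, xs)) / a_l
        for i in range(d)
    ]
    return mu_new, sigma_new

# Gaussian case (alpha_hat = 2): the update reduces to weighted averages.
xs = [[0.0, 1.0], [2.0, 3.0], [4.0, 8.0]]
mu_new, sigma_new = refine_component(xs, [1.0, 2.0], [1.0, 1.0], 2.0, [1.0, 1.0, 1.0])
print(mu_new, sigma_new)
```

With α̂ = 2 the exponent α̂/2 − 1 vanishes and γ_d(2) = 1/2, so the new mean is the A_lk-weighted average of the data and the new variance is the weighted average squared deviation from μ̂, which matches the familiar Gaussian mixture update.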