1. Field of the Invention
The present invention relates to speech recognition in general and, in particular, to estimating mixture Gaussian densities of speech-unit models for hidden Markov model (HMM) based speech recognition systems.
2. Description of Related Art
In speech recognition systems, particularly those based on hidden Markov models, the training module that generates probabilistic models of speech units is a very important component, because its behavior significantly affects the recognition performance of the system. Among probabilistic models of speech units, mixture Gaussian densities have been used successfully to model word-sized and phoneme-sized units in tasks such as isolated word recognition and continuous speech recognition. A mixture Gaussian density consists of $K$ Gaussian densities with means and covariances $(\mu_i, C_i)$, $i = 1, \ldots, K$, together with one weight $\alpha_i$ for each Gaussian density, where $\alpha_i \geq 0$ and $\sum_{i=1}^{K} \alpha_i = 1$. In the training module of a speech recognition system, the parameters of the Gaussian densities and the weights are estimated from training speech data.

The existing techniques for estimating the parameters of mixture Gaussian densities of speech-unit models are primarily the extension of the Baum-Welch algorithm (see B. H. Juang et al., "Mixture Autoregressive Hidden Markov Models for Speech Signals," IEEE Trans. ASSP, ASSP-33, pp. 1404-1413) and the segmental K-means algorithm (see L. R. Rabiner et al., "A Segmental K-means Training Procedure for Connected Word Recognition," AT&T Technical Journal, Vol. 65(3), pp. 21-31), both of which have been used successfully in some speech recognition systems. These techniques start from a chosen number of mixture components and chosen initial parameters for each Gaussian density, and then iteratively improve the parameter estimates by maximizing likelihood or minimizing distortion. The likelihood or distortion is computed from frame-based scores of speech features, and the resulting parameter estimates of a mixture Gaussian density depend on the initial choice of the number of mixture components as well as on the initial parameters of each Gaussian density.
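For reference, a mixture Gaussian density of the kind described above, and the standard expectation-maximization (EM) re-estimation formulas for a single mixture, can be written as follows. This is a sketch under stated assumptions: $\mathbf{x}_1, \ldots, \mathbf{x}_T$ are $d$-dimensional feature frames assigned to one model, $\gamma_t(i)$ is the posterior probability of component $i$ at frame $t$, and hats denote updated estimates. In the HMM (Baum-Welch) extension, $\gamma_t(i)$ would additionally be weighted by the state occupancy probability obtained from the forward-backward pass.

```latex
p(\mathbf{x}) = \sum_{i=1}^{K} \alpha_i \, \mathcal{N}(\mathbf{x}; \mu_i, C_i),
\qquad
\mathcal{N}(\mathbf{x}; \mu, C) = \frac{1}{(2\pi)^{d/2} |C|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \mu)^{\top} C^{-1} (\mathbf{x} - \mu) \right)

\gamma_t(i) = \frac{\alpha_i \, \mathcal{N}(\mathbf{x}_t; \mu_i, C_i)}{p(\mathbf{x}_t)},
\qquad
\hat{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \gamma_t(i),
\qquad
\hat{\mu}_i = \frac{\sum_{t=1}^{T} \gamma_t(i) \, \mathbf{x}_t}{\sum_{t=1}^{T} \gamma_t(i)},
\qquad
\hat{C}_i = \frac{\sum_{t=1}^{T} \gamma_t(i) \, (\mathbf{x}_t - \hat{\mu}_i)(\mathbf{x}_t - \hat{\mu}_i)^{\top}}{\sum_{t=1}^{T} \gamma_t(i)}
```

Because $\sum_i \gamma_t(i) = 1$ for every frame, the updated weights $\hat{\alpha}_i$ remain non-negative and sum to one.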
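The iterative likelihood maximization behind the Baum-Welch extension can be illustrated in a deliberately simplified setting: EM for one mixture of scalar Gaussians, with no HMM state structure. This is a minimal sketch, not the algorithm of Juang et al. It assumes one-dimensional features, a fixed number of components K chosen in advance, hand-picked initial parameters, and a hypothetical variance floor `var_floor`; all function names are illustrative. It also assumes no component ever receives zero total responsibility (otherwise the M-step would divide by zero).

```python
import math
from typing import List, Tuple

Params = Tuple[List[float], List[float], List[float]]  # (weights, means, variances)


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x: float, weights: List[float], means: List[float],
                variances: List[float]) -> float:
    """Mixture density: sum over i of alpha_i * N(x; mu_i, sigma_i^2)."""
    return sum(a * gaussian_pdf(x, m, v) for a, m, v in zip(weights, means, variances))


def log_likelihood(data: List[float], weights: List[float], means: List[float],
                   variances: List[float]) -> float:
    """Total frame log-likelihood of the data under the mixture."""
    return sum(math.log(mixture_pdf(x, weights, means, variances)) for x in data)


def em_step(data: List[float], weights: List[float], means: List[float],
            variances: List[float], var_floor: float = 1e-3) -> Params:
    """One EM re-estimation of a K-component 1-D Gaussian mixture."""
    K, T = len(weights), len(data)
    # E-step: posterior probability that component k generated each frame.
    resp = []
    for x in data:
        joint = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(K)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: responsibility-weighted re-estimation of weights, means, variances.
    new_w, new_m, new_v = [], [], []
    for k in range(K):
        occ = sum(r[k] for r in resp)  # soft count of frames for component k
        mean = sum(r[k] * x for r, x in zip(resp, data)) / occ
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / occ
        new_w.append(occ / T)
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # floor keeps a variance from collapsing to 0
    return new_w, new_m, new_v


if __name__ == "__main__":
    frames = [-2.1, -1.9, -2.0, -2.2, -1.8, 1.9, 2.1, 2.0, 2.2, 1.8]
    # K = 2 and the starting parameters are chosen by hand, as the text describes.
    w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for _ in range(20):
        w, m, v = em_step(frames, w, m, v)
    print(w, m, v, log_likelihood(frames, w, m, v))
```

As long as the variance floor is not active, each EM step cannot decrease the log-likelihood, but the point it converges to still depends on the chosen K and starting values.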
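The distortion-minimization alternative can be sketched in the same simplified style. Within each HMM state, segmental K-means clusters the frames aligned to that state and derives the mixture weights, means, and variances from the clusters. The sketch below covers only that within-state clustering step and is not the full procedure of Rabiner et al. It assumes the Viterbi segmentation has already assigned the frames to one state, uses scalar features and squared-error distortion, and its names (`kmeans_mixture`, `var_floor`) are hypothetical.

```python
from typing import List, Tuple


def kmeans_mixture(data: List[float], init_means: List[float], iters: int = 10,
                   var_floor: float = 1e-3
                   ) -> Tuple[List[float], List[float], List[float], float]:
    """Hard-assignment K-means, then mixture parameters from the final clusters.

    Returns (weights, means, variances, total squared-error distortion).
    """
    means = list(init_means)
    K = len(means)
    clusters: List[List[float]] = [[] for _ in range(K)]
    for _ in range(iters):
        # Assign each frame to the nearest mean (minimum squared distortion).
        clusters = [[] for _ in range(K)]
        for x in data:
            k = min(range(K), key=lambda j: (x - means[j]) ** 2)
            clusters[k].append(x)
        # Move each mean to its cluster centroid; an empty cluster keeps its mean.
        new_means = [sum(c) / len(c) if c else means[k] for k, c in enumerate(clusters)]
        if new_means == means:  # assignments are stable, so stop early
            break
        means = new_means
    n = len(data)
    weights = [len(c) / n for c in clusters]  # weight = fraction of frames in cluster
    variances = [max(sum((x - means[k]) ** 2 for x in c) / len(c), var_floor)
                 if c else var_floor for k, c in enumerate(clusters)]
    distortion = sum((x - means[k]) ** 2 for k, c in enumerate(clusters) for x in c)
    return weights, means, variances, distortion
```

Running it on the frames `[-2.1, -1.9, -2.0, -2.2, -1.8, 1.9, 2.1, 2.0, 2.2, 1.8]` from initial means `[-1.0, 1.0]` yields two equally weighted components near -2 and 2. From initial means `[5.0, 6.0]`, one component captures every frame and the other receives zero weight, with a much larger distortion. This illustrates the dependence on initialization that the existing techniques share.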
The present invention provides a training module for speech recognition systems with a new technique for estimating the parameters of mixture Gaussian densities for models of speech units. The advantages of this technique will become readily apparent upon considering the present invention.