1. Field of the Invention
The present invention generally relates to speech recognition systems and, more particularly, to the use of EM-type algorithms for the estimation of parameters for a mixture model of nongaussian densities. The present invention was motivated by two objectives. The first was to study maximum likelihood density estimation methods for high-dimensional data, and the second was the application of the techniques developed to large vocabulary continuous parameter speech recognition.
2. Background Description
Speech recognition systems require modeling the probability density of feature vectors in the acoustic space of phonetic units. Purely gaussian densities have been known to be inadequate for this purpose because of the heavy-tailed distributions exhibited by speech feature vectors. See, for example, Frederick Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997. As an intended remedy to this problem, practically all speech recognition systems attempt modeling by using a mixture model with gaussian densities for mixture components. Variants of the standard K-means clustering algorithm are used for this purpose. The classical version of the K-means algorithm (as described by John Hartigan in Clustering Algorithms, John Wiley & Sons, 1975, and Anil Jain and Richard Dubes in Algorithms for Clustering Data, Prentice Hall, 1988) can also be viewed as a special case of the EM algorithm (as described by A. P. Dempster, N. M. Laird and D. B. Rubin in "Maximum likelihood from incomplete data via the EM algorithm", Journal of Royal Statistical Soc., Ser. B, vol. 39, pp. 1-38, 1977) in the limiting case of gaussian density estimation with variance zero. See, for example, Christopher M. Bishop, Neural Networks for Pattern Recognition, Cambridge University Press, 1997, and J. Marroquin and F. Girosi, "Some extensions of the K-means algorithm for image segmentation and pattern classification", MIT Artificial Intelligence Lab. A. I. Memorandum no. 1390, January 1993. The only known attempt to model the phonetic units in speech with nongaussian mixture densities is described by H. Ney and A. Noll in "Phoneme modeling using continuous mixture densities", Proceedings of IEEE Int. Conf. on Acoustics Speech and Signal Processing, pp. 437-440, 1988, where laplacian densities were used in a heuristic-based estimation algorithm.
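The relationship between K-means and EM noted above can be illustrated with a minimal sketch. It is not taken from any of the cited references. It assumes a one-dimensional gaussian mixture with equal mixture weights and a single shared variance, and the function names (`responsibilities`, `em_mean_update`, `kmeans_update`) and the toy data are hypothetical. As the shared variance shrinks toward zero, the soft assignments of the EM step approach the hard nearest-mean assignments of K-means, so the two mean updates coincide.

```python
import math


def responsibilities(x, means, variance):
    """E-step for an equal-weight, shared-variance 1-D gaussian mixture (assumed model)."""
    # Log of each unnormalized gaussian weight; shared constants cancel on normalization.
    logits = [-((x - m) ** 2) / (2.0 * variance) for m in means]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def em_mean_update(data, means, variance):
    """One EM iteration that re-estimates only the means (soft assignment)."""
    resp = [responsibilities(x, means, variance) for x in data]
    new_means = []
    for k in range(len(means)):
        mass = sum(r[k] for r in resp)
        new_means.append(sum(r[k] * x for r, x in zip(resp, data)) / mass)
    return new_means


def kmeans_update(data, means):
    """One K-means iteration (hard assignment to the nearest mean)."""
    clusters = [[] for _ in means]
    for x in data:
        nearest = min(range(len(means)), key=lambda k: (x - means[k]) ** 2)
        clusters[nearest].append(x)
    # An empty cluster keeps its previous mean.
    return [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]


# Hypothetical toy data: two well-separated groups and two initial means.
data = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
means = [1.0, 3.0]

for variance in (1.0, 0.1, 1e-4):
    print("EM, variance", variance, "->", em_mean_update(data, means, variance))
print("K-means ->", kmeans_update(data, means))
```

With a large variance every point contributes to every mean, so the EM update differs from the K-means update. With a very small variance each point contributes essentially only to its nearest mean, which is the sense in which K-means is the zero-variance limit.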