1. Technical Field
The present invention generally relates to high dimensional data and, in particular, to methods for mining and visualizing high dimensional data through Gaussianization.
2. Background Description
Density estimation in high dimensions is very challenging due to the so-called “curse of dimensionality”. That is, in high dimensions, data samples are often sparsely distributed. Thus, density estimation requires very large neighborhoods to achieve sufficient counts. However, such large neighborhoods could cause neighborhood-based techniques, such as kernel methods and nearest neighbor methods, to be highly biased.
The exploratory projection pursuit density estimation algorithm (hereinafter also referred to as the “exploratory projection pursuit”) attempts to overcome the curse of dimensionality by constructing high dimensional densities via a sequence of univariate density estimates. At each iteration, one finds the most non-Gaussian projection of the current data, and transforms that direction to univariate Gaussian. The exploratory projection pursuit is described by J. H. Friedman, in “Exploratory Projection Pursuit”, J. American Statistical Association, Vol. 82, No. 397, pp. 249-66, 1987.
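The step of transforming one direction to univariate Gaussian can be illustrated with a minimal Python sketch. It assumes the projection onto the chosen direction has already been computed, and it uses a rank-based empirical distribution function followed by the inverse standard normal distribution function. The function name gaussianize_1d and the toy data are hypothetical, and the search for the most non-Gaussian projection is not shown.

```python
from statistics import NormalDist

def gaussianize_1d(samples):
    """Map 1-D samples to approximately standard-normal values via their ranks."""
    n = len(samples)
    # Indices of the samples in ascending order of value.
    order = sorted(range(n), key=lambda i: samples[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the empirical CDF value strictly inside (0, 1),
        # so the inverse normal CDF is always finite.
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out

# Heavily skewed toy data (illustrative values only, already sorted).
data = [0.1, 0.2, 0.25, 0.4, 0.9, 1.5, 3.0, 8.0, 20.0, 55.0]
z = gaussianize_1d(data)
```

The transform is monotone, so the ordering of the samples is preserved while their marginal distribution is pulled toward a standard Gaussian.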
Recently, independent component analysis has attracted a considerable amount of attention. Independent component analysis attempts to recover the unobserved independent sources from linearly mixed observations. This seemingly difficult problem can be solved by an information maximization approach that utilizes only the independence assumption on the sources. Independent component analysis can be applied for source recovery in digital communication and in the “cocktail party” problem. A review of the current status of independent component analysis is described by Bell et al., in “A Unifying Information-Theoretic Framework for Independent Component Analysis”, International Journal on Mathematics and Computer Modeling, 1999. Independent component analysis has been posed as a parametric probabilistic model, and a maximum likelihood EM algorithm has been derived, by H. Attias, in “Independent Factor Analysis”, Neural Computation, Vol. 11, pp. 803-51, May 1999.
Parametric density models, in particular Gaussian mixture density models, are the most widely applied models in large scale high dimensional density estimation because they offer decent performance with a relatively small number of parameters. In fact, to limit the number of parameters in large tasks such as automatic speech recognition, one assumes only mixtures of Gaussians with diagonal covariances. There are standard EM algorithms to estimate the mixture coefficients and the Gaussian means and covariance. However, in real applications, these parametric assumptions are often violated, and the resulting parametric density estimates can be highly biased. For example, mixtures of diagonal Gaussians
$$p(x) = \sum_{i=1}^{I} \pi_i \, G(x, \mu_i, \Sigma_i)$$
roughly assume that the data is clustered, and within each cluster the dimensions are independent and Gaussian distributed. However, in practice, the dimensions are often correlated within each cluster. This leads to the need for modeling the covariance of each mixture component. The following “semi-tied” covariance has been proposed:
$$\Sigma_i = A \Lambda_i A^T$$
where A is shared and, for each component, $\Lambda_i$ is diagonal. This semi-tied covariance is described by: M. J. F. Gales, in “Semi-tied Covariance Matrices for Hidden Markov Models”, IEEE Transactions on Speech and Audio Processing, Vol. 7, pp. 272-81, May 1999; and R. A. Gopinath, in “Constrained Maximum Likelihood Modeling with Gaussian Distributions”, Proc. of DARPA Speech Recognition Workshop, February 8-11, Lansdowne, Va., 1998. Semi-tied covariance has been reported in the immediately preceding two articles to significantly improve the performance of large vocabulary continuous speech recognition systems. It should be appreciated that a compound Gaussian is no longer a diagonal Gaussian.
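As a rough illustration of the semi-tied structure, the following Python sketch builds two component covariances from one shared matrix A and per-component diagonal entries. The helper names (matmul, transpose, semi_tied_cov) and all numeric values are hypothetical and chosen only for illustration; they are not taken from the cited articles.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

def semi_tied_cov(A, diag):
    """Build A * diag(diag) * A^T for a shared A and a component's diagonal entries."""
    n = len(diag)
    Lam = [[diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(A, Lam), transpose(A))

# One shared transform (assumed values), two components with different diagonals.
A = [[1.0, 0.5],
     [0.0, 1.0]]
cov_1 = semi_tied_cov(A, [1.0, 4.0])
cov_2 = semi_tied_cov(A, [2.0, 0.5])

# Both covariances are symmetric and have nonzero off-diagonal entries,
# so each component can model correlated dimensions.
print(cov_1)
print(cov_2)
```

With D dimensions and I components, this structure stores D x D shared entries for A plus D diagonal entries per component, rather than a full symmetric covariance matrix per component.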
Accordingly, there is a need for a computationally efficient method that transforms high dimensional data into a standard Gaussian distribution.
The present invention is directed to high dimensional acoustic modeling via mixtures of compound Gaussians with linear transforms. In addition to providing a novel density model within an acoustic model, the present invention also provides an iterative expectation maximization (EM) method which estimates the parameters of the mixtures of the density model as well as of the linear transform.
According to a first aspect of the invention, a method is provided for generating a high dimensional density model within an acoustic model for one of a speech and a speaker recognition system. The density model has a plurality of components, each component having a plurality of coordinates corresponding to a feature space. The method includes the step of transforming acoustic data obtained from at least one speaker into high dimensional feature vectors. The density model is formed to model the feature vectors by a mixture of compound Gaussians with a linear transform. Each compound Gaussian is associated with a compound Gaussian prior and models each of the coordinates of each of the components of the density model independently by a univariate Gaussian mixture including a univariate Gaussian prior, variance, and mean.
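One way to write down the density described in this aspect is sketched below. The notation is an assumption made for illustration and does not appear in the text: A is the linear transform, applied as y = Ax; $\pi_i$ is the prior of the i-th compound Gaussian; and $w_{i,d,k}$, $\mu_{i,d,k}$, $\sigma^2_{i,d,k}$ are the prior, mean, and variance of the k-th univariate Gaussian for coordinate d of component i. The factor $|\det A|$ is the usual change-of-variables term for a density defined on the transformed coordinates.

```latex
p(x) = |\det A| \sum_{i=1}^{I} \pi_i \prod_{d=1}^{D} \sum_{k=1}^{K_{i,d}}
       w_{i,d,k} \, G\!\left(y_d ;\, \mu_{i,d,k},\, \sigma^2_{i,d,k}\right),
\qquad y = A x,
\qquad \sum_{i=1}^{I} \pi_i = 1,
\qquad \sum_{k=1}^{K_{i,d}} w_{i,d,k} = 1 .
```

Under this reading, setting every $K_{i,d}$ to 1 reduces each compound Gaussian to a diagonal Gaussian in the transformed coordinates.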
According to a second aspect of the invention, the method further includes the step of applying an iterative expectation maximization (EM) method to the feature vectors, to estimate the linear transform, the compound Gaussian priors, and the univariate Gaussian priors, variances, and means.
According to a third aspect of the invention, the EM method includes the step of computing an auxiliary function Q of the EM method. The compound Gaussian priors and the univariate Gaussian priors are respectively updated, to maximize the auxiliary function Q. The univariate Gaussian variances, the linear transform, and the univariate Gaussian means are respectively updated to maximize the auxiliary function Q, the linear transform being updated row by row. The second updating step is repeated, until the auxiliary function Q converges to a local maximum. The computing step and the second updating step are repeated, until a log likelihood of the feature vectors converges to a local maximum.
According to a fourth aspect of the invention, the method further includes the step of updating the density model to model the feature vectors by the mixture of compound Gaussians with the updated linear transform. Each of the compound Gaussians is associated with one of the updated compound Gaussian priors and models each of the coordinates of each of the components independently by the univariate Gaussian mixtures including the updated univariate Gaussian priors, variances, and means.
According to a fifth aspect of the invention, the linear transform is fixed, when the univariate Gaussian variances are updated.
According to a sixth aspect of the invention, the univariate Gaussian variances are fixed, when the linear transform is updated.
According to a seventh aspect of the invention, the linear transform is fixed, when the univariate Gaussian means are updated.
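The overall shape of such an EM iteration can be sketched in Python for a much simpler case: a single coordinate modeled by one univariate Gaussian mixture, with no linear transform and no compound Gaussian priors. This is a simplification, not the method of the aspects above. In this reduced case the updates of the priors, means, and variances have closed forms, so the inner loop that repeats the second updating step until Q converges is not needed. All function names, the variance floor, and the toy data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, priors, means, variances):
    """Log likelihood of the data under the univariate Gaussian mixture."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v)
                     for p, m, v in zip(priors, means, variances)))
        for x in data
    )

def em_step(data, priors, means, variances):
    """One EM iteration: compute posteriors, then update priors, means, variances."""
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [p * normal_pdf(x, m, v) for p, m, v in zip(priors, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: closed-form maximizers of the auxiliary function in this reduced case.
    n = len(data)
    new_priors, new_means, new_vars = [], [], []
    for k in range(len(priors)):
        nk = sum(r[k] for r in resp)
        mean_k = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var_k = sum(r[k] * (x - mean_k) ** 2 for r, x in zip(resp, data)) / nk
        new_priors.append(nk / n)
        new_means.append(mean_k)
        new_vars.append(max(var_k, 1e-6))  # assumed floor to avoid zero variance
    return new_priors, new_means, new_vars

# Toy one-dimensional data with two clusters (illustrative values only).
data = [-2.1, -1.9, -2.4, -1.6, -2.0, 1.8, 2.2, 2.5, 1.7, 2.0]
priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

history = [log_likelihood(data, priors, means, variances)]
for _ in range(100):
    priors, means, variances = em_step(data, priors, means, variances)
    history.append(log_likelihood(data, priors, means, variances))
    # Stop once the log likelihood no longer improves noticeably.
    if history[-1] - history[-2] < 1e-8:
        break
```

The list history records the log likelihood after each iteration, which makes the convergence to a local maximum easy to inspect.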