1. Field of the Invention
The present invention generally relates to a method and apparatus for pattern recognition, and more particularly to a method and apparatus for estimating parameters in statistical models that represent patterns using optimization criteria.
2. Description of the Related Art
Conventional pattern recognition algorithms fall short in cases that involve huge data sources or a large number of modeling parameters. Examples include large-vocabulary continuous speech recognition or translation; enterprise applications such as data mining and business intelligence; weather prediction; processing of satellite data to predict the locations of traffic jams; and prediction of trends in financial markets. Pattern recognition requires estimating parameters in statistical models that represent patterns through some optimization criterion (e.g., a Maximum Likelihood (ML) or Maximum Mutual Information Estimation (MMI) function with Gaussian mixture parameters). The term “huge data” pattern recognition denotes a process that operates with a large number of modeling parameters (on the order of several million) or processes large data sets (e.g., several hundred million words in textual corpora). Such pattern recognition presents challenging optimization requirements that are not fully resolved by conventional optimization techniques, for the following reasons.
First, optimization methods that involve the Hessian matrix are computationally inefficient when the data size or number of model parameters is very large (e.g., several million parameters).
Second, optimization criteria for estimating parameters in pattern recognition are, in general, far from perfect. For example, maximum likelihood criteria usually do not work well if the training data used to estimate the model parameters do not represent all possible variations in the patterns. Accordingly, certain conventional techniques have introduced discrimination criteria, such as Maximum Mutual Information Estimation (MMI), for training. The MMI discrimination criteria can be efficiently optimized via Extended-Baum-Welch (EBW) transformations for discrete probability parameters and Gaussian parameters.
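For illustration only, the following is a minimal sketch of one EBW transformation for discrete probability parameters, in the commonly cited form p_i' = p_i (dF/dp_i + D) / sum_j p_j (dF/dp_j + D), where D is a constant assumed to be large enough to keep every term non-negative. The function names, the toy objective F(p) = sum_i (n_i - d_i) log p_i, and the numerator and denominator counts are hypothetical and are not taken from any particular recognizer.

```python
from typing import List


def ebw_update(p: List[float], grad: List[float], D: float) -> List[float]:
    """Apply one EBW step to a discrete distribution p.

    grad[i] is the partial derivative of the objective F with respect to p[i].
    D is a damping constant (assumption: chosen by the caller, large enough
    that every p[i] * (grad[i] + D) is non-negative).
    """
    weights = [p_i * (g_i + D) for p_i, g_i in zip(p, grad)]
    total = sum(weights)
    if any(w < 0.0 for w in weights) or total <= 0.0:
        raise ValueError("D is too small for this gradient")
    # Normalizing keeps the result on the probability simplex.
    return [w / total for w in weights]


def toy_gradient(p: List[float], num: List[float], den: List[float]) -> List[float]:
    """Gradient of the hypothetical objective F(p) = sum_i (num_i - den_i) * log p_i."""
    return [(n - d) / p_i for p_i, n, d in zip(p, num, den)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    grad = toy_gradient(p, num=[6.0, 2.0, 2.0], den=[3.0, 3.0, 3.0])
    print(ebw_update(p, grad, D=20.0))
```

In this sketch, setting D to zero with a pure log-likelihood gradient (counts divided by the current probabilities) reduces the step to the ordinary normalized-count re-estimate, while a larger D makes each step smaller and more conservative.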
Third, a conventional optimization technique exists that uses the expectation-maximization (EM) estimation methodology. This technique involves an iterative process in which, at each iteration, the original objective function is replaced with a computed auxiliary function (E-step). After this auxiliary function is computed, it is optimized (M-step). Usually, this optimization can be expressed as a closed-form solution. However, the technique is applicable only to narrow classes of criteria, such as the maximum likelihood criteria. The process is important as a statistical modeling and estimation tool because it allows a user to make assumptions about incompletely observed data (namely, to introduce hidden data and a latent variable that describes the hidden data).
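As a non-authoritative sketch of the E-step and M-step described above, the following code runs EM for a one-dimensional Gaussian mixture under the maximum likelihood criterion. The hidden data here is the unobserved component label of each sample. The function names, the variance floor, and the toy data are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances) -> float:
    """ML objective: log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data: List[float], weights: List[float], means: List[float],
            variances: List[float]) -> Tuple[List[float], List[float], List[float]]:
    k = len(weights)
    # E-step: posterior probability of each hidden component label.
    # These posteriors define the auxiliary function.
    resp = []
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([value / total for value in joint])
    # M-step: closed-form maximizer of the auxiliary function.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        n_j = sum(r[j] for r in resp)
        mean_j = sum(r[j] * x for r, x in zip(resp, data)) / n_j
        var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, data)) / n_j
        new_w.append(n_j / len(data))
        new_m.append(mean_j)
        new_v.append(max(var_j, 1e-6))  # variance floor (assumption) to avoid collapse
    return new_w, new_m, new_v


if __name__ == "__main__":
    data = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
    w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for _ in range(20):
        w, m, v = em_step(data, w, m, v)
    print(w, m, v, log_likelihood(data, w, m, v))
```

Each call to em_step should not decrease the log-likelihood as long as the variance floor is not triggered. That guarantee depends on the maximum likelihood form of the objective.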
Several problems, however, remain with the above conventional approaches.
First, fast optimization can easily lead to overtraining and degradation of pattern recognition accuracy.
Second, there exist processes that are not modeled as Gaussian and, therefore, EBW transformations cannot be used to optimize the MMI discriminative criteria for them.
Third, the EM concept as a modeling tool is applicable only to ML-type functions and is not applicable to general discrimination functions of non-Gaussian parameters. This raises the problem of creating an optimization process that generalizes the EM process so as to estimate parameters for a large class of objective (discriminative) criteria for processes that are not modeled as Gaussian.