The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
A pattern recognition system, such as a speech recognition system or a handwriting recognition system, takes an input signal and attempts to decode the signal to find a pattern represented by the signal. For example, in a speech recognition system, a speech signal (often referred to as a test signal) is received by the recognition system and is decoded to identify a string of words represented by the speech signal.
Many pattern recognition systems need to build models to parameterize each pattern unit. These units can be phonemes or words for speech recognition and characters for handwriting recognition. Hidden Markov Models (HMMs) are widely used in pattern recognition systems in which the patterns carry time-sequence information. In an HMM, each pattern unit is modeled by several states connected by arcs. During the training stage, probability distributions for occupying the states and for transitioning between states are determined for each of the units. During the decoding stage, the input signal is compared to the state distributions to identify the most likely sequence of HMM states that could have produced the input signal.
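A common way to carry out this search is the Viterbi algorithm, which finds the single best state sequence by dynamic programming. The Python sketch below is illustrative only and is not taken from any particular recognizer: the function name `viterbi` and its arguments are hypothetical, and it assumes the per-frame emission log-likelihoods `log_emit` (the result of comparing each frame of the signal against each state's distribution) have already been computed. Scores are kept in the log domain to avoid numerical underflow on long signals.

```python
import math

# Hypothetical helper for illustration; not the method of any specific system.
def viterbi(log_init, log_trans, log_emit):
    """Return the most likely HMM state sequence and its log probability.

    log_init[s]     : log P(first state = s)
    log_trans[s][t] : log P(next state = t | current state = s)
    log_emit[k][s]  : log p(frame k | state s), precomputed per frame
    """
    num_states = len(log_init)
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [log_init[s] + log_emit[0][s] for s in range(num_states)]
    backptr = []
    for frame in log_emit[1:]:
        new_delta, ptrs = [], []
        for t in range(num_states):
            # Best predecessor state for reaching state t at this frame
            best_prev = max(range(num_states), key=lambda s: delta[s] + log_trans[s][t])
            new_delta.append(delta[best_prev] + log_trans[best_prev][t] + frame[t])
            ptrs.append(best_prev)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the best-scoring final state
    last = max(range(num_states), key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


# Usage: an assumed two-state left-to-right HMM and three frames of made-up scores.
NEG_INF = float("-inf")
log_init = [0.0, NEG_INF]                       # must start in state 0
log_trans = [[math.log(0.6), math.log(0.4)],    # state 0 -> 0 or 0 -> 1
             [NEG_INF, 0.0]]                    # state 1 only loops on itself
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.8), math.log(0.2)],
            [math.log(0.1), math.log(0.9)]]
path, score = viterbi(log_init, log_trans, log_emit)
print(path)  # [0, 0, 1]
```

The forbidden transitions are encoded as negative infinity, so they can never win the maximization, which is how a left-to-right topology is enforced in this sketch.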
In an HMM, state distributions are often approximated by mixtures of Gaussian distributions. Each Gaussian component is determined by a mean vector and a covariance matrix. There are generally two approaches to estimating the covariance matrix: using a diagonal covariance matrix or using a full covariance matrix.
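For reference, such a Gaussian-mixture state density is usually written as follows. The notation is an assumption introduced here for illustration: $s$ is an HMM state, $M$ the number of mixture components, $c_{sm}$ the mixture weights, $\boldsymbol{\mu}_{sm}$ and $\boldsymbol{\Sigma}_{sm}$ the mean vector and covariance matrix of component $m$, and $d$ the feature dimension.

```latex
\[
p(\mathbf{x} \mid s) = \sum_{m=1}^{M} c_{sm}\,
  \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{sm}, \boldsymbol{\Sigma}_{sm}\right),
\qquad c_{sm} \ge 0, \quad \sum_{m=1}^{M} c_{sm} = 1
\]
\[
\mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = (2\pi)^{-d/2}\, \lvert \boldsymbol{\Sigma} \rvert^{-1/2}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top}
    \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
\]
```

The diagonal approach restricts each $\boldsymbol{\Sigma}_{sm}$ to $\operatorname{diag}(\sigma_{sm,1}^2, \ldots, \sigma_{sm,d}^2)$, while the full approach leaves every off-diagonal entry free. Both the determinant and the inverse of $\boldsymbol{\Sigma}$ appear in the density.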
A full covariance matrix models the correlations between the components of the feature vector, which improves pattern recognition accuracy. However, in large-scale pattern recognition tasks such as speech recognition and handwriting recognition, the number of full covariance parameters is many times greater than the number of mean parameters: for a d-dimensional feature vector, each Gaussian has d mean parameters but d(d+1)/2 free covariance parameters. Given limited training data, this dramatic increase in parameters means that the full covariance matrices cannot be estimated reliably and may even be singular. A singular covariance matrix cannot be used for pattern recognition because the likelihood calculation requires the inverse of the covariance matrix.
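The parameter growth and the risk of singularity can be made concrete with a small sketch. The helpers below (`covariance_param_counts`, `sample_covariance`, `determinant`) are hypothetical names written in plain Python for illustration, and the 39-dimensional feature size in the usage is an assumed example, not a value from this document. When a Gaussian's covariance is estimated from d or fewer samples in d dimensions, the centered samples span at most d - 1 directions, so the sample covariance has zero determinant and no inverse.

```python
# Hypothetical helpers for illustration only.

def covariance_param_counts(dim):
    """Free parameters per Gaussian component for a dim-dimensional feature vector."""
    mean = dim
    diagonal = dim                  # one variance per feature component
    full = dim * (dim + 1) // 2     # symmetric matrix: upper triangle incl. diagonal
    return mean, diagonal, full


def sample_covariance(samples):
    """Maximum-likelihood (biased) covariance estimate from a list of vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(x[i] for x in samples) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (x[i] - mean[i]) * (x[j] - mean[j]) / n
    return cov


def determinant(matrix, eps=1e-12):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < eps:
            return 0.0              # no usable pivot: matrix is (numerically) singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det              # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Usage with an assumed 39-dimensional feature vector and toy 2-D data.
mean_n, diag_n, full_n = covariance_param_counts(39)
too_few = sample_covariance([[1.0, 3.0], [3.0, 7.0]])                       # 2 samples in 2-D
enough = sample_covariance([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(full_n, determinant(too_few), determinant(enough))  # 780 0.0 0.25
```

The singular estimate `too_few` has no inverse, so the quadratic form in the Gaussian likelihood cannot be evaluated for it.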
Diagonal covariance matrices are currently in wide use, but they assume that the feature components are independent of one another. This assumption can reduce pattern recognition accuracy. Accordingly, both the diagonal and the full covariance approaches have their own defects.
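The independence assumption shows up directly in the log-likelihood. In the hypothetical helpers below, the diagonal-covariance log-density is a plain sum of one-dimensional terms, so it scores a point that follows a correlation between two feature components exactly the same as one that contradicts it, whereas a full 2x2 covariance tells them apart. The 2-D values are invented for illustration; a real system would use higher-dimensional features and a general factorization (for example Cholesky) rather than the closed-form 2x2 inverse used here.

```python
import math

# Hypothetical helpers for illustration only.

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance diag(var).

    The result is a sum of independent one-dimensional terms, i.e. the
    feature components are treated as independent of one another.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )


def full_gaussian_logpdf_2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with a full symmetric 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # A symmetric 2x2 matrix is positive definite iff a > 0 and det > 0;
    # a singular matrix (det == 0) has no inverse and is rejected here.
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix is singular or not positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = [[d, -b], [-c, a]] / det
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


# Usage: strongly correlated components versus the same variances without correlation.
mean = [0.0, 0.0]
corr_cov = [[1.0, 0.9], [0.9, 1.0]]
diag_var = [1.0, 1.0]
along, against = [1.0, 1.0], [1.0, -1.0]
print(diag_gaussian_logpdf(along, mean, diag_var) == diag_gaussian_logpdf(against, mean, diag_var))  # True
print(full_gaussian_logpdf_2d(along, mean, corr_cov) > full_gaussian_logpdf_2d(against, mean, corr_cov))  # True
```

When the full covariance happens to be diagonal, the two functions agree, which is a quick sanity check that the diagonal model is simply the full model with every correlation forced to zero.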