1. Field of the Invention
The present invention generally relates to improvements in decoder accuracy in pattern recognition and, more particularly, to a method for exposing a pattern recognition decoder to new training data without losing previously learned data. The method disclosed is applicable in areas of pattern recognition such as speech, handwriting and image recognition, machine translation, natural language processing and the like.
2. Background Description
Automated pattern recognition is a difficult task. For example, of the many techniques for recognizing speech patterns being studied today, so-called Hidden Markov Modeling (HMM) has proved promising. "Hidden" refers to the probabilistic and not directly observable events which underlie a speech signal. Many variations of HMM have been proposed. HMM speech recognition systems typically use realizations of phonemes, which are statistical models of phonetic segments, including allophones (phones), whose parameters are estimated from a set of training examples. Models of words are made by chaining or linking appropriate phone models. Recognition consists of finding the most likely path through the set of word models for the input speech signal.
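The search for the most likely path described above is commonly carried out with the Viterbi algorithm. The following is a minimal sketch for a small discrete HMM, working in the log domain. It is an illustration only, not the decoder of any particular system: the function name `viterbi`, its argument names, and the table layout are all assumptions introduced here, and a real speech decoder would operate on chained phone models and continuous acoustic vectors rather than on discrete symbols.

```python
def viterbi(log_init, log_trans, log_emit, observations):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    Hypothetical table layout (assumed for this sketch):
      log_init[s]     : log P(first state = s)
      log_trans[r][s] : log P(next state = s | current state = r)
      log_emit[s][o]  : log P(observation o | state s)
    """
    n_states = len(log_init)
    # score[s] = best log probability of any path that ends in state s
    score = [log_init[s] + log_emit[s][observations[0]] for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[s][obs])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path
```

In a word recognizer, the states would belong to the linked phone models of each candidate word, and the word whose model contains the best path would be reported as the recognized word.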
HMM speech recognition decoding systems first need to be trained through an iterative process. That is, the system must be repeatedly exposed to training examples, or words, spoken in a particular speaker's voice. A training word is analyzed to generate a framed sequence of acoustic parameter vectors or statistical models. A valid or "good" recognition occurs when the most likely path through the set of word models for the training word results in recognizing the correct word (i.e., the training word itself).
Unfortunately, when a decoding system being trained is exposed to training data at some iterative stage, it may lose several good properties that were acquired in previous training stages. Maximum likelihood (or mutual information) estimation of the parameters may not lead to values which maximize recognition accuracy. Alternative error-corrective estimation procedures are known which aim to minimize the number of recognition errors. Such procedures imitate error correction procedures for linear classifiers, in which adjustments are made unless the log probability for the correct word exceeds the log probability of every other word by some threshold. These procedures lack a rigorous foundation and, in particular, do not provide hill-climbing algorithms. Nor do they guarantee that other recognition errors will not be introduced while some particular "near miss" errors are being corrected.
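The threshold test used by such error-corrective procedures can be sketched as follows. This is a minimal illustration under stated assumptions: the function `margin_check`, its arguments, and the choice to treat a margin strictly below the threshold as a "near miss" are hypothetical and are not taken from any specific prior-art procedure. The sketch shows only the decision of whether an adjustment would be triggered, not the parameter adjustment itself.

```python
import math

def margin_check(log_probs, correct_word, threshold):
    """Decide whether an error-corrective adjustment would be triggered.

    log_probs    : dict mapping each candidate word to its log probability
    correct_word : the word that should have been recognized
    threshold    : required margin of the correct word over every competitor

    Returns (margin, closest_rival, needs_adjustment).
    """
    competitors = {w: lp for w, lp in log_probs.items() if w != correct_word}
    if not competitors:
        # No competing word: nothing to correct (assumed convention)
        return math.inf, None, False
    # The strongest competitor determines the margin
    rival = max(competitors, key=competitors.get)
    margin = log_probs[correct_word] - competitors[rival]
    # Adjust unless the correct word wins by at least the threshold
    return margin, rival, margin < threshold
```

Note that the test is applied to one training word at a time. Nothing in it accounts for the effect an adjustment may have on other words, which is the weakness described above.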