Discriminative training has been a prominent theme in recent speech recognition research and system development. The essence of these discriminative training algorithms is the adoption of various cost functions that are directly or indirectly related to the empirical error rate found in the training data. These cost functions serve as the objective functions for optimization, and the related empirical error rate may be calculated at the sentence string level, at the super-string level, at the sub-string level, or at the isolated word/phone token level.
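The difference between error-counting levels can be made concrete with a small sketch. The Python code below computes an empirical error rate on toy recognizer output at two of the levels named above. At the sentence string level, an utterance counts as one error if any word in it is wrong. At the word token level, word errors are counted by Levenshtein alignment, as in the usual word error rate. The function names and the toy transcripts are hypothetical illustrations. The sketch does not show how any particular discriminative training method turns these counts into a cost function.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def sentence_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Fraction of utterances whose whole word string is not recognized exactly."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


def word_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Total word-level edit errors divided by the total number of reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Hypothetical reference transcripts and recognizer hypotheses.
refs = [["call", "home", "now"], ["five", "six"]]
hyps = [["call", "home", "now"], ["five", "seven"]]
print(sentence_error_rate(refs, hyps))  # 0.5
print(word_error_rate(refs, hyps))      # 0.2
```

The same single substitution yields a 50% error rate when counted per sentence string but 20% when counted per word token. This is why the level at which the empirical error rate is defined matters to the resulting cost function.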
For example, research has found that when the empirical training error rate of a classifier or recognizer is optimized, only a biased estimate of the true error rate is obtained. The size of this bias depends on the complexity of the recognizer and the task (as quantified by the VC dimension). Analysis and experimental results have shown that this bias can be quite substantial even for a simple Hidden Markov Model recognizer applied to a simple single-digit recognition task. Another key insight from machine learning research is that one effective way to reduce this bias and improve generalization performance is to increase the “margins” on the training data, that is, to have correctly classified training samples lie well away from the decision boundary. Thus, it is desirable to use large margins to achieve lower test errors, even if doing so results in higher empirical errors in training. Previous approaches to discriminative learning in speech recognition have focused on the empirical error rate and have not addressed margins or the related generalization performance.
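The margin trade-off can be illustrated with a deliberately tiny, hypothetical example. The Python sketch below uses a one-dimensional threshold classifier rather than an HMM recognizer. The sample values, the `margins` and `summarize` helpers, and the two candidate thresholds are all assumptions made for illustration. For each decision boundary, the sketch reports two quantities: the empirical training error rate, and the smallest margin among correctly classified samples.

```python
from typing import List, Tuple

# Hypothetical 1-D training set: (feature value, label) with label +1 or -1.
# The positive sample at -0.5 plays the role of an atypical, outlying token.
SAMPLES: List[Tuple[float, int]] = [
    (-3.0, -1), (-2.0, -1), (-1.0, -1),
    (1.0, +1), (2.0, +1), (3.0, +1),
    (-0.5, +1),
]


def margins(threshold: float) -> List[float]:
    """Signed distance of each sample from the decision boundary x = threshold.

    The classifier predicts +1 when x > threshold. A positive value means the
    sample is classified correctly, and its size is how far it lies from the boundary.
    """
    return [label * (x - threshold) for x, label in SAMPLES]


def summarize(threshold: float) -> Tuple[float, float]:
    """Return (empirical error rate, smallest margin among correctly classified samples)."""
    m = margins(threshold)
    errors = sum(1 for v in m if v <= 0)  # samples on the boundary count as errors
    correct = [v for v in m if v > 0]
    return errors / len(m), min(correct)


# Boundary A separates every training sample but passes close to several of them.
print(summarize(-0.75))  # (0.0, 0.25)
# Boundary B gives up the outlier but keeps all other samples far from the boundary.
print(summarize(0.0))    # (0.14285714285714285, 1.0)
```

Boundary A has zero empirical error but leaves some samples only 0.25 from the boundary. Boundary B accepts one training error in exchange for a margin of at least 1.0 on every remaining sample. This mirrors the trade-off described above. The toy example only computes the two quantities and does not by itself demonstrate any effect on test error.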
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.