(1) Field of the Invention
The present invention relates to classification systems and more specifically to a classifier which combines the information in training and test data to infer the true symbol probabilities prior to making a classification decision.
(2) Description of the Prior Art
The use of classification systems to classify input data into one of several predetermined classes is well known. Their use has been adapted to a wide range of applications including target identification, medical diagnosis, speech recognition, digital communications and quality control systems.
Classification systems decide, given an input X, to which of several output classes X belongs. If known, measurable characteristics separate classes, the classification decision is straightforward. However, for most applications, such characteristics are unknown, and the classification system must decide which output class the input most closely resembles. In such applications, the output classes and their characteristics are modeled (estimated) using statistics for the classes derived from training data belonging to known classes. Thus, the standard classification approach is to first estimate the statistics from the given training data and then to apply a decision rule using these estimated statistics.
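The standard two-step approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the data are discrete symbols, that the class statistics are symbol probabilities estimated by relative frequency, and that the decision rule is maximum likelihood. The function names, class labels and data are hypothetical.

```python
import math
from collections import Counter

def estimate_symbol_probs(training_symbols, alphabet):
    """Step 1: estimate symbol probabilities by relative frequency."""
    counts = Counter(training_symbols)
    total = len(training_symbols)
    return {s: counts[s] / total for s in alphabet}

def log_likelihood(test_symbols, probs):
    """Log-likelihood of the test symbols under the estimated probabilities."""
    total = 0.0
    for s in test_symbols:
        p = probs.get(s, 0.0)
        if p == 0.0:
            # A symbol with zero estimated probability rules the class out.
            return float("-inf")
        total += math.log(p)
    return total

def classify(test_symbols, class_probs):
    """Step 2: apply a maximum-likelihood decision rule to the estimates."""
    return max(class_probs,
               key=lambda c: log_likelihood(test_symbols, class_probs[c]))

# Hypothetical training data belonging to two known classes.
alphabet = ["a", "b", "c"]
training = {
    "class_1": list("aaaaaabbbc"),
    "class_2": list("abbbbbbccc"),
}
class_probs = {c: estimate_symbol_probs(d, alphabet)
               for c, d in training.items()}

decision = classify(list("aab"), class_probs)
print(decision)
```

In this sketch the test data play no part in estimating the class statistics; they are used only in the decision step.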
However, there is often insufficient training data to accurately infer the true statistics for the output classes, which reduces classification performance and increases the occurrence of classification errors. Additionally, any new information that arrives with the input data is not combined with the training data to improve the estimates of the symbol probabilities. Furthermore, changes in symbol probabilities resulting from changes, which may be unobservable, in the source of the test data, the sensors gathering the data or the environment often reduce classification performance. For example, if, based on the training data, a classification system maintains a near-zero probability for the occurrence of a symbol and the symbol begins to occur in the input data with increasing frequency, classification errors are likely to occur if the new data is not used in determining the symbol probabilities.
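The effect of a symbol whose estimated probability is zero can be seen in a small sketch. It assumes relative-frequency estimates and a maximum-likelihood decision; the class labels, the data and the premise that the test sequence comes from a drifted class_1 source are all hypothetical.

```python
import math
from collections import Counter

def plug_in_log_likelihood(test_symbols, training_symbols):
    """Log-likelihood of the test symbols using training-only estimates."""
    counts = Counter(training_symbols)
    total = len(training_symbols)
    log_like = 0.0
    for s in test_symbols:
        p = counts[s] / total
        if p == 0.0:
            # The symbol never occurred in training for this class.
            return float("-inf")
        log_like += math.log(p)
    return log_like

training = {
    "class_1": list("aaaaaaabbb"),  # "c" never observed for class_1
    "class_2": list("abbbbbcccc"),
}

# Mostly "a", as is typical of class_1, with one newly occurring "c".
test = list("aaaaaaac")

scores = {c: plug_in_log_likelihood(test, d) for c, d in training.items()}
decision = max(scores, key=scores.get)
print(scores, decision)
```

Because the single "c" drives the class_1 likelihood to zero, the decision falls to class_2 even though the remaining symbols strongly resemble class_1.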
Attempts to improve the classification performance and take advantage of information available in test data have explored combining the test data with the training data in modeling class statistics and making classification decisions. While these attempts have indicated that improved classification performance is possible, they have one or more drawbacks which limit or prevent their use for many classification systems.
One early approach to combining the training and test data to estimate class statistics is described in A. Nadas, "Optimal Solution of a Training Problem in Speech Recognition," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, no. 1 (1985), pp. 326-329. In Nadas, the input (test) data, which comprised a sample to be classified, was combined with the training data to obtain an estimate of the probability distribution for each of the classes. However, the result in Nadas showed that combining the test sample with the training data did not provide improved performance but resulted in a classification decision based on a standard generalized likelihood ratio test.
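For orientation, one common form of a generalized likelihood ratio statistic for discrete symbols is sketched below. This is an illustrative assumption and is not represented as the formulation used in Nadas: for each class, the test symbols are pooled with that class's training symbols, and the maximized log-likelihood of the pooled data is compared with that of the training data alone. The function names and data are hypothetical.

```python
import math
from collections import Counter

def max_log_likelihood(counts):
    """Log-likelihood of the counts at their own relative-frequency estimate."""
    total = sum(counts.values())
    return sum(n * math.log(n / total) for n in counts.values() if n > 0)

def glr_statistic(test_symbols, training_symbols):
    """Log generalized likelihood ratio: pooled fit minus training-only fit."""
    train_counts = Counter(training_symbols)
    pooled_counts = train_counts + Counter(test_symbols)
    return max_log_likelihood(pooled_counts) - max_log_likelihood(train_counts)

training = {
    "class_1": list("aaaaaabbbc"),
    "class_2": list("abbbbbbccc"),
}
test = list("aab")

glr_stats = {c: glr_statistic(test, d) for c, d in training.items()}
glr_decision = max(glr_stats, key=glr_stats.get)
print(glr_stats, glr_decision)
```

The class whose training data are least disturbed by absorbing the test sample receives the largest statistic and is selected.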
A second approach to combining the training data with test data is found in Merhav et al., "A Bayesian Classification Approach with Application to Speech Recognition," IEEE Trans. Signal Processing, vol. 39, no. 10 (1991), pp. 2157-2166. In Merhav et al., classification decision rules which depend on the available training and test data were explored. A first decision rule, which is a Bayesian rule, was identified. However, this classification rule was not fully developed or evaluated because the implementation and evaluation of the required probability density functions are extremely complex.
The second classification rule is based on generalized likelihood ratios. While this rule was shown to provide improved classification performance, it suffers from several drawbacks. To estimate the probability functions, the decision rule requires that a training algorithm based on hidden Markov models be evaluated for each output class for every test data vector received. This training requirement is computationally intensive and typically requires a significant amount of time to converge. Thus, the rule would be relatively complex and difficult to implement, particularly for applications in real time. Furthermore, such a system would be costly and require a relatively large amount of space. Additionally, because the rule relies on hidden Markov models in estimating probability distributions, it is not readily adapted to different classification applications.
Thus, what is needed is a classification system which can be easily and readily implemented, which is readily adaptable to various applications and which uses all the available data including the information in the training data and test data to estimate the true symbol probabilities prior to making a classification decision.