(1) Field of the Invention
The present invention relates to a classification system and, more specifically, to a classification technique that combines the information in training data (of known characteristics) and test data to infer the true symbol probabilities prior to making a classification decision. In particular, it relates to an automatic feature selection system for data (training data of known classes and input data of unknown classes) containing missing values.
(2) Description of the Prior Art
The use of classification systems to classify input data into one of several predetermined classes is well known. Their use has been adapted to a wide range of applications, including identification of targets as threat or non-threat, medical diagnosis, speech recognition, digital communications and quality control systems.
For a given input X, a classification system decides to which of several output classes the input X belongs. If known, measurable characteristics separate the classes, the classification decision is straightforward. However, for most applications, such characteristics are unknown, and the classification system must decide which output class the input X most closely resembles. In such applications, the output classes and their characteristics are modeled (estimated) using statistics for the classes derived from training data belonging to known classes. Thus, the standard classification approach is first to estimate the statistics from the given training data belonging to known classes and then to apply a decision rule using these estimated or modeled statistics.
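The standard two-step approach described above can be sketched as follows. This is a minimal illustration only, assuming discrete (quantized) symbols, relative-frequency estimates of the symbol probabilities, and a maximum-likelihood decision rule. The function names, class labels and toy data are hypothetical and are not taken from any particular system.

```python
from collections import Counter
import math

def estimate_symbol_probs(training_symbols, num_symbols):
    """Step 1: estimate symbol probabilities as relative frequencies."""
    counts = Counter(training_symbols)
    total = len(training_symbols)
    return [counts[s] / total for s in range(num_symbols)]

def classify(test_symbols, class_probs):
    """Step 2: choose the class with the largest log-likelihood of the test data."""
    best_class, best_score = None, -math.inf
    for label, probs in class_probs.items():
        score = 0.0
        for s in test_symbols:
            if probs[s] == 0.0:
                score = -math.inf  # symbol never seen in training for this class
                break
            score += math.log(probs[s])
        if best_class is None or score > best_score:
            best_class, best_score = label, score
    return best_class

# Hypothetical training data: symbols are integers 0..2.
training = {
    "threat": [0, 0, 1, 0, 2, 0],
    "non_threat": [2, 2, 1, 2, 0, 2],
}
class_probs = {label: estimate_symbol_probs(symbols, 3)
               for label, symbols in training.items()}
decision = classify([0, 0, 1], class_probs)
print(decision)
```

In this sketch the decision rule uses only the estimates obtained from the training data; nothing observed in the test data feeds back into those estimates.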
However, there is often insufficient training data belonging to known classes (i.e., having known characteristics) to accurately infer the true statistics for the output classes, which results in reduced classification performance, that is, more classification errors. Additionally, any new information that arrives with the input data is not combined with the training data to improve the estimates of the symbol probabilities. Furthermore, changes in symbol probabilities resulting from unobservable changes in the source of the test data, the sensors gathering the data and the environment often result in reduced classification performance. Therefore, if a classification system maintains, based on the training data, a near-zero probability for the occurrence of a symbol, and that symbol begins to occur in the input data with increasing frequency, classification errors are likely to occur if the new data is not used in determining the symbol probabilities.
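The failure mode described above can be illustrated with a small sketch. The assumptions here are hypothetical and chosen only for illustration: symbols are integers, the training-only estimate is a plain relative frequency, and the combined estimate simply pools test counts with training counts and adds one pseudo-count per symbol. This pooling is one simple choice and is not necessarily the estimator used by any system discussed here.

```python
from collections import Counter

def relative_freq(symbols, num_symbols):
    """Training-only estimate: relative frequency of each symbol."""
    counts = Counter(symbols)
    return [counts[s] / len(symbols) for s in range(num_symbols)]

def pooled_estimate(train, test, num_symbols, pseudo_count=1.0):
    """Pool training and test counts, plus a pseudo-count for every symbol."""
    counts = Counter(train) + Counter(test)
    total = len(train) + len(test) + pseudo_count * num_symbols
    return [(counts[s] + pseudo_count) / total for s in range(num_symbols)]

train = [0, 0, 1, 0, 1, 0]  # symbol 2 never appears in the training data
test = [2, 2, 0, 2]         # symbol 2 now occurs frequently

train_only = relative_freq(train, 3)
combined = pooled_estimate(train, test, 3)
print(train_only[2], combined[2])
```

With the training-only estimate, symbol 2 has probability zero, so any test sequence containing it is assigned zero likelihood. The pooled estimate assigns symbol 2 a positive probability because the test data is allowed to contribute.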
Attempts to improve the classification performance and take advantage of information available in test data have involved combining the test data with the training data in modeling class statistics and making classification decisions. While these attempts have indicated that improved classification performance is possible, they have one or more drawbacks which limit or prevent their use for many classification systems.
One early approach to combining the training and test data to estimate class statistics is described in A. Nádas, “Optimal Solution of a Training Problem in Speech Recognition,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, no. 1 (1985), pp. 326-329. In Nádas, the input (test) data, which comprised a sample to be classified, was combined with the training data to obtain an estimate of the probability distribution for each of the classes. However, the result in Nádas showed that combining the test sample with the training data did not provide improved performance but resulted in a classification decision based on a standard general likelihood ratio test.
It is known in prior art artificial intelligence systems to reduce data complexity by grouping data into worlds with shared similar attributes. This grouping of the data helps separate relevant data from redundant data using a co-exclusion technique. These methods search saved data for events that do not happen at the same time. This results in a memory saving for the systems because only the occurrence of one event must be recorded; the co-exclusive event can be assumed.
Bayesian networks, also known as belief networks, are known in the art for use as filtering systems. The belief network is initially learned by the system from data provided by an expert, user data and user preference data. The belief network is relearned when additional attributes are identified as having an effect. The belief network can then be accessed to predict the effect.
A method for removing redundant features from training data is needed to reduce the training times required for a neural network and to provide a system that requires neither long training times nor a randomized starting configuration.
Thus, what is needed is a classification system which can be easily and readily implemented, and is readily adaptable to various applications and which uses all the available data including the information in the training data and test data to estimate the true symbol probabilities prior to making a classification decision.
Accordingly, it is a general purpose and object of the present invention to provide a classifier which uses the information in the training and test data to estimate the true symbol probabilities wherein either the test data or the training data or both contain missing values.
Another object of the present invention is to provide a classification system and method which uses quantized training data and test data with missing values therein to re-estimate symbol probabilities before each classification decision.
Yet another object of the present invention is the provision of a classification system which depends only on the available training data and test data with missing values therein and is readily implemented and easily adapted to a variety of classification applications.
It is a further object of the present invention to provide a combined classification system which combines the test data having missing values and the training data to simultaneously estimate the symbol probabilities for all output classes and classify the test data.
These and other objects made apparent hereinafter are accomplished with the present invention by providing a combined classification system which combines the information available in the training data and test data having missing values to estimate (or model) the true symbol probabilities.
Another object of the invention is that such a classification system should not include redundant and ineffectual data.
A further object of the invention is to provide a method for reducing feature vectors to only those values which affect the outcome of the classification.
Accordingly, this invention provides a data reduction method for a classification system using quantized feature vectors for each class with a plurality of features and levels. The method consists of applying a Bayesian data reduction algorithm to the classification system to develop reduced feature vectors. Test data is then quantized into the reduced feature vectors. The reduced classification system is then tested using the quantized test data.
A Bayesian data reduction algorithm is further provided, which begins by computing an initial probability of error for the classification system. Adjacent levels are merged for each feature in the quantized feature vectors. Level-based probabilities of error are then calculated for these merged levels among the plurality of features. The system then selects and applies the merged adjacent levels having the minimum level-based probability of error to create an intermediate classification system. The steps of merging, selecting and applying are repeated until either the probability of error stops improving or the features and levels are incapable of further reduction.
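The greedy merge, select and apply loop described above can be sketched as follows. This is an illustration under stated assumptions, not the invention's actual computation: the probability of error is replaced by a hypothetical stand-in (the leave-one-out error of a simple count-based classifier), a merge is accepted whenever the error estimate does not get worse, and all function names and the toy data are invented for the example.

```python
from collections import defaultdict

def loo_error(samples, maps):
    """Stand-in error estimate (an assumption): leave-one-out error of a
    classifier that predicts the majority class of each quantization cell."""
    cells = defaultdict(lambda: defaultdict(int))
    keyed = []
    for features, label in samples:
        key = tuple(m[v] for m, v in zip(maps, features))
        cells[key][label] += 1
        keyed.append((key, label))
    errors = 0
    for key, label in keyed:
        counts = dict(cells[key])
        counts[label] -= 1  # hold this sample out
        best = max(counts.values())
        winners = [c for c, n in counts.items() if n == best]
        if best == 0 or winners != [label]:
            errors += 1  # an empty cell or a tie counts as an error
    return errors / len(samples)

def merge_adjacent(level_map, j):
    """Merge reduced levels j and j + 1 of one feature."""
    return [v if v <= j else v - 1 for v in level_map]

def reduce_levels(samples, num_levels):
    # maps[i][v] is the reduced level assigned to original level v of feature i.
    maps = [list(range(n)) for n in num_levels]
    current = loo_error(samples, maps)
    while True:
        candidates = []
        for i, m in enumerate(maps):
            for j in range(max(m)):  # empty once a feature has a single level
                trial = maps[:i] + [merge_adjacent(m, j)] + maps[i + 1:]
                candidates.append((loo_error(samples, trial), i, j, trial))
        if not candidates:
            break  # features and levels cannot be reduced further
        err, _, _, trial = min(candidates, key=lambda c: c[:3])
        if err > current:
            break  # the error estimate got worse, so stop
        maps, current = trial, err
    return maps, current

# Hypothetical toy data: feature 0 (4 levels) determines the class,
# feature 1 (3 levels) carries no class information.
samples = [((f0, f1), "A" if f0 < 2 else "B")
           for f0 in range(4) for f1 in range(3)]
reduced_maps, final_error = reduce_levels(samples, [4, 3])
print(reduced_maps, final_error)
```

On this toy data the loop collapses the uninformative feature to a single level, which effectively removes it, and reduces the informative feature to two levels. New test data would then be quantized through `reduced_maps` before classification.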