The invention described herein may be manufactured and used by or for the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefor.
(1) Field of the Invention
The invention relates to a data reduction system that reduces the dimensionality of neural network training data by finding features that most improve performance of the neural network.
(2) Description of the Prior Art
The use of classification systems to classify input data into one of several predetermined classes is well known. Their use has been adapted to a wide range of applications including target identification, medical diagnosis, speech recognition, digital communications and quality control systems.
Classification of sonar signals into threats and non-threats is an important task for sonar operators. Neural networks have been proposed to help accomplish this task by receiving a signal from the sonar system and analyzing characteristics of the signal to determine whether the signal originates from a military vessel that represents a threat or from a commercial vessel. Speed in making this determination is often of the essence.
Classification systems decide, given an input X, to which of several output classes X belongs. If known, measurable characteristics separate classes, the classification decision is straightforward. However, for most applications, such characteristics are unknown, and the classification system must decide which output class the input most closely resembles. In such applications, the output classes and their characteristics are modeled (estimated) using statistics for the classes derived from training data belonging to known classes. Thus, the standard classification approach is to first estimate the statistics from the given training data and then to apply a decision rule using these estimated statistics.
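By way of illustration only, the standard two-step approach described above (estimate the class statistics from training data, then apply a decision rule using those estimates) may be sketched as follows. The function names, the toy training symbols and the assumption of equal class priors are hypothetical and are not taken from the prior art discussed herein.

```python
from collections import Counter

def estimate_symbol_probs(training, num_symbols):
    """Step 1: estimate per-class symbol probabilities as relative frequencies."""
    probs = {}
    for label, symbols in training.items():
        counts = Counter(symbols)
        total = len(symbols)
        probs[label] = [counts[s] / total for s in range(num_symbols)]
    return probs

def classify(x, probs):
    """Step 2: decision rule. Pick the class whose estimated probability of
    symbol x is largest (equal class priors are an assumption of this sketch)."""
    return max(probs, key=lambda label: probs[label][x])

# Hypothetical training data: quantized symbols observed for each known class.
training = {"threat": [0, 0, 1, 0], "non_threat": [2, 2, 1, 2]}
probs = estimate_symbol_probs(training, num_symbols=3)
print(classify(0, probs))  # class chosen for input symbol 0
```

In this sketch a symbol never observed in training for a class receives an estimated probability of exactly zero, which is the kind of estimate that becomes unreliable when training data is scarce.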
However, often there is insufficient training data to accurately infer the true statistics for the output classes, which results in reduced classification performance or more occurrences of classification errors. Additionally, any new information that arrives with the input data is not combined with the training data to improve the estimates of the symbol probabilities. Furthermore, changes in symbol probabilities resulting from changes, which may be unobservable, in the source of test data, the sensors gathering data or the environment often result in reduced classification performance. Therefore, if, based on the training data, a classification system maintains a near zero probability for the occurrence of a symbol and the symbol begins to occur in the input data with increasing frequency, classification errors are likely to occur if the new data is not used in determining symbol probabilities.
Attempts to improve the classification performance and take advantage of information available in test data have explored combining the test data with the training data in modeling class statistics and making classification decisions. While these attempts have indicated that improved classification performance is possible, they have one or more drawbacks which limit or prevent their use for many classification systems.
The use of Bayesian classification to combine training data with test data is taught in the prior art in Merhav et al., “A Bayesian Classification Approach with Application to Speech Recognition,” IEEE Trans. Signal Processing, vol. 39, no. 10 (1991) pp. 2157-2166. Merhav et al. explored classification decision rules that depend on the available training and test data. A first decision rule, which is a Bayesian rule, was identified. However, this classification rule was not fully developed or evaluated because the implementation and evaluation of the required probability density functions are extremely complex.
It is known in prior art artificial intelligence systems to reduce data complexity by grouping data into worlds with shared similar attributes. This grouping of the data helps separate relevant data from redundant data using a co-exclusion technique. These methods search saved data for events that do not happen at the same time. This results in a memory saving for the systems because only the occurrence of the event must be recorded. The co-exclusive event can be assumed.
Bayesian networks, also known as belief networks, are known in the art for use as filtering systems. The belief network is initially learned by the system from data provided by an expert, user data and user preference data. The belief network is relearned when additional attributes are identified as having an effect. The belief network can then be accessed to predict the effect.
A method for removing redundant features from training data is needed to reduce the training times required for a neural network and to provide a system that requires neither long training times nor a randomized starting configuration.
Accordingly, it is a general purpose and primary object of the present invention to provide a classification system capable of classifying data into multiple classes.
Another object of the invention is that such classification system should not include redundant and ineffectual data.
A further object of the invention is to provide a method for reducing feature vectors to only those values which affect the outcome of the classification.
Accordingly, this invention provides a data reduction method for a classification system using quantized feature vectors for each class with a plurality of features and levels. The method consists of applying a Bayesian data reduction algorithm to the classification system to develop reduced feature vectors. Test data is then quantized into the reduced feature vectors. The reduced classification system is then tested using the quantized test data.
A Bayesian data reduction algorithm is further provided which begins by computing an initial probability of error for the classification system. Adjacent levels are merged for each feature in the quantized feature vectors. Level based probabilities of error are then calculated for these merged levels among the plurality of features. The system then selects and applies the merged adjacent levels having the minimum level based probability of error to create an intermediate classification system. The steps of merging, selecting and applying are repeated until either the probability of error stops improving or the features and levels are incapable of further reduction.
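The greedy merging loop described above may be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of the invention: the Bayesian probability of error is replaced here by a simple resubstitution error of a majority-vote rule, a merge is accepted whenever it does not make that error worse, and all function names and data are hypothetical.

```python
from collections import defaultdict

def error_estimate(data, maps):
    """Stand-in error measure (an assumption of this sketch): resubstitution
    error of a majority-vote rule over the reduced cells."""
    cells = defaultdict(lambda: defaultdict(int))
    for features, label in data:
        cell = tuple(m[f] for m, f in zip(maps, features))
        cells[cell][label] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in cells.values())
    return wrong / len(data)

def merge_adjacent(level_map, j):
    """Merge reduced levels j and j+1 of one feature."""
    return [lvl if lvl <= j else lvl - 1 for lvl in level_map]

def reduce_levels(data, num_levels):
    """Greedily merge adjacent levels, one merge per pass."""
    maps = [list(range(n)) for n in num_levels]  # identity maps, n >= 1 assumed
    best = error_estimate(data, maps)            # initial error
    while True:
        candidates = []
        for i, m in enumerate(maps):
            for j in range(max(m)):              # every adjacent pair (j, j+1)
                trial = maps[:i] + [merge_adjacent(m, j)] + maps[i + 1:]
                candidates.append((error_estimate(data, trial), i, j, trial))
        if not candidates:                       # nothing left to merge
            break
        err, i, j, trial = min(candidates, key=lambda c: c[:3])
        if err > best:                           # every merge makes things worse
            break
        maps, best = trial, err                  # apply the selected merge
    return maps, best

# Hypothetical data: feature 0 (3 levels) determines the class,
# feature 1 (2 levels) is irrelevant.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((1, 1), "a"), ((2, 0), "b"), ((2, 1), "b")]
maps, best = reduce_levels(data, num_levels=[3, 2])
print(maps, best)
```

A feature whose levels all merge into a single level no longer influences the classification and is in effect removed from the feature vector.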