Different techniques have been used in the past to extract useful information from a data set. In a data set or information system that can be represented as a table, with rows representing objects or signals associated with a specific class, and columns representing attributes of the objects, a number of methods have been used to identify or classify those objects in the past.
For example, High Range Resolution (HRR) radar imaging data, that can be used for an Automatic Target Recognition (ATR) system for military aircraft, can be represented as a table of data having rows representing signals with columns representing range bins (this example will be further discussed and described below). In the past, one of the most frequently chosen techniques to classify these HRR signatures (or identify the target aircraft represented by these HRR signals) has been to use a constrained quadratic classifier. This classifier is based on computing the mean and variance estimation for each range bin, or column entry, in the signal. A variant of this technique is to use the mean square error instead of the variance term.
This approach works best when there is a small class of targets to be identified, or classified—such as five or ten targets. In addition, this approach does very poorly at rejecting or not declaring on unknown targets. Further, it is not robust due to the fact that it tries to match range bins (column entries) in the signal which contain little or no information about the target. Typically, these range bins are at the beginning or at the end of the signal.
It has become apparent that there was room for significant improvement in the area of statistical pattern recognition. Applying emerging machine intelligence and data mining techniques to overcome the errors with estimations and assumptions in current statistical classifiers is highly desirable.
Rough set theory is an approach to data mining that has been around since the early 1980's. It was believed this theory had the potential to produce a more robust classifier. Rough Set Theory assumes that the training data set is all that is known and all that needs to be known to do the classification problem. Techniques to find the minimal set of attributes (columns, or range bins for the HRR problem example) to do the classification are available in the theory. Further, the theory should be robust since it will find all the classifiers.
A workable, robust classifier using machine learning and data mining techniques is needed. Specifically, the approach should determine which features, or attributes (columns) are important; generate a multiplicity of classifiers; be robust; and be computationally appropriate for real world problem solving.
Once the data is labeled, Rough Set Theory guarantees that all possible classifiers using that training data set will be found. There is no equivalent statement using statistical pattern recognition techniques that can be made. However, in the known Rough Set Theory method, generating all the classifiers is an NP-hard (non-polynomial time complexity) problem. In summary, all known methods are either subject to error, are computationally inefficient and therefore inappropriate for large problem sets, or both.
The present invention overcomes the above-described problems and others. It provides a computationally efficient, robust classification system and method that can be used on a wide variety of pattern recognition problems and for other types of data mining tasks.