1. Field of Invention
Embodiments disclosed herein relate generally to a method and system for transitioning from a case-based classifier system to a rule-based classifier system.
2. Discussion of the Related Art
Many classification algorithms operate based on the assumption that a particular data set contains one or more records and that each record belongs to one or more distinct and exclusive/overlapping classes defined within a multi-dimensional feature space. The classes may be specified a priori by an analyst (as in supervised classification) or automatically discovered (as in unsupervised classification) using one or more classification models. Procedurally, there are two general types of classification algorithms/models: case-based and rule-based classification.
According to the case-based classification model (e.g., a nearest neighbor classification model), a new record is classified by computing either (a) the average distance between the new record and each record within each class or (b) the distance between the new record and the centroid of each class. In this model, the “distance” between two records indicates how similar (or dissimilar) the two records are. The new record is classified into the class associated with either (a) the smallest computed average distance or (b) the smallest centroid distance. Unlike case-based classification models, rule-based classification models induce a generalized set of rules of classification from a plurality of records. The induced rules attempt to rationally explain the distribution of records in the feature space and the relationships between the records and classes.
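By way of illustration only, the two case-based variants described above may be sketched as follows. The sketch assumes records are numeric tuples and that the distance is Euclidean; the function names, the distance measure, and the sample data are hypothetical and are not taken from the embodiments.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two records (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_by_average_distance(new_record, classes):
    """Variant (a): choose the class with the smallest average distance
    between the new record and every record stored in that class."""
    def average_distance(records):
        return sum(euclidean(new_record, r) for r in records) / len(records)
    return min(classes, key=lambda label: average_distance(classes[label]))

def centroid(records):
    """Component-wise mean of the records in one class."""
    n = len(records)
    return tuple(sum(column) / n for column in zip(*records))

def classify_by_centroid(new_record, classes):
    """Variant (b): choose the class whose centroid is nearest the new record."""
    return min(
        classes,
        key=lambda label: euclidean(new_record, centroid(classes[label])),
    )

# Hypothetical two-class data set in a two-dimensional feature space.
classes = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
new_record = (1.0, 1.0)

label_a = classify_by_average_distance(new_record, classes)
label_b = classify_by_centroid(new_record, classes)
```

In this sketch, variant (a) visits every stored record for each classification, whereas variant (b) needs only one centroid per class once the centroids have been computed.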
When the number of records populated within each class is small, it is difficult to induce a generalized rule of classification. In such circumstances, one of the most reasonable courses of action is to use the minimally populated set of classes as a case-based classification model. However, as the number of records populated within each class grows, algorithms adapted to implement case-based classification models slow to an unacceptable degree. Moreover, because case-based classification models essentially compare each new record with existing records, all records must be stored indefinitely within the main memory of a classifier system. Accordingly, as the number of records populated within each class grows, limits to the amount of storage space available within the main memory of the classifier system are quickly reached. Rule-based classification models, however, do not require that records be stored in main memory after one or more rules have been induced, though, for reference purposes, records may be stored in a database.
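The storage contrast described above may be illustrated with a deliberately simple rule inducer. The sketch below assumes exactly two classes and induces a single threshold rule on one feature; this one-feature rule, the function names, and the sample data are hypothetical simplifications chosen for illustration and are not the rule induction of the embodiments.

```python
def induce_threshold_rule(records, labels, feature=0):
    """Induce a rule of the form 'value <= threshold -> low, else high'
    on one feature, minimizing training errors (two classes assumed)."""
    class_names = sorted(set(labels))
    if len(class_names) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    pairs = sorted(zip((r[feature] for r in records), labels))
    best = None
    for i in range(len(pairs) - 1):
        left, right = pairs[i][0], pairs[i + 1][0]
        if left == right:
            continue  # no threshold can separate identical values
        threshold = (left + right) / 2
        # Try both assignments of the two classes to the two sides.
        for low, high in (class_names, class_names[::-1]):
            errors = sum(
                (low if value <= threshold else high) != label
                for value, label in pairs
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    if best is None:
        raise ValueError("all feature values are identical")
    _, threshold, low, high = best
    return {"feature": feature, "threshold": threshold, "low": low, "high": high}

def apply_rule(rule, record):
    """Classify a record using only the induced rule, not the stored records."""
    if record[rule["feature"]] <= rule["threshold"]:
        return rule["low"]
    return rule["high"]

# Hypothetical one-dimensional records and their class labels.
records = [(1.0,), (2.0,), (3.0,), (7.0,), (8.0,), (9.0,)]
labels = ["A", "A", "A", "B", "B", "B"]

rule = induce_threshold_rule(records, labels)
# From this point on, classification needs only `rule`; the records could be
# moved out of main memory (for example, archived in a database).
predicted = apply_rule(rule, (4.0,))
```

Once induced, the rule occupies a fixed amount of memory regardless of how many records were used to induce it, and classifying a new record no longer requires a comparison against each stored record.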
In view of the above, it would be beneficial if there were a method and system of efficiently transitioning from a case-based classification model to a rule-based classification model.