Digital computers have been employed for a number of years in the maintenance and processing of large databases and datasets. In recent years, computers have been used to facilitate accurate assignment of records in a database to a set of predetermined classes, generally based on a training database of properly classified records. For example, U.S. Pat. No. 5,251,131, issued Oct. 5, 1993, in the names of Brij M. Masand and Stephen J. Smith, and entitled Classification of Data Records By Comparison Of Records To A Training Database Using Probability Weights, assigned to the assignee of the present application (incorporated herein by reference), describes a system for classifying natural language data in the form of records, after "training" using a training database comprising properly-classified records. In the arrangement described in that patent, probability weights are used to express the likelihood that a record containing particular terms in particular fields is properly assigned to particular ones of a selected set of classes.
A number of other methodologies have also been developed for classification. One methodology, generally referred to as CART ("classification and regression trees"), makes use of trees to perform classification as well as regression. In the CART methodology, a tree is developed including a plurality of nodes extending from a root to a plurality of leaves. Each node above the leaf level represents a query, and each leaf node is associated with one of a plurality of classes. The tree facilitates the classification of individual records in, for example, a database, to selected classes based on data contained in the respective records. For each record, the query represented by the root node is applied, and the response to that query provides a basis for selecting one of the child nodes so that the query represented by that node can be applied. This process is repeated through a series of queries until a response to a particular query directs the record to a leaf node, which identifies the class to which the record is to be assigned.
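By way of illustration only, the traversal described above can be sketched as follows. The sketch assumes binary queries and records represented as dictionaries; the names `Leaf`, `Node`, `classify`, and the example fields `age` and `income` are hypothetical and do not appear in the CART methodology itself.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # class assigned to any record that reaches this leaf


@dataclass
class Node:
    query: Callable[[dict], bool]   # query applied to a record at this node
    if_true: Union["Node", Leaf]    # child selected when the query answers True
    if_false: Union["Node", Leaf]   # child selected when the query answers False


def classify(tree, record):
    """Apply queries from the root downward until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.query(record) else node.if_false
    return node.label


# Hypothetical two-level tree with three classes (assumed for illustration).
tree = Node(
    query=lambda r: r["age"] < 30,
    if_true=Leaf("A"),
    if_false=Node(
        query=lambda r: r["income"] >= 50000,
        if_true=Leaf("B"),
        if_false=Leaf("C"),
    ),
)
```

For example, `classify(tree, {"age": 45, "income": 60000})` answers the root query in the negative, answers the second query in the affirmative, and so arrives at the leaf associated with class "B".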
In developing a classification or regression tree, a training database including a number of properly-classified records is used. In processing the database to identify the appropriate query to be used for each node, a substantial amount of processing is generally required.
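A rough sense of why this processing is substantial can be had from the following sketch of an exhaustive search for the query at a single node. It is a minimal illustration under stated assumptions, not a description of any particular system: it assumes numeric fields, queries of the form "field < threshold", and the Gini impurity as the measure of query quality, none of which is specified above. The names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed quality measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, labels):
    """Return (field, threshold, weighted_impurity) for the best
    'field < threshold' query, or None if no query separates the records."""
    best = None
    n = len(records)
    for field in records[0]:
        # Every distinct value of every field is tried as a threshold.
        for threshold in sorted({r[field] for r in records}):
            left = [y for r, y in zip(records, labels) if r[field] < threshold]
            right = [y for r, y in zip(records, labels) if r[field] >= threshold]
            if not left or not right:
                continue  # query does not divide the records
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (field, threshold, score)
    return best
```

In this sketch each candidate query requires a pass over all of the records at the node, and the number of candidates grows with both the number of fields and the number of distinct values in each field. The search is then repeated for every node of the tree.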