A data set may be thought of as a collection of data elements or data points, each with a set of several features. A feature, in turn, may be thought of as a variable having a specific value for a specific data point. Thus, for example, a data set comprising car data may have a data point for each car model in current use, and each data point, corresponding to a particular car, may have features such as make, model, year, type, weight, fuel consumption, original price, color and engine capacity. For instance, a particular vehicle, the Runabout, manufactured by the CarCo company might have a data point with the following values for the listed feature set: (CarCo, Runabout, 1999, Minivan, 4900 lbs, 18 mpg, $23,000, navy blue, 3.0 liters).
A supervised data set is a data set in which each data point has been labeled with an identifying or a predictive label. Thus for the car data set, a label such as “family car” may be applied to cars with the feature type having a value Minivan, Sedan, or SUV; a label such as “fuel efficient” may be applied to cars with a fuel consumption value greater than 27 mpg. These labels are examples of identifying labels. Other labels may be numerical values, such as annual cost to insure, which may be turn out to depend primarily on, for example, on a car's type, weight, engine capacity, and year.
Machine learning is a field which, in a relevant aspect, includes trainable classifiers. A trainable classifier is in one instance a processor-based system that is trained using a supervised dataset; the same phrase may also be used to refer to a software program or programs that reside on storage in the processor based system or in a passive data storage medium and may be executed to operate the classifier. Once trained on a set of supervised data, a classifier may then predict a label for a new data point that is not in the dataset that was used for training. Thus, a classifier for car data that is trained on a data set with insurance costs as predictive labels may then be able to predict insurance costs for a car that is not in the training data set by being able to determine the closeness of the new data point corresponding to the new car to other data points on which the classifier has previously been trained. Thus, by determining a likely value for the label, the classifier is able to predict an insurance cost estimate for the car.
Trainable classifiers are known in the art and include tree-based and forest-based classifiers. A well known tree-based classifier is the Classification and Regression Tree (CART), from Breiman L., Classification and regression trees. (Wadsworth, 1984), based on a binary decision tree. A well known forest-based classifier is one based on a regression forest of trees, such as MART, described in Elements of Statistical Learning, by Hastie, Tibshirani and Friedman (Springer-Verlag, 2001). Many other classifiers are known in the art.
Dependency networks consist of a data set where the dependency of each feature on all the others is measured and sometimes visualized. Dependency networks are used as “collaborative filtering” networks and may be applied, as is known in the art, in such applications as predicting which book a customer may buy, given the books that the customer has purchased previously from a vendor and the dependencies those books had for previous customers. Manufacturers have used dependency networks to detect multiple joint causes of failure in factory systems. Dependency networks are mostly derived from statistical decision trees (Hastie et al., supra) where the first several decision split variables are considered the main dependencies. As is known in the art, split variables generally are variables which separate the data into consistent classes most efficiently. Decision split variables are not very stable in regards to adding or subtracting data points to the data set from which the dependency network is derived.