This invention relates generally to data mining.
Data mining involves the statistical analysis of complex data. In one application, data mining technology may be utilized to implement machine learning. Generally, data mining may be used to learn from data. Data features enable predictions to be made. A training set of data may be observed to find the combination and weighting of those features that are determinative of data outcomes of interest. A predictive model is developed to predict a corresponding outcome based on the previously found combination and weighting of features as they appear in new data.
A dataset may include a collection of data points which have a set of features. Supervised data contains labels or predictors. That is, a dataset may contain a collection of features and a label or predictor for those features. As an example, a dataset may include a collection of features about mushrooms, such as cap type, color, texture, and so on, and a label such as edible, poisonous, medicinal, and so on, or a predictor, such as a numeral value representing the toxicity of a mushroom.
A supervised classifier takes as an input the data point features and is trained on and learns to associate the label or predictor of that data point. In a test mode, where only the features of a data point are available, the classifier attempts to produce the correct label or predictor for a data point.
Tree based classifiers make sequential decisions on a selected feature at each branch point in order to arrive at a final label or prediction at the leaves of a tree. A classifier may be used to decide which data points meet a given criteria. At each branch point, data points are sorted into their appropriate branch according to how they meet the criterion. This classification proceeds downwardly from a root or starting point to leaves or ending points. A forest consists of many trees, each of which give a weighted vote for the label or prediction value.
A kernel uses a radial kernel, such as a Gaussian kernel, to measure distances between data points and kernel centers. Kernel methods achieve localization using a weighting function of each kernel that assigns a weight to a data point based on its distance from each kernel center.
Nearest neighbor classifiers associate a label or predictor of a new point with that of its nearest neighboring points. Classification is based on the majority vote of those nearest neighbors.
It would be desirable to quantitatively assess the effectiveness of various supervised classifiers on any given dataset.