This invention relates generally to data mining.
Data mining involves the statistical analysis of complex data. In one application, data mining technology may be utilized to implement machine learning. Generally, data mining may be used to learn from data. Data features enable predictions to be made. A training set of data may be observed to find the combination and weighting of those features that are determinative of data outcomes of interest. A predictive model is developed to predict a corresponding outcome based on the previously found combination and weighting of features as they appear in new data.
A data set may include a collection of data points, each of which has a set of features. Supervised data contains labels or predictors. That is, a data set may contain a collection of features and a label or predictor for those features. As an example, a data set may include a collection of features about mushrooms, such as cap type, color, texture, and so on, and a label such as edible, poisonous, medicinal, and so on, or a predictor, such as a numeral value representing the toxicity of a mushroom.
Binary classifiers are among the most mature pattern recognition tools. Binary classifiers are trained on M feature vectors F: f1, f2, . . . , fN, each of which has one of two possible class labels, C: 0 or 1. Once trained, these classifiers learn a mapping from F to C. In a performance or test mode, a usually new feature point with no label is presented to the classifier which then maps it to either class 0 or class 1. Such binary classifiers include tree-based classifiers.
Tree-based classifiers make sequential decisions on a selected feature at each branch point in order to arrive at a final label or prediction at the leaves of a tree. A classifier may be used to decide which data points meet a given criteria. At each branch point, data points are sorted into their appropriate branch according to how they meet the criterion. This classification proceeds downwardly from a root or starting point to leaves or ending points. A forest consists of many trees, each of which gives a weighted vote for the label or prediction value.
A kernel uses a radial kernel, such as a Gaussian kernel, to measure distances between data points and kernel centers. Kernel methods achieve localization using a weighting function of each kernel that assigns a weight to a data point based on its distance from each kernel center. Nearest neighbor classifiers associate a label or predictor of a new point with that of its nearest neighboring points. Classification is based on the majority vote of those nearest neighbors.
Another type of binary classifier is a stochastic discrimination binary classifier, in which the law of large numbers is leveraged, resulting in a classifier that does not overtrain and obtains quality recognition scores. However, stochastic discrimination only operates on binary variables.
Accordingly, a need exists to better classify analog or continuous data, such as an analog function, using these more mature classification techniques.