This invention relates generally to data mining.
Data mining involves the statistical analysis of complex data. In one application, data mining technology may be utilized to implement machine learning. Generally, data mining may be used to learn from data. Data features enable predictions to be made. A training set of data may be observed to find the combination and weighting of those features that are determinative of data outcomes of interest. A predictive model is developed to predict a corresponding outcome based on the previously found combination and weighting of features as they appear in new data.
A data set may include a collection of data points, each of which has a set of features. Supervised data contains labels or predictors. That is, a data set may contain a collection of features and a label or predictor for those features. As an example, a data set may include a collection of features about mushrooms, such as cap type, color, texture, and so on, and a label such as edible, poisonous, medicinal, and so on, or a predictor, such as a numeral value representing the toxicity of a mushroom. Unsupervised data lacks such a label or predictor. That is, an unsupervised data set may include a collection of features without a label or predictor.
A supervised classifier takes as an input the data point features and is trained on and learns to associate the label or predictor of that data point. In a test mode, where only the features of a data point are available, the classifier attempts to produce the correct label or predictor for the data point.
Tree based classifiers make sequential decisions on a selected feature at each branch point in order to arrive at a final label or prediction at the leaves of a tree. A classifier may be used to decide which data points meet a given criteria. At each branch point, data points are sorted into their appropriate branch according to how they meet the criterion. This classification proceeds downwardly from a root or starting point to leaves or ending points. A forest consists of many trees, each of which gives a weighted vote for the label or prediction value.
A kernel uses a radial kernel, such as a Gaussian kernel, to measure distances between data points and kernel centers. Kernel methods achieve localization using a weighting function of each kernel that assigns a weight to a data point based on its distance from each kernel center.
Nearest neighbor classifiers associate a label or predictor of a new point with that of its nearest neighboring points. Classification is based on the majority vote of those nearest neighbors.
In contrast to supervised classifiers, unsupervised classifiers are less well developed and require significant effort to obtain a desirable classification or meaningful data clusters. Examples of unsupervised classifiers include different clustering techniques, such as spectral clustering and agglomerative or hierarchical clustering. Spectral clustering takes an affinity matrix A of data points and performs singular value decomposition. The large singular values in the decomposition are calculated to indicate eigenvalues that correspond to clusters of data. Hierarchical filtering takes data points and builds a tree of affinity or proximity by linking the nearest points together on up through the tree until all data points are in a single cluster.
However, various problems exist with respect to these unsupervised classifiers. For example, it must be determined ahead of time how many clusters are desired. However, without a priori knowledge of the data set, it is difficult to accurately determine a number of clusters. Furthermore, depending upon the number of clusters present, different clusterings can occur, leading to less meaningful data clusters.
A need thus exists to more effectively analyze unsupervised data.