Prediction trees are a type of decision tree used in machine learning and data mining applications, among others. A prediction tree can be a decision tree in which each node has a real value associated with it, in addition to a branching variable as in a conventional decision tree. Prediction trees may be built or learned using a first set of training data, from which the decision and prediction values are constructed. The tree may then be applied to a second set of validation data, and the results may be used to fine-tune the tree. Various computer-implemented techniques are known for growing and applying prediction trees to arbitrary data sets.
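As an illustration only, a node that carries both a real value and a branching variable might be sketched as follows. The names `Node`, `path_values`, and `predict` are hypothetical, and summing the values along the traversed path is an assumption made for this sketch; the text above does not specify how node values are combined into a prediction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    value: float                     # real value associated with this node
    feature: Optional[str] = None    # branching variable; None marks a leaf
    children: Dict[object, "Node"] = field(default_factory=dict)


def path_values(root: Node, example: dict) -> List[float]:
    """Collect the values of the nodes visited while routing `example`."""
    values = []
    node = root
    while node is not None:
        values.append(node.value)
        if node.feature is None:
            break
        # Follow the branch matching the example's value for the branching
        # variable; stop at this node if no such branch exists.
        node = node.children.get(example.get(node.feature))
    return values


def predict(root: Node, example: dict) -> float:
    """Assumed combination rule: sum the values along the path."""
    return sum(path_values(root, example))


# Hypothetical two-level tree and one example routed through it.
tree = Node(0.5, "outlook", {
    "sunny": Node(0.25),
    "rainy": Node(-1.0, "windy", {True: Node(-0.5), False: Node(0.75)}),
})
print(predict(tree, {"outlook": "rainy", "windy": False}))
```

Because internal nodes hold values as well as leaves, an example whose branch is missing from the tree still receives a prediction from the nodes it did reach.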
Conventional techniques for building prediction trees include two phases: a growing phase and a pruning phase. In the growing phase, nodes are added to the tree to match a known set of data, such as a training set. During this phase the tree may be overgrown, often to the point of fitting noise in the data as well as its real trends and patterns. In an extreme case, for example, a tree can be constructed for a set of data in which each data point is associated with an individual leaf, i.e., the tree is fit exactly to the data set so that no two examples or data points result in the same end leaf or path through the tree. Such an overgrown tree may fit the known data exactly, yet be ineffective or useless at predicting outcomes for other examples or data points.
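The extreme case described above can be reproduced with a small sketch. It is not any particular conventional technique: it assumes one-dimensional inputs with distinct x values, a non-empty training set, and a simple median split, and the names `grow`, `predict`, and `count_leaves` are hypothetical.

```python
def grow(points):
    """Grow a 1-D tree until every leaf holds exactly one training point.

    `points` is a non-empty list of (x, y) pairs with distinct x values
    (an assumption of this sketch).
    """
    points = sorted(points)
    if len(points) == 1:
        return {"value": points[0][1]}
    mid = len(points) // 2
    # Split halfway between the two middle x values.
    threshold = (points[mid - 1][0] + points[mid][0]) / 2
    return {
        "threshold": threshold,
        "left": grow(points[:mid]),
        "right": grow(points[mid:]),
    }


def predict(node, x):
    """Route x down the tree and return the value stored at the leaf."""
    while "threshold" in node:
        node = node["left"] if x < node["threshold"] else node["right"]
    return node["value"]


def count_leaves(node):
    if "threshold" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical noisy training set.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0)]
tree = grow(train)
train_error = sum((predict(tree, x) - y) ** 2 for x, y in train)
print(count_leaves(tree), train_error)
```

The tree has one leaf per training point and reproduces every training target, so any noise in those targets is carried over unchanged to predictions for nearby unseen inputs.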
To address the problem of an overgrown tree, a second, pruning phase may be employed in which sections of the tree that provide little or no additional predictive power are removed or collapsed. For example, a portion of the tree that fails to distinguish further among most of the examples that lead to that portion of the tree may be removed, thus terminating that portion of the tree at a higher node. Various pruning and validation techniques are known. For example, validation data may be applied to the tree to determine whether the tree provides equivalent or better predictions in the absence of certain nodes. Such nodes may then be pruned from the tree. Generally, the two-step growing and pruning process is computationally expensive.
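One way to picture validation-based pruning is the following minimal sketch, in the spirit of what is commonly called reduced-error pruning. It assumes a one-dimensional tree stored as nested dictionaries, squared error as the measure of prediction quality, and a value stored at every internal node so that a collapsed subtree has something to predict. The names `prune`, `predict`, and `squared_error` are hypothetical.

```python
def predict(node, x):
    """Route x down the tree and return the value at the node reached."""
    while "threshold" in node:
        node = node["left"] if x < node["threshold"] else node["right"]
    return node["value"]


def squared_error(node, data):
    return sum((predict(node, x) - y) ** 2 for x, y in data)


def prune(node, validation):
    """Collapse a subtree when doing so gives equivalent or better
    predictions on the validation examples that reach it."""
    if "threshold" not in node:
        return node
    left_data = [(x, y) for x, y in validation if x < node["threshold"]]
    right_data = [(x, y) for x, y in validation if x >= node["threshold"]]
    # Prune bottom-up: children first, then decide about this node.
    node = dict(
        node,
        left=prune(node["left"], left_data),
        right=prune(node["right"], right_data),
    )
    leaf = {"value": node["value"]}
    if squared_error(leaf, validation) <= squared_error(node, validation):
        return leaf
    return node


# Hypothetical overgrown tree: the left subtree splits on noise.
tree = {
    "threshold": 2.0, "value": 2.5,
    "left": {
        "threshold": 1.0, "value": 1.0,
        "left": {"value": 0.9},
        "right": {"value": 1.1},
    },
    "right": {"value": 4.0},
}
validation = [(0.5, 1.0), (1.5, 1.0), (3.0, 4.0)]
pruned = prune(tree, validation)
print(pruned)
```

In this example the split at 1.0 does not improve validation predictions, so the left subtree is collapsed into a single node, while the root split is kept. The sketch also shows where the cost comes from: the tree is first grown in full, and then every subtree is re-evaluated against validation data.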
Various other additions to tree learning are known. Some tree learning and application techniques associate a prediction with internal nodes of prediction trees; such techniques have been used for the estimation and learning of context trees for compression and classification. Measure-based regularization of prediction trees has been used to penalize a Hilbert norm of the gradient of a prediction function ƒ. Some tree growing techniques have made use of self-controlled learning for online learning of self-bounded suffix trees. In such techniques, the learning procedure can be viewed as the task of estimating the parameters of a prediction tree of a fixed structure, using the hinge loss to assess the empirical risk along with an ℓ2-norm variation penalty. In the context of online learning, this setting may lead to distilled analysis that implies sub-linear growth of the suffix tree. However, such approaches may not migrate directly to other settings. Various Bayesian approaches have also been used for tree induction and pruning.