1. Field of Invention
This invention relates to methods and systems for tree-based classifiers.
2. Description of Related Art
Early tree-based classifiers and learning techniques based on these classifiers such as CART (Breiman, Olshen and Stone) and C4.5 (Quinlan) had three problems. The first problem was that the rules produced by these classifiers were of limited accuracy. The second problem was that the rules produced by these classifiers were usually complex and thus, difficult to understand. The third problem was that users had no indication as to the reliability of the results of a particular query.
To overcome these problems, the technique of xe2x80x9cboostingxe2x80x9d was developed. An example of one popular boosting technique is Adaboost which is described in U.S. Pat. No. 5,819,247 incorporated herein by reference in its entirety. Boosting is a technique that develops highly accurate hypothesis (classification rules) by combining many weak hypotheses, each of which is only moderately accurate. The idea behind boosting such tree-based classifiers is to successively add simple rules to form more complex decision making criteria until the tree-based classifier exhibits satisfactory performance.
Although tree-based classifiers produced by boosting techniques achieve significant improvement in accuracy and provide a measure of confidence for each prediction, the resulting classifiers are still complex and the classification rules produced are still difficult to understand. Thus, methods and systems that produce highly accurate classifiers while generating rules that are small in size and easy to interpret are desirable.
The present invention provides methods and systems for creating and modifying general alternating decision tree classifiers. The general alternating decision tree classifiers include a root prediction node attached directly or indirectly to a set of rules, each rule having a decision node and two prediction nodes. The decision nodes contain conditional tests that can act on a data instance. The prediction nodes, including the root prediction node, contain numbers related to the probable classifications of data instances. Together, the decision nodes and prediction nodes form a construct of alternating decision nodes and prediction nodes that can classify data instances into subsets.
General alternating decision trees are created by first determining a root prediction node. The general alternating decision trees are then grown by iteratively adding rules consisting of a decision node and two prediction nodes. At each stage, the decision node of a new rule can be added as a child of any prediction node of a preexisting rule. Any number of rules may directly attach to any prediction node on the general alternating decision tree including the root prediction node. The new rules are added according to the underlying boosting technique. Accordingly, a xe2x80x9cgeneral alternating decision treexe2x80x9d of rules is formed.
The present invention can also provide methods and apparatus for classifying data instances using general alternating decision tree classifiers. Given a data instance, the general alternating decision tree maps the data instance by exploring the tree from the root prediction node to the end prediction nodes, or leaves. At each decision node in the tree, only the prediction node child that corresponds to the outcome of the conditional test based on the data instance is explored. At each prediction node, all the decision node children branching from the prediction node are explored. The confidence values contained in all the prediction nodes explored are then added. The sign of the sum is the classification of the data instance and the magnitude of the sum corresponds to a measure of confidence of the classification.