1. Field of the Invention
The present invention relates generally to spatial prediction and, more particularly, to a method for ranking leaf nodes in decision trees used for spatial prediction, and to a recording medium for the method, decision trees being one of the classification methods of data mining.
2. Description of the Related Art
Decision trees are used to analyze the causes of prediction results in various prediction fields because their training results can be converted intuitively into decision rules. Decision trees have also been applied successfully to prediction tasks because they offer both accuracy and speed.
General tree-based algorithms consist of a tree building phase and a tree pruning phase. Differences between the tree structures constructed by different tree algorithms come mostly from their attribute selection criteria and tree pruning criteria.
After a decision tree is constructed from training data, each path from the root node to a leaf node can be converted into a decision rule. Branching from the root node, the training data is finally distributed over the leaf nodes, and a rank (or priority) of each rule induced from the tree is calculated from the class distribution of the corresponding leaf node. In general classification, this rank is calculated as the proportion of the majority class among all samples in the leaf node, and the samples of the minority classes are regarded as misclassifications. In a spatial prediction application, on the other hand, the rank is calculated as the proportion of the target class, which is the prediction target, among all samples in the leaf node.
Because decision trees aim to produce “pure” leaves through their node splitting criteria, the proportion of event-class samples in a leaf node eventually converges to 0 or 1. Many leaf nodes therefore receive the same value, which makes it difficult to assign distinct ranks to them. Existing leaf node ranking methods were devised to solve this problem and have been used in prediction applications for non-spatial data.
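The ranking difficulty described above can be illustrated with a minimal sketch. The function name `raw_rank` and the leaf counts are hypothetical and chosen only for illustration; they do not come from any particular data set.

```python
def raw_rank(n_event: int, n_non_event: int) -> float:
    """Uncorrected rank: proportion of event samples in a leaf node."""
    total = n_event + n_non_event
    return n_event / total


# Hypothetical leaf nodes given as (n_event, n_non_event) counts.
leaves = {"A": (3, 0), "B": (30, 0), "C": (0, 12)}
ranks = {name: raw_rank(*counts) for name, counts in leaves.items()}

# Leaves A and B are both pure, so both receive the same value even
# though B is supported by ten times as many samples as A.
print(ranks)
```

In this sketch the pure leaves A and B cannot be ordered relative to each other, which is the situation the corrected estimates are intended to resolve.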
Prediction accuracy may vary with the leaf node ranking method. The Laplace estimate applies the Laplace correction when calculating the frequency of the event class used to evaluate the rank of a leaf node. In other words, the Laplace estimate improves on the raw frequency-based probability estimate, and it may be modified as in equation 1 for a spatial prediction application.
$$R(\text{node}) = \frac{n_{\text{event}} + 1}{n_{\text{event}} + n_{\text{non\_event}} + c} \qquad [\text{equation 1}]$$
Here, n_event and n_non_event are the numbers of event and non-event samples in the leaf node, and c is the number of classes in the total data set.
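As a minimal sketch, equation 1 may be evaluated as follows. The function name `laplace_rank`, the default of c = 2 (event and non-event), and the sample counts are assumptions made for illustration only.

```python
def laplace_rank(n_event: int, n_non_event: int, c: int = 2) -> float:
    """Equation 1: Laplace-corrected rank of a leaf node.

    c is the number of classes; c = 2 is an assumed default for an
    event / non-event setting.
    """
    return (n_event + 1) / (n_event + n_non_event + c)


# Two pure leaves with different numbers of supporting samples.
small_leaf = laplace_rank(3, 0)    # (3 + 1) / (3 + 0 + 2)
large_leaf = laplace_rank(30, 0)   # (30 + 1) / (30 + 0 + 2)
print(small_leaf, large_leaf)
```

With the correction, the pure leaf supported by more samples receives the higher rank, whereas the uncorrected proportion would be 1 for both.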
The M-estimate, another leaf node ranking method, uses a prior probability of the event class. In a spatial prediction application, where b and m are constant parameters and b is the prior probability of an event occurrence, the estimate is defined as in equation 2.
$$R(\text{node}) = \frac{n_{\text{event}} + bm}{n_{\text{event}} + n_{\text{non\_event}} + m} \qquad [\text{equation 2}]$$
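Equation 2 may be sketched in the same way. The function name `m_estimate_rank` and the parameter values b = 0.1 and m = 10 are hypothetical; suitable values depend on the application.

```python
def m_estimate_rank(n_event: int, n_non_event: int, b: float, m: float) -> float:
    """Equation 2: M-estimate rank of a leaf node.

    b is the prior probability of an event occurrence and m is a
    constant that controls how strongly the prior is weighted.
    """
    return (n_event + b * m) / (n_event + n_non_event + m)


# Illustrative parameter values (assumptions, not prescribed values).
b, m = 0.1, 10.0
small_leaf = m_estimate_rank(3, 0, b, m)    # (3 + 1) / (3 + 10)
large_leaf = m_estimate_rank(30, 0, b, m)   # (30 + 1) / (30 + 10)
empty_leaf = m_estimate_rank(0, 0, b, m)    # falls back to the prior b
print(small_leaf, large_leaf, empty_leaf)
```

A leaf node with few samples stays close to the prior b, while a leaf node with many samples moves toward its observed event proportion.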
The m-branch method is a variation of the M-estimate and is defined as in equation 3 for spatial event prediction. Here, m is calculated using the depth of the node and the number of samples that are included in a class without a label.
$$R(\text{node}) = \frac{n_{\text{event}} + m\,R(\text{node.parent})}{n_{\text{event}} + n_{\text{non\_event}} + m} \qquad [\text{equation 3}]$$
Here, the parameter m is calculated by the equation $m = M + \frac{d-1}{d} \times M\sqrt{N}$.
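The recursion of equation 3 may be sketched as below. Several points are assumptions not fixed by the text: the rank at the root is supplied by the caller as `root_rank`, the children of the root are taken to have depth d = 1, and N is passed in as a plain number. The names `m_parameter` and `m_branch_rank` are hypothetical.

```python
import math


def m_parameter(M: float, d: int, N: float) -> float:
    """m = M + (d - 1) / d * M * sqrt(N), with d the depth of the node."""
    return M + (d - 1) / d * M * math.sqrt(N)


def m_branch_rank(path, M: float, N: float, root_rank: float) -> float:
    """Equation 3 applied along a path from below the root to a leaf.

    path: list of (n_event, n_non_event) counts, one per node, ordered
          from depth 1 down to the leaf (assumed depth convention).
    root_rank: rank assumed for the root node (an assumption here).
    """
    rank = root_rank
    for d, (n_event, n_non_event) in enumerate(path, start=1):
        m = m_parameter(M, d, N)
        # The rank of each node is smoothed toward the rank of its parent.
        rank = (n_event + m * rank) / (n_event + n_non_event + m)
    return rank


# Hypothetical path: a depth-1 node with counts (8, 2), then a pure leaf (4, 0).
leaf_rank = m_branch_rank([(8, 2), (4, 0)], M=2.0, N=100.0, root_rank=0.5)
print(leaf_rank)
```

Because m grows with the depth d, deeper nodes in this sketch are pulled more strongly toward the rank of their parent than shallow nodes are.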
Because the previous leaf node ranking methods (Laplace estimate, M-estimate, and m-branch) were originally proposed for multi-class classification of non-spatial data, the existing equations are modified into equations 1 to 3 defined above for a spatial prediction application. In other words, the existing equations are modified to reflect the proportion of event-occurring samples among all samples, so that they represent a relative probability of occurrence of an event.