This invention relates to the field of data processing. It is more specifically directed to the field of computer data mining. More particularly, the invention relates to methods and apparatus for generating a decision tree classifier with oblique hyperplanes from data records.
Data mining is the search for valuable information from data. Classification is a form of data mining in which relationships are learned between a set of attributes and a set of predetermined classes. This relationship is represented in a classifier. Various phenomena can be represented by such relationships. Examples of such phenomena can be found in the financial domain, the insurance domain and the medical domain. The dependence of an individual's credit worthiness on various characteristics like salary, years in the job, amount of debt, value of assets and so on is an example of such a phenomenon. Characteristics like salary and years in the job are attributes. Possible class labels include “credit worthy” and “credit risk”. In the medical domain, the dependence of the outcome on various tests, treatments and patient characteristics is another example of a phenomenon. The process of generating a classifier uses input data, herein referred to as a training set, which includes multiple records. Each record has values for various attributes, and has a unique and discrete valued class label. The number of attributes is referred to as the dimensionality of the attribute space. Generally each attribute is also referred to as a dimension. Attributes can be categorical or numeric in nature. This invention relates to numeric attributes. Classification has wide applications in various domains.
Classification has been studied extensively within several disciplines, including statistics, pattern recognition, machine learning, neural networks and expert systems. Known classification techniques include statistical algorithms, decision trees, rule induction, neural networks, and genetic algorithms. The desired qualities for classification include prediction accuracy, speed of classification and understandability, and intuitiveness of the classification result.
The decision tree based method is chosen as an example basis for this invention because of its superior speed of classification and scalability to high dimensional problems with large training sets. Decision tree classifiers can be separated into two forms depending on the nature of the test at each node of the tree. The simplest form of decision trees has a test of the form (x_i ≤ b), where x_i is the value in the i-th numeric dimension and b is some constant. A more complex form of decision tree allows linear combinations of the attributes in the test at each node. In this case, the test is of the form
(a_1 x_1 + a_2 x_2 + . . . + a_n x_n ≤ b).
These trees, also called oblique trees or trees using oblique hyperplanes, produce better results for some problem domains. This was discussed and demonstrated in “Classification and Regression Trees,” Breiman et al., Chapman and Hall/CRC, 1984, which is hereinafter referred to as “CART”. In such domains oblique trees produce compact solutions with higher accuracy. While these properties are advantageous, the generation of oblique trees is difficult because of the difficulty in determining the equation for the complex test at each node.
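The two forms of node test described above can be illustrated with a minimal sketch. The function names, the 0-based index, and the sample values are assumptions chosen for illustration and do not come from the specification.

```python
from typing import Sequence


def axis_parallel_test(x: Sequence[float], i: int, b: float) -> bool:
    """Simple node test: x_i <= b (i is a 0-based index in this sketch)."""
    return x[i] <= b


def oblique_test(x: Sequence[float], a: Sequence[float], b: float) -> bool:
    """Oblique node test: a_1*x_1 + a_2*x_2 + ... + a_n*x_n <= b."""
    return sum(ak * xk for ak, xk in zip(a, x)) <= b


# Hypothetical two-attribute records and coefficients.
low_record = [1.0, 1.0]
high_record = [3.0, 3.0]
coefficients = [1.0, 1.0]   # test is x_1 + x_2 <= 4

print(oblique_test(low_record, coefficients, 4.0))
print(oblique_test(high_record, coefficients, 4.0))

# The simple test is the special case in which only one coefficient is non-zero.
print(axis_parallel_test(low_record, 0, 2.0) == oblique_test(low_record, [1.0, 0.0], 2.0))
```

An axis-parallel tree can only approximate a boundary such as x_1 + x_2 = 4 with a staircase of simple tests, which is one reason an oblique tree can be more compact in such domains.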
Some oblique tree generation methods use a particular form of an optimization technique to determine the test at each node. These methods are complex and tend to be computationally intensive without any guarantee of improved accuracy. Another method uses the statistical technique of linear discriminants in the construction of oblique decision trees. This technique often reduces the time taken to generate the oblique trees. However, the resulting trees are usually quite complex and there is still room for improvement in the classification accuracy.
It is therefore an aspect of the present invention to present a method and apparatus for generating a decision tree classifier with oblique hyperplanes from data records. In an embodiment the classifier is generated using an iterative method wherein for each iteration a set of vectors is provided to a decision tree generating process. The decision tree generated uses hyperplanes orthogonal to the vectors provided in the set to separate records at each of its nodes. The iterative process starts out with the set of numeric attribute axes as the set of vectors. At the end of each iteration pairs of leaf nodes in the generated tree are considered and analyzed to determine new vectors. The set of vectors for the next iteration is determined using a filter process. This iterative process generates multiple decision trees from which one tree is chosen as a solution meeting a particular criterion.
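As a rough illustration of how a set of vectors can drive the split at a single node, the sketch below projects each record onto every vector in the set and searches for the best threshold along that projection, which corresponds to a hyperplane orthogonal to the vector. Several details are assumptions rather than part of the description above: the use of Gini impurity as the split measure, the midpoint thresholds, the function names, and the toy data. The additional vector (1, 1) is hand-picked here; the analysis of leaf node pairs and the filter process that would produce such vectors are not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, labels, vectors):
    """Return (score, vector, threshold) of the lowest-impurity test v.x <= b."""
    best = None
    n = len(records)
    for v in vectors:
        # Project every record onto the vector; the split hyperplane is orthogonal to v.
        proj = [sum(vk * xk for vk, xk in zip(v, x)) for x in records]
        values = sorted(set(proj))
        for lo, hi in zip(values, values[1:]):
            b = (lo + hi) / 2.0  # candidate threshold midway between neighbours
            left = [y for p, y in zip(proj, labels) if p <= b]
            right = [y for p, y in zip(proj, labels) if p > b]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, tuple(v), b)
    return best


# Toy training set: the classes are separated by x_1 + x_2, not by either axis alone.
records = [(0.0, 2.0), (2.0, 0.0), (1.0, 3.0), (3.0, 1.0)]
labels = ["A", "A", "B", "B"]

axes = [(1.0, 0.0), (0.0, 1.0)]        # first iteration: the attribute axes
extended = axes + [(1.0, 1.0)]         # later iteration: one hypothetical new vector

print(best_split(records, labels, axes))
print(best_split(records, labels, extended))
```

With the axes alone, no single threshold separates the two classes in this toy set, whereas the extended set admits a test along (1, 1) that does. A full tree generator would apply such a search recursively at each node and repeat the whole process once per iteration with the updated vector set.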