Decision trees are widely used as classification tools. One major advantage of decision trees is their interpretability; that is, the decisions they make can be expressed as a rule set. Interpretability, in this context, means that at every node of a decision tree, the branching decision is based upon the value of a single attribute, and the choice of the attribute is based upon a splitting criterion. The net result is that each leaf of the decision tree represents a cluster, and the path from the root to the leaf defines a rule that describes the cluster.
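By way of illustration only, the correspondence between root-to-leaf paths and rules can be sketched as follows. The tree, its attribute names ("color", "size") and its cluster labels are hypothetical and chosen purely for the example; the nested-dictionary representation is likewise an assumption, not a prescribed data structure.

```python
# Hypothetical toy tree: each internal node tests a single attribute,
# and each leaf holds a cluster label.
tree = {
    "attribute": "color",
    "children": {
        "red": {"leaf": "cluster A"},
        "green": {
            "attribute": "size",
            "children": {
                "small": {"leaf": "cluster B"},
                "large": {"leaf": "cluster C"},
            },
        },
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, leaf_label) pair per root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    rules = []
    for value, child in node["children"].items():
        test = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules


def format_rule(conditions, label):
    """Render a rule as readable text."""
    clause = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
    return f"IF {clause} THEN {label}"


rules = extract_rules(tree)
for conditions, label in rules:
    print(format_rule(conditions, label))
```

Each printed line is one rule, and the number of rules equals the number of leaves, which is why a tree with few leaves yields a compact description of the clusters.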
Hierarchical clustering involves first dividing a data set (consisting of a set of patterns) into a certain number of clusters at a relatively coarse level, then further segmenting each of these coarse clusters into relatively finer levels until a “stop” criterion is satisfied.
A similar clustering technique can conversely be performed in a “bottom-up” manner: a large number of clusters at a fine level of resolution are merged into broader categories at each successive level. In either case, each level represents a degree of resolution or coarseness.
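The bottom-up variant can be sketched, under simplifying assumptions, for one-dimensional data. The function name `agglomerate`, the single-linkage merge rule and the sample values are all assumptions made for the example; the text above does not prescribe a particular merge rule.

```python
def agglomerate(points):
    """Single-linkage bottom-up clustering of 1-D points.

    Returns a list of levels; levels[0] is the finest (one point per cluster)
    and the last level is the coarsest (a single cluster).
    """
    clusters = [[p] for p in sorted(points)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # For sorted 1-D data, the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the two closest neighbouring clusters into one.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels.append([list(c) for c in clusters])
    return levels


levels = agglomerate([1.0, 1.2, 5.0, 5.3, 9.0])
for depth, level in enumerate(levels):
    print(depth, level)
```

Each successive level has one cluster fewer than the level before it, so reading the list from first to last moves from the finest resolution to the coarsest.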
Various existing clustering techniques are used to manage information. Bellot et al. [Patrice Bellot and Marc El-Beze, Clustering by means of unsupervised decision trees or hierarchical and K-means like algorithms, RIAO 2000 Conference Proceedings, Paris, France, Apr. 12-14, 2000, pp. 344-363] describe a decision tree provided for text categorization. Information about text clusters is used in conjunction with supervised information about whether a document is useful or not useful to a user. The total information content in the cluster of useful documents and in the cluster of non-useful documents is used to build a decision tree.
Held et al. [Marcus Held and J. M. Buhmann, Unsupervised on-line learning of decision trees for hierarchical data analysis, Proc. Advances of the Neural Information Processing Systems (NIPS97), 1997] describe a technique in which a decision tree, or a hierarchy representing the clusters, is provided based on minimization of a criterion function that is generally used for clustering using EM (expectation-maximization) and soft k-means (that is, fuzzy k-means) algorithms. The data set is divided into two clusters at each level in such a way that the division minimizes the criterion function. This technique is essentially a hierarchical form of an EM-based clustering algorithm. Thus, this technique provides a hierarchical clustering algorithm in which the first-level clusters (two clusters) are formed at a relatively coarse resolution. Relatively finer-resolution clusters are formed down the hierarchy.
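The general idea of dividing a data set in two at each level so as to minimize a criterion function can be sketched as follows. This is not the method of Held et al.: as a simplifying assumption, the sketch substitutes a hard-assignment sum-of-squared-error criterion for the EM and soft k-means criterion, works on one-dimensional data only, and finds the best division by trying every split point. The function names and the `min_size` and `tolerance` parameters are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_bisection(values):
    """Split sorted 1-D values into two parts with minimal total SSE."""
    values = sorted(values)
    cut = min(range(1, len(values)), key=lambda k: sse(values[:k]) + sse(values[k:]))
    return values[:cut], values[cut:]


def bisect_tree(values, min_size=2, tolerance=0.5):
    """Recursively bisect until a cluster is small or tight enough (the stop criterion).

    A leaf is a flat list of values; an internal node is a list of two subtrees.
    """
    if len(values) <= min_size or sse(values) <= tolerance:
        return sorted(values)
    left, right = best_bisection(values)
    return [bisect_tree(left, min_size, tolerance), bisect_tree(right, min_size, tolerance)]


hierarchy = bisect_tree([1.0, 1.2, 5.0, 5.3, 9.0, 9.1, 9.4])
print(hierarchy)
```

The first division is the coarsest, and each nested level refines one of the clusters formed above it. The result is always a binary hierarchy, whatever the structure of the data.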
Liu et al. [Bing Liu, Yiyuan Xia, and Philip S. Yu, Clustering through decision tree construction, IBM Research Report, RC 21695, 2000] describe injecting noisy data values into a data set. A decision tree is then provided by classifying the original data values and the noisy data values, on the assumption that the original and noisy data values belong to two different classes. Although the objective is to build an unsupervised decision tree from unlabelled data, a method for building a supervised decision tree is applied, and the performance of this technique depends upon the amount of noisy data injected into the original data set.
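The noise-injection idea can be sketched for a single numeric attribute as follows. The sketch is illustrative only and is not the procedure of Liu et al.: the use of Gini impurity as the splitting criterion, the evenly spaced grid standing in for random noise (chosen so that the result is deterministic), the sample values and the function names are all assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(samples):
    """Find the threshold on one numeric attribute that minimizes weighted Gini.

    samples is a list of (value, label) pairs; label 1 = original, 0 = injected.
    """
    samples = sorted(samples)
    best = None
    for i in range(1, len(samples)):
        if samples[i - 1][0] == samples[i][0]:
            continue  # no threshold fits between equal values
        threshold = (samples[i - 1][0] + samples[i][0]) / 2.0
        left = [label for _, label in samples[:i]]
        right = [label for _, label in samples[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best[1]


# Original (unlabelled) values: a dense group near 2.
original = [1.8, 1.9, 2.0, 2.1, 2.2, 2.3]
# Injected values: an evenly spaced grid stands in for uniform random noise
# (an assumption made here so that the sketch is deterministic).
injected = [i + 0.5 for i in range(10)]

samples = [(v, 1) for v in original] + [(v, 0) for v in injected]
threshold = best_threshold(samples)
print(threshold)
```

With these values the chosen threshold falls between 2.3 and 2.5, at the upper edge of the dense group of original values. Changing the number or spacing of the injected values changes the impurity scores and can move the split, which reflects the dependence on the amount of injected data noted above.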
In the above-described techniques, a binary decision tree is formed, rather than a generalized n-ary decision tree. Here, n is the number of child nodes created at a node. In a generalized n-ary tree, n is thus a variable that depends on the type of the data at each node of every level of the decision tree.
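The distinction can be illustrated with a minimal sketch in which a node creates one child per distinct value of the attribute on which it splits, so that n varies with the data. The function name `nary_split`, the records and the attribute names are hypothetical.

```python
from collections import defaultdict


def nary_split(records, attribute):
    """Partition records into one child per distinct value of the attribute."""
    children = defaultdict(list)
    for record in records:
        children[record[attribute]].append(record)
    return dict(children)


records = [
    {"shape": "circle", "color": "red"},
    {"shape": "square", "color": "red"},
    {"shape": "triangle", "color": "blue"},
    {"shape": "circle", "color": "blue"},
]

by_shape = nary_split(records, "shape")  # three distinct shapes, so n = 3
by_color = nary_split(records, "color")  # two distinct colors, so n = 2
print(len(by_shape), len(by_color))
```

A binary tree would need at least two levels to separate the three shapes, whereas the n-ary split separates them at a single node.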
Existing techniques provide hierarchical clusters in which no cluster level is directly interpretable. In other words, to interpret a generated hierarchy, the clusters at each node must be analyzed separately. Also, most existing techniques create a binary hierarchy rather than a generic n-ary decision tree. Accordingly, a need clearly exists for an improved manner of performing hierarchical clustering.