1. Technical Field
This application generally relates to techniques for information organization, and more particularly to techniques used in connection with clustering a set of objects so that similar objects are in the same group and dissimilar objects are in different groups.
2. Description of Related Art
Data may be stored in an electronic form for use with computerized techniques. A large amount of computerized data used in connection with a variety of different applications presents a challenge for how to locate and organize relevant information. Clustering refers to the process of classifying a set of data objects, such as documents included in the computerized data, into groups so that each group includes similar objects and objects belonging to other groups are dissimilar. In other words, objects of a first group are similar to one another but are dissimilar with respect to objects belonging to other groups. Existing techniques for clustering objects include hierarchical approaches and partitional approaches. Hierarchical algorithms proceed successively, either by merging smaller clusters into larger ones or by splitting larger clusters into smaller ones. In contrast, partitional algorithms determine all clusters at once by decomposing the data set into a set of disjoint clusters. Hierarchical clustering algorithms can be further described as either divisive (i.e., top-down) or agglomerative (i.e., bottom-up). A divisive algorithm begins with the entire data set and recursively partitions it into two (or more) pieces, forming a tree. An agglomerative algorithm starts with each object in its own cluster and iteratively merges clusters.
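By way of illustration only, the agglomerative (bottom-up) approach described above may be sketched as follows. The sketch is not taken from any particular existing technique: the function names `single_link_distance` and `agglomerative`, the use of one-dimensional numeric objects, the single-link (closest pair) distance between clusters, and the stopping rule of a target cluster count `k` are all assumptions chosen to keep the example small.

```python
from itertools import combinations


def single_link_distance(a, b):
    """Distance between two clusters: the smallest gap between any pair of members.

    Assumption: objects are plain numbers compared by absolute difference.
    """
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points, k):
    """Start with one cluster per object and merge the two closest clusters
    until only k clusters remain (hypothetical stopping rule)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Bottom-up start: every object is in its own cluster.
    clusters = [[p] for p in points]

    while len(clusters) > k:
        # Find the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i; deleting j does not shift i because i < j.
        clusters[i].extend(clusters[j])
        del clusters[j]

    return clusters


if __name__ == "__main__":
    print(agglomerative([1, 2, 10, 11, 20], 3))
```

Recording each merge as it occurs, rather than stopping at `k` clusters, would yield the tree of nested clusters that characterizes hierarchical methods. A partitional method, by contrast, would assign every object to one of `k` disjoint clusters directly, without building such a tree.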