Data may be stored in electronic form for use with computerized techniques. The large amount of computerized data used across a variety of applications makes it challenging to locate and organize relevant information. Clustering refers to the process of classifying a set of data objects, such as documents included in the computerized data, into groups so that objects within a group are similar to one another and dissimilar to objects belonging to other groups.
A fresh document is a document concerning a recent topic or subject of interest; after a short period of time, a document is no longer considered fresh. When there is a large volume of information or news concerning a specific topic, clustering provides a means of grouping the fresh documents on that topic together. In a search engine results page (SERP), a summary or abstract of the cluster is displayed along with links to documents within the cluster and other pertinent information. Documents are clustered while they are fresh and are assigned identification numbers. This identifying information remains with each document and helps distinguish its cluster from a later cluster on a similar topic.
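The freshness-based clustering described above can be illustrated with a minimal sketch. Everything concrete here is a hypothetical choice for illustration and is not taken from the source: the 48-hour `FRESHNESS_WINDOW`, the keyword-set representation of a document, the Jaccard similarity measure, the `SIMILARITY_THRESHOLD`, and the greedy single-pass assignment. The sketch shows only the idea that fresh documents are clustered and stamped with an identification number that stays with them, so a later burst of documents on a similar topic forms a new cluster rather than joining a stale one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count
from typing import Iterator, Optional

# Hypothetical values; the source only says "a short period of time".
FRESHNESS_WINDOW = timedelta(hours=48)
SIMILARITY_THRESHOLD = 0.5


@dataclass
class Document:
    doc_id: str
    published: datetime
    terms: frozenset  # hypothetical keyword-set representation
    cluster_id: Optional[int] = None  # once assigned, it stays with the document


def is_fresh(doc: Document, now: datetime) -> bool:
    return now - doc.published <= FRESHNESS_WINDOW


def jaccard(a: frozenset, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_fresh_documents(docs, now: datetime, id_source: Iterator[int]) -> None:
    """Greedily assign unclustered fresh documents to active clusters or new ones."""
    fresh = [d for d in docs if is_fresh(d, now)]
    # Only clusters that still contain fresh documents can accept new members;
    # stale documents keep their old IDs but are never joined again.
    active = {}
    for d in fresh:
        if d.cluster_id is not None:
            active.setdefault(d.cluster_id, set()).update(d.terms)
    for d in fresh:
        if d.cluster_id is not None:
            continue  # already clustered while fresh; keep its ID
        best_id, best_sim = None, 0.0
        for cid, terms in active.items():
            sim = jaccard(d.terms, terms)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= SIMILARITY_THRESHOLD:
            d.cluster_id = best_id
            active[best_id].update(d.terms)
        else:
            # Start a new cluster with a new identification number.
            d.cluster_id = next(id_source)
            active[d.cluster_id] = set(d.terms)
```

In this sketch, a stale document that was clustered days earlier keeps its original identification number, while new fresh documents on the same topic receive a new one, which is one way of keeping an old cluster distinguishable from a new cluster on a similar topic.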
Techniques for clustering objects include, but are not limited to, hierarchical approaches and partitional approaches. Hierarchical algorithms proceed successively, either merging smaller clusters into larger ones or splitting larger clusters into smaller ones. Hierarchical clustering algorithms can be further described as either divisive (i.e., top-down) or agglomerative (i.e., bottom-up). A divisive algorithm begins with the entire data set and recursively partitions it into two (or more) pieces, forming a tree. An agglomerative algorithm starts with each object in its own cluster and iteratively merges clusters. In contrast, partitional algorithms determine all clusters at once by decomposing the data set into a set of disjoint clusters.
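The agglomerative (bottom-up) method can be illustrated with a minimal sketch. It assumes objects are points in Euclidean space and uses single linkage (the distance between two clusters is the distance between their closest members); both choices are illustrative assumptions, and other distance measures and linkage rules are equally possible. Merging stops once a hypothetical target number of clusters `k` remains.

```python
from math import dist


def agglomerative(points, k):
    """Bottom-up single-linkage clustering that stops when k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Each object starts in its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Recording the sequence of merges instead of stopping at `k` would yield the full tree (dendrogram) of a hierarchical clustering; a divisive algorithm would build the same kind of tree from the top down.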