Clustering is a popular technique in document processing. A clustering routine can be used to gather documents that share a common characteristic into a group, or cluster. Once documents are grouped in this way, processing can be applied to a sample of the documents in the group rather than to every document in the group. Further, information determined from the documents in the group, or from the sample, can be applied to later documents that are clustered into the same group. Conventional algorithms and techniques for clustering documents, however, can take a long time to create or populate the clusters (e.g., time on the order of O(n^2) for n documents, such as when each document is compared against every other document). This can make clustering cumbersome for very large sets of documents.
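To make the quadratic cost concrete, the following is a minimal sketch of one naive pairwise approach, not any particular conventional algorithm. It assumes documents are plain strings, uses Jaccard similarity over word sets as the "common characteristic", and merges any pair at or above a threshold. The names `jaccard`, `naive_cluster`, and `threshold` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_cluster(docs, threshold=0.5):
    """Group documents by comparing every pair of documents.

    Returns (clusters, comparisons), where clusters is a list of lists of
    document indices and comparisons is the number of pairs examined.
    """
    # Assumption: whitespace tokenization is enough for this illustration.
    tokens = [set(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))  # union-find forest, one root per cluster

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    # Every unordered pair is visited once: n * (n - 1) / 2 comparisons.
    for i, j in combinations(range(len(docs)), 2):
        comparisons += 1
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values()), comparisons
```

Under these assumptions the comparison count equals n(n-1)/2, so a set of 10,000 documents would require roughly 50 million pairwise comparisons before any cluster is complete.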