Clustering is a powerful and widely used tool for discovering structure in data. Classical algorithms [9] are static, centralized, and batch. They are static because they assume that the data being clustered and the similarity function that guides the clustering do not change while clustering is taking place. They are centralized because they rely on common data structures (such as similarity matrices) that must be accessed, and sometimes modified, at each step of the operation. They are batch because the algorithm runs its course and then stops.
Some modern information retrieval applications require ongoing processing of a massive stream of data. This class of application imposes several requirements on the process that classical clustering algorithms do not satisfy.

Dynamic Data and Similarity Function. Because the stream continues over a lengthy period of time, both the data being clustered and users' requirements and interests may change. As new data enters, it should be able to find its way in the clustering structure without the need to restart the process. If the intrinsic structure of new data eventually invalidates the organization of older data, the older data should be able to move to a more appropriate cluster. A model of the user's interest should drive the similarity function applied to the data, motivating the need to support a dynamic similarity function that can take advantage of structure compatible with both old and new interests while adapting the structure as needed to take account of changes in the model.

Decentralized. Because of the massive nature of the data, the centralized constraint is a hindrance. Distributed implementations of centralized systems are possible, but the degree of parallel execution is severely limited by the need to maintain the central data structure as the clustering progresses. One would like to use parallel computer hardware to scale the system to the required level, with nearly linear speed-up.

Any-time. Because the stream is continual, the batch orientation of conventional algorithms, and their need for a static set of data, is inappropriate. The clustering process needs to run constantly, providing a useful (though necessarily approximate) structuring of the data whenever it is queried.
Biological ants cluster the contents of their nests using an algorithm that is dynamic, decentralized, and anytime. Each ant picks up items that are dissimilar to those it has encountered recently, and drops them when it finds itself among similar items. This approach is dynamic, because it easily accommodates a continual influx of new items to be clustered without the need to restart. It is decentralized because each ant functions independently of the others. It is anytime because at any point, one can retrieve clusters from the system. The size and quality of the clusters increase as the system runs.
Previous researchers have adapted this algorithm to practical applications, but (like the ant exemplar) these algorithms produce only a partitioning of the objects being clustered. Particularly when dealing with massive data, a hierarchical clustering structure is far preferable. It enables searching the overall structure in time logarithmic in the number of items, and also permits efficient pruning of large regions of the structure if these are subsequently identified as expendable.
Natural Ant Clustering
An ant hill houses different kinds of things, including larvae, eggs, cocoons, and food. The ant colony keeps these entities sorted by kind. For example, when an egg hatches, the larva does not stay with other eggs, but is moved to the area for larvae. Computer scientists have developed a number of algorithms for sorting things, but no ant in the ant hill is executing a sorting algorithm.
Biologists have developed an algorithm that is compatible with the capabilities of an ant and that yields collective behavior comparable to what is observed in nature [2, 4]. Each ant executes the following steps continuously:

1. Wander randomly around the nest.

2. Sense nearby objects, and maintain a short memory (about ten steps) of what has been seen.

3. If an ant is not carrying anything when it encounters an object, decide stochastically whether or not to pick up the object. The probability of picking up an object decreases if the ant has recently encountered similar objects. In the emulation, the probability of picking up an object is

$$p(\text{pickup}) = \left(\frac{k^{+}}{k^{+} + f}\right)^{2}$$

where f is the fraction of positions in short-term memory occupied by objects of the same type as the object sensed and k+ is a constant. As f becomes small compared with k+, the probability that the ant will pick up the object approaches certainty.

4. If an ant is carrying something, at each time step decide stochastically whether or not to drop it, where the probability of dropping a carried object increases if the ant has recently encountered similar items in the environment. In the emulation,

$$p(\text{putdown}) = \left(\frac{f}{k^{-} + f}\right)^{2}$$

where f is the fraction of positions in short-term memory occupied by objects of the same type as the object carried, and k− is another constant. As f becomes large compared with k−, the probability that the carried object will be put down approaches certainty.
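The two probability rules in steps 3 and 4 can be written down directly. The following is a minimal Python sketch, not the biologists' emulation code: the function names, the use of `None` for an empty memory slot, and the default constants are assumptions made for illustration.

```python
from collections import deque

MEMORY_LEN = 10  # "about ten steps" of short-term memory


def memory_fraction(memory, kind):
    """Fraction f of memory positions occupied by objects of type `kind`.

    Assumption: empty sightings are stored as None and still occupy a slot,
    so f is computed over the full memory length.
    """
    return sum(1 for seen in memory if seen == kind) / memory.maxlen


def p_pickup(f, k_plus=1.0):
    """Probability of picking up a sensed object: (k+ / (k+ + f))^2."""
    return (k_plus / (k_plus + f)) ** 2


def p_putdown(f, k_minus=3.0):
    """Probability of dropping a carried object: (f / (k- + f))^2."""
    return (f / (k_minus + f)) ** 2


# Hypothetical memory contents: two eggs, one larva, one empty sighting.
memory = deque(["egg", "egg", None, "larva"], maxlen=MEMORY_LEN)
f_egg = memory_fraction(memory, "egg")
print(f_egg, p_pickup(f_egg), p_putdown(f_egg))
```

When f is 0, `p_pickup` returns 1 and `p_putdown` returns 0, which matches the qualitative behavior described in the steps: an ant readily picks up objects unlike those it has seen and holds on to a carried object until it finds itself among similar ones.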
The Brownian walk of individual ants guarantees that wandering ants will eventually examine all objects in the nest. Even a random scattering of different items in the nest will yield local concentrations of similar items that stimulate ants to drop other similar items. As concentrations grow, they tend to retain current members and attract new ones. The stochastic nature of the pick-up and drop behaviors enables multiple concentrations to merge, since ants occasionally pick up items from one existing concentration and transport them to another.
The put-down constant k− must be stronger than the pick-up constant k+, or else clusters will dissolve faster than they form. Typically, k+ is about 1 and k− is about 3. The length of short-term memory and the length of the ant's step in each time period determine the radius within which the ant compares objects. If the memory is too long, the ant sees the entire nest as a single location, and sorting will not take place.
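To experiment with how the constants and the memory length interact, one can run a toy emulation. The sketch below makes several simplifying assumptions that are not in the text: the nest is a one-dimensional ring of cells, there are only two object types, each cell holds at most one object, and all names and default values are hypothetical. It is meant as a starting point for exploration rather than a faithful model of a nest.

```python
import random
from collections import Counter, deque


def run_nest(n_cells=60, n_items=30, n_ants=5, steps=2000,
             k_plus=1.0, k_minus=3.0, memory_len=10, seed=0):
    """Run ants on a ring of cells; return (nest, counts_before, counts_after)."""
    rng = random.Random(seed)
    nest = [None] * n_cells
    for cell in rng.sample(range(n_cells), n_items):
        nest[cell] = rng.choice(("egg", "larva"))
    before = Counter(kind for kind in nest if kind is not None)

    ants = [{"pos": rng.randrange(n_cells), "load": None,
             "memory": deque([None] * memory_len, maxlen=memory_len)}
            for _ in range(n_ants)]

    def fraction(memory, kind):
        # f: share of memory slots holding objects of the given type
        return sum(1 for seen in memory if seen == kind) / memory_len

    for _ in range(steps):
        for ant in ants:
            # Random walk: one step left or right on the ring.
            ant["pos"] = (ant["pos"] + rng.choice((-1, 1))) % n_cells
            here = nest[ant["pos"]]
            if ant["load"] is None and here is not None:
                f = fraction(ant["memory"], here)
                if rng.random() < (k_plus / (k_plus + f)) ** 2:
                    ant["load"], nest[ant["pos"]] = here, None
            elif ant["load"] is not None and here is None:
                f = fraction(ant["memory"], ant["load"])
                if rng.random() < (f / (k_minus + f)) ** 2:
                    nest[ant["pos"]], ant["load"] = ant["load"], None
            # Remember what this cell held on arrival.
            ant["memory"].append(here)

    after = Counter(kind for kind in nest if kind is not None)
    after.update(ant["load"] for ant in ants if ant["load"] is not None)
    return nest, before, after


nest, before, after = run_nest()
print("".join({"egg": "e", "larva": "l", None: "."}[c] for c in nest))
```

Varying `memory_len`, `k_plus`, and `k_minus` and printing the ring at intervals gives a rough feel for the sensitivity described above. Objects are only moved, never created or destroyed, so the counts before and after a run always agree.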
Previous Engineered Versions
Several researchers have developed versions of the biological algorithm for various applications. These implementations fall into two broad categories: those in which the digital ants are distinct from the objects being clustered, and those that eliminate this distinction. All of these examples form a partition of the objects, without any hierarchical structure. In addition, we summarize in this section previous non-ant approaches to the problem of distributing clustering computations.
Distinct Ants and Objects
A number of researchers have emulated the distinction in the natural ant nest between the objects being clustered and the “ants” that carry them around. All of these examples cluster objects in two-dimensional space.
Lumer and Faieta [12] present what is apparently the earliest example of such an algorithm. The objects being clustered are records in a database. Instead of a short-term memory, their algorithm uses a measure of the similarity among the objects being clustered to guide the pick-up and drop-off actions.
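One way to picture the replacement of memory by similarity is a local density score computed over the objects near the ant. The sketch below is loosely modeled on that idea and is an assumption, not a transcription of the published algorithm: the function name, the scaling parameter `alpha`, the neighborhood `area`, and the use of Euclidean distance between numeric records are all illustrative choices.

```python
import math


def local_density(obj, neighbors, alpha=1.0, area=9):
    """Similarity-based stand-in for the memory fraction f.

    Each neighbor contributes 1 - d/alpha, where d is its Euclidean distance
    from `obj`; the sum is averaged over the neighborhood area and clipped
    at zero so that very dissimilar surroundings give a density of 0.
    """
    total = sum(1.0 - math.dist(obj, other) / alpha for other in neighbors)
    return max(0.0, total / area)


def p_pickup(f, k_plus=1.0):
    """Pick-up rule reused with the density score in place of f."""
    return (k_plus / (k_plus + f)) ** 2


record = (0.2, 0.4)                              # hypothetical database record
similar = [(0.2, 0.4), (0.25, 0.4), (0.2, 0.5)]  # nearby, similar records
dissimilar = [(5.0, 5.0), (7.0, 1.0)]            # nearby, dissimilar records
print(p_pickup(local_density(record, similar)))
print(p_pickup(local_density(record, dissimilar)))
```

With dissimilar surroundings the density is clipped to 0 and the pick-up probability is 1; with similar surroundings the density is positive and the object is more likely to stay where it is.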
Kuntz et al. [11] apply the Lumer-Faieta algorithm to partitioning a graph. The objects being sorted are the nodes of the graph, and the similarity among them is based on their connectivity. Thus the partitioning reflects reasonable component placement for VLSI design.
Hoe et al. [8] refine Lumer and Faieta's work on data clustering by moving empty ants directly to available data items. Handl et al. [6] offer a comparison of this algorithm with conventional clustering algorithms.
Handl and Meyer [7] cluster documents returned by a search engine such as Google, to generate a topic map. Documents are characterized by a keyword vector of length n, thus situating them in an n-dimensional space. This space is then reduced using latent semantic indexing, and then ant clustering projects them into two dimensions for display. This multi-stage process requires a static document collection.
These efforts use only document similarity to guide clustering. Ramos [20] adds a pheromone mechanism. Ants deposit digital pheromones as they move about, thus attracting other ants and speeding up convergence.
Walsham [23] presents a useful summary of the Lumer-Faieta and Handl-Meyer efforts and studies the performance of these algorithms across their parameter space.
Oprisen [17] applies the Deneubourg model to foraging robots, and explores convergence speed as a function of the size of the memory vector that stores the category of recently encountered objects.
Monmarché [14] clusters data objects on a two-dimensional grid, basing drop-off probabilities on fixed similarity thresholds. Inter-object distance is the Euclidean distance between the fixed-length vectors characterizing the objects. To speed convergence, once initial clusters have formed, K-means is applied to merge stray objects. Then the sequence of ant clustering and K-means is applied again, this time to whole clusters, to merge them at the next level. The distinction between the smaller clusters is not maintained when they are merged, so that the potential for generating a true hierarchy is not realized. Kanade and Hall [10] use a similar hybrid process, employing fuzzy C-means instead of K-means as the refinement process. The staged processing in these models has the undesirable consequence of removing them from the class of any-time algorithms and requiring that they be applied to a fixed collection of data.
Schockaert et al. [22] also merge smaller clusters into larger ones, but using a real-time decision rule that tells an ant whether to pick up a single object or an entire cluster. Thus their algorithm, unlike Monmarché's, can accommodate a dynamic document population.
Active Objects
A natural refinement of these algorithms eliminates the distinction between ant and object. Each object is active, and can move itself.
Beal [1] addresses the problem of discovering a hierarchical organization among processors in an amorphous computing system. The nodes themselves are fixed, but efficient processing requires grouping them into a hierarchy and maintaining this hierarchy if the medium is divided or merged or if some processors are damaged. Processors form groups based on their distance from each other: they find neighbors and elect leaders. These leaders then repeat the process at the next level. The similarity function is implicit in the RF communication connectivity and conforms to a low-dimensional manifold, very different from the topology induced by document similarity.
Chen et al. [3] apply the active object model to clustering data elements on a two-dimensional grid. They invoke the dissimilarity of data objects as the distance measure that drives clustering, but do not develop this dissimilarity in detail. They are particularly concerned to manage the processor cycles consumed by the document agents (a concern that is managed in the systems of the previous section by limiting the number of ants). Their solution is to have documents fall asleep when they are comfortable with their surroundings, awakening periodically to see if the world has changed.
We have implemented [19] a flat clustering mechanism with active objects. The other algorithms discussed so far form clusters on a two-dimensional manifold, but our algorithm clusters them on a graph topology reflecting the interconnections between processors in a computer network. The nodes of the graph are places that can hold a number of documents, and documents move from one place to another. Each document is characterized by a concept vector. Each element of the concept vector corresponds to a subsumption subtree in WordNet [5, 13], and has value 1 if the document contains a lexeme in the WordNet subtree, and 0 otherwise. Similarity between documents is the cosine similarity between their concept vectors. Each time a document is activated, it compares itself with a sample of documents at its current node and a sample of documents at a sample of neighboring nodes, and probabilistically decides whether to move. This algorithm converges exponentially fast (FIG. 1) [19], even when documents are added while the process runs (FIG. 2). To manage the computational cost of the active objects, each one uses pheromone learning [18] to modulate its computational activity based on whether recent activations have resulted in a move or not.
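The activation step of a document can be sketched as follows. This is an illustrative reading of the description above, not the implementation reported in [19]: the text does not specify the move rule, so the similarity-proportional choice used here, the sample size, and all function and place names are assumptions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two binary concept vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(doc, sample):
    """Average similarity between `doc` and a sample of other documents."""
    if not sample:
        return 0.0
    return sum(cosine_similarity(doc, other) for other in sample) / len(sample)


def choose_place(doc, places, current, neighbors, sample_size=3, rng=random):
    """Pick the place where `doc` should be after this activation.

    `places` maps a place name to the list of concept vectors held there.
    Assumed rule: sample documents at the current place and at each
    neighboring place, then choose a place with probability proportional
    to the mean similarity of its sample.
    """
    candidates = [current] + list(neighbors)
    scores = []
    for name in candidates:
        others = [d for d in places[name] if d is not doc]
        sample = rng.sample(others, min(sample_size, len(others)))
        scores.append(mean_similarity(doc, sample))
    if sum(scores) == 0:
        return current  # nothing similar anywhere nearby: stay put
    return rng.choices(candidates, weights=scores)[0]


# Hypothetical three-concept vocabulary and two places.
doc = [0, 1, 1]
places = {"A": [doc, [1, 0, 0]], "B": [[0, 1, 1], [0, 1, 0]]}
print(choose_place(doc, places, current="A", neighbors=["B"]))
```

Because each document consults only samples at its own place and at neighboring places, no global similarity table is needed, which is the property that makes the approach decentralized.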
Non-Ant Distributed Clustering
There has been some work on distributed clustering not using the ant paradigm.
Olson [16] summarizes a wide range of distributed algorithms for hierarchical clustering. These algorithms distribute the work of computing inter-object similarities, but share the resulting similarity table globally. Like centralized clustering, they form the hierarchy monotonically, without any provision for documents to move from one cluster to another. Thus they are neither dynamic nor anytime.
Ogston et al. [15] consider a set of agents distributed over a network, initially with random links to one another. Agents form clusters with their closest neighbors, and share their own links with those neighbors to expand the set of agents with whom they can compare themselves. The user specifies the maximum desired cluster size, to keep clusters from growing too large. This system is naturally distributed, anytime, and could reasonably be applied to dynamic situations, but it creates only a partitioning of documents, not a hierarchy.
Synopsis
All previous work on ant clustering other than our own clusters documents spatially on a two-dimensional manifold. In addition, some of these algorithms are multi-stage processes that cannot be applied to a dynamically changing collection of documents, and even those that could be applied to such a collection have not been analyzed in this context. All of the previous ant clustering work produces a flat partition of documents, and thus does not offer the retrieval benefits of a hierarchical clustering.