With increasing Internet data transfer speeds and the prosperity of Web 2.0, the amount of image data on the Internet is ever-growing. Image-based websites such as Flickr, Picasa, and YouTube are growing in popularity, making online content-based image management more important than ever. Since new image data is being uploaded to the Internet all the time, efficiently organizing, indexing, and retrieving desired image data is a constant challenge. Categorizing image data can be an enormous endeavor. As the collection of image data grows, it is important to integrate new image data with previously categorized image data efficiently, without introducing too much complexity or computational cost. Avoiding the need to re-cluster existing image data can be beneficial in terms of computational time and effort.
Clustering is used to divide a large amount of image data into a number of subsets, e.g., clusters, where the image data in a particular cluster is similar to one another in some regard. In general, clustering methods can be categorized into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. For example, K-Means is a partitioning method, DBSCAN divides data based on data density, and Hierarchical Agglomerative Clustering (HAC) constructs a hierarchical structure, namely a dendrogram, of the whole dataset. These methods are typically run in batch, or static, mode, which is not appropriate in a dynamic environment where image data is updated at random moments over time.
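The limitation of batch mode can be illustrated with a minimal sketch of a partitioning method. The following is an illustrative K-Means in plain Python, not an implementation taken from this document; the two-dimensional feature vectors, the `farthest_point_init` seeding helper, and the choice of cluster counts are all assumptions made for the example. Because the function needs the whole dataset up front, incorporating newly uploaded images means running it again over every point, old and new.

```python
from math import dist

def farthest_point_init(points, k):
    """Hypothetical deterministic seeding: start at the first point, then
    repeatedly pick the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=20):
    """Batch K-Means: requires the complete dataset before it can start."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Assumed toy feature vectors for images that are already categorized.
existing = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels_old, _ = kmeans(existing, k=2)

# New images arrive later. In batch mode the only option is to
# re-cluster everything, including the points already processed.
new = [(9.0, 0.0), (9.1, 0.2)]
labels_all, _ = kmeans(existing + new, k=3)
```

In this sketch the second call repeats all of the distance computations for the four existing points, and the cluster labels it returns are not guaranteed to match those of the first call. Both effects grow with the size of the collection, which is the cost that re-clustering imposes in a dynamic environment.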