A content-based image retrieval system may utilize a visual codebook. For example, vectors representing patches or regions of images may be considered “feature points” or words. Such feature points may be extracted from images by the Scale Invariant Feature Transform (SIFT). Once extracted, the feature points may be quantized into visual words. This quantization is a critical step, and may be based on K-means clustering. If the feature points are assigned to clusters in a manner that fosters convergence to representative centers, or code words, the codebook may be constructed. Once constructed, the codebook enables image retrieval using classical information retrieval techniques. However, the number of images and the number of feature points may be huge. Accordingly, K-means clustering at this scale presents scalability problems.
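The quantization step described above may be sketched as follows. This is a minimal illustration of plain K-means (Lloyd's algorithm) in pure Python, not an implementation taken from any particular system. The function names `build_codebook` and `bag_of_words`, the parameters `max_iters` and `seed`, and the use of random initial centers are assumptions made for illustration. Feature points are assumed to be equal-length lists of numbers, such as SIFT descriptors.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    """Index of the center (visual word) closest to the point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))

def build_codebook(points, k, max_iters=50, seed=0):
    """Cluster feature points into k code words with plain K-means.

    Hypothetical helper: initial centers are k distinct points chosen
    at random, which is only one of several possible initializations.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: every point is compared with every center.
        new_assignments = [nearest_center(p, centers) for p in points]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centers are stable
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def bag_of_words(descriptors, centers):
    """Histogram of visual-word counts for one image's descriptors."""
    histogram = [0] * len(centers)
    for d in descriptors:
        histogram[nearest_center(d, centers)] += 1
    return histogram
```

In this sketch, each assignment step compares every feature point with every center, so the cost of one iteration grows with the product of the number of points and the number of code words. That product is what becomes prohibitive for large image collections. The histogram returned by `bag_of_words` is the kind of representation that classical information retrieval techniques may then index.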
Hierarchical K-means (HKM) and Approximate K-means (AKM) may be used to address scalability, particularly to configure clusters of feature points that converge onto an appropriate center. However, both methods have problems. In particular, HKM suffers from performance degradation as clusters are decomposed into smaller-scale clustering problems. Additionally, AKM achieves reasonable performance only if the approximate nearest neighbor is selected with sufficient precision. Moreover, AKM is not guaranteed to converge, which makes it difficult to decide when to terminate the algorithm.
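The hierarchical decomposition used by HKM may be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the names `build_tree` and `quantize`, the dictionary-based tree layout, and the parameters `branching`, `depth`, `iters`, and `seed` are all hypothetical. Each node clusters only its own points into a small number of children, and a feature point is quantized by descending greedily from the root.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def group_by_center(points, centers):
    """Partition points by the index of their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    return groups

def kmeans(points, k, iters=20, seed=0):
    """Small fixed-iteration K-means; returns centers and final groups."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        for j, g in enumerate(group_by_center(points, centers)):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers, group_by_center(points, centers)

def build_tree(points, branching, depth):
    """Recursively split points into `branching` clusters per level."""
    if depth == 0 or len(points) <= branching:
        return {"children": []}  # leaf node: one visual word
    centers, groups = kmeans(points, branching)
    children = []
    for center, group in zip(centers, groups):
        if group:
            child = build_tree(group, branching, depth - 1)
            child["center"] = center
            children.append(child)
    return {"children": children}

def quantize(point, tree):
    """Descend greedily; the path of child indices identifies the leaf."""
    path, node = [], tree
    while node["children"]:
        idx = nearest(point, [c["center"] for c in node["children"]])
        path.append(idx)
        node = node["children"][idx]
    return tuple(path)
```

In this sketch, a point is compared with only `branching` centers per level rather than with every code word, which is where the speedup comes from. The descent is greedy, however: a choice made at an upper level is never revisited, so a point near a cluster boundary may end in a leaf that is not its closest code word. This is one plausible reading of the degradation noted above.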