1. Field of the Invention
The present invention relates to the problem of finding the nearest neighbor to an object in a set of objects. More specifically, the present invention relates to a method and an apparatus for building and querying a parallel hybrid spill tree to facilitate parallel nearest-neighbor matching operations.
2. Related Art
The proliferation of the Internet and digital photography has made large-scale image collections containing billions of images a reality. However, such large-scale image collections are difficult to organize and navigate. One operation that can simplify the task of organizing images is to identify and remove near-duplicate images in a collection. Near-duplicate images of popular items, such as book covers, CD covers, and movie posters, appear frequently on the web, since such items are often scanned or photographed multiple times with varying resolutions and color balances. Efficient and scalable techniques for locating nearest neighbors in the image feature space and clustering them together are helpful for removing such near-duplicates.
Previous work has shown that a variant of metric trees, known as “hybrid spill trees,” can be used to efficiently locate approximate nearest neighbors in high-dimensional spaces with high accuracy and speed. However, existing techniques that use this data structure to identify near-duplicates, similar objects, and/or nearest neighbors are designed for a single machine, and consequently do not scale to large-scale collections.
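For illustration only, the following minimal Python sketch shows one way a single-machine hybrid spill tree of the kind referred to above might be built and queried. It is a sketch under stated assumptions, not the implementation of the prior work: the function names (`build`, `search`, `offset`), the parameter values (`LEAF_SIZE`, `TAU`, `RHO`), and the pivot-selection heuristic are all hypothetical choices made for the example.

```python
import math
import random

LEAF_SIZE = 8   # assumed maximum number of points stored in a leaf
TAU = 0.05      # assumed half-width of the overlap buffer around the split
RHO = 0.7       # assumed balance threshold: max fraction of points in one child


def offset(p, node):
    """Signed distance from p to the hyperplane midway between the node's pivots."""
    a, b, length = node["a"], node["b"], node["length"]
    return sum((x - (u + v) / 2) * (v - u) for x, u, v in zip(p, a, b)) / length


def build(points, rng):
    """Recursively build a hybrid spill tree over a list of coordinate tuples."""
    if len(points) <= LEAF_SIZE:
        return {"leaf": points}
    # Heuristic pivots: the point farthest from a random seed point,
    # then the point farthest from that one.
    seed = rng.choice(points)
    a = max(points, key=lambda p: math.dist(p, seed))
    b = max(points, key=lambda p: math.dist(p, a))
    node = {"a": a, "b": b, "length": math.dist(a, b)}
    if node["length"] == 0:  # all points coincide, so no split is possible
        return {"leaf": points}
    offs = [(offset(p, node), p) for p in points]
    # Overlapping ("spill") split: points within TAU of the boundary go to both children.
    left = [p for o, p in offs if o <= TAU]
    right = [p for o, p in offs if o > -TAU]
    node["overlap"] = max(len(left), len(right)) <= RHO * len(points)
    if not node["overlap"]:
        # Too unbalanced: fall back to a non-overlapping (metric-tree style) split.
        left = [p for o, p in offs if o <= 0]
        right = [p for o, p in offs if o > 0]
    if not left or not right:  # degenerate split; stop recursing
        return {"leaf": points}
    node["left"], node["right"] = build(left, rng), build(right, rng)
    return node


def search(node, q, best=None):
    """Return (distance, point) for an approximate nearest neighbor of q."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    o = offset(q, node)
    near, far = ("left", "right") if o <= 0 else ("right", "left")
    best = search(node[near], q, best)
    # Overlap nodes use defeatist search (no backtracking); non-overlap nodes
    # backtrack only if the far side could still hold a closer point.
    if not node["overlap"] and (best is None or abs(o) < best[0]):
        best = search(node[far], q, best)
    return best


# Example usage on random 2-D points (synthetic data, for illustration only).
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(500)]
tree = build(pts, rng)
dist_to_match, match = search(tree, (0.5, 0.5))
```

Because overlap nodes are searched without backtracking, the returned point is an approximate nearest neighbor and may not be the exact one. The sketch also holds the entire tree in the memory of one process, which reflects the single-machine limitation noted above.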
Hence, what is needed is a method and an apparatus that facilitate parallel nearest-neighbor matching operations without the limitations of the above-described techniques.