A number of strategies have been proposed for identifying and retrieving multimedia data objects stored in a database. At the heart of each of these strategies is a search problem in which a query point is compared to a set of multidimensional (MD) objects in the database. For example, a sample of a song having multiple characteristics (dimensions) may be compared to a number of songs stored in a database to find a song or songs having the same or similar characteristics. The search either finds one or more matches or determines that no match exists in the set of objects in the database. These search problems are usually framed as some form of high-dimensional search, where data points and query points are mapped into the same high-dimensional feature space. For a particular query point, a match is found by locating a data point in the feature space that is close enough to the query point to be considered a match. More specifically, these approximate matching problems are usually framed as epsilon distance queries using some Lp metric, where the epsilon used is significantly less than the average interpoint distance.
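As a minimal sketch of the epsilon distance query described above, the following Python snippet computes an Lp distance between a query point and a data point and reports a match when that distance does not exceed epsilon. The function names `lp_distance` and `is_match` and the default of p = 2 are illustrative assumptions, not taken from any particular system.

```python
def lp_distance(a, b, p=2.0):
    """Lp (Minkowski) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimensionality")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def is_match(query, point, epsilon, p=2.0):
    """True when `point` lies within `epsilon` of `query` under the Lp metric."""
    return lp_distance(query, point, p) <= epsilon
```

Because epsilon is assumed to be much smaller than the average interpoint distance, a predicate of this kind would be expected to reject the vast majority of data points for any given query.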
Traditional query processing strategies for solving such problems (e.g., nearest neighbor, epsilon range searching) suffer from poor performance due to intrinsic difficulties associated with high dimensionality. These strategies become even more problematic when different matching distances are used for different data points, which turns out to be a very important case for complex high-dimensional searches, such as audio fingerprinting and the like. As a result, the most straightforward approach to solving such problems, linear scan, has typically outperformed more sophisticated approaches. Unfortunately, while simple linear scanning typically achieves better performance on complex high-dimensional searches than more complex query processing strategies, it remains a very time-intensive process, because the query point must be compared against every data point in the database.
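The linear scan discussed above, including the case where each data point carries its own matching distance, can be sketched as follows. This is an illustration under stated assumptions: the function name `linear_scan`, the sample data, and the choice of the Euclidean (L2) metric are hypothetical and stand in for whatever Lp metric and thresholds a real system would use.

```python
import math


def linear_scan(query, points, epsilons):
    """Return indices of data points whose own matching distance covers the query.

    Every data point is examined, so the cost grows linearly with database size.
    """
    matches = []
    for i, (point, eps) in enumerate(zip(points, epsilons)):
        # math.dist is the Euclidean (L2) distance; another Lp metric
        # could be substituted here.
        if math.dist(query, point) <= eps:
            matches.append(i)
    return matches


database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
thresholds = [0.5, 2.0, 1.0]  # hypothetical per-point matching distances
print(linear_scan((0.9, 1.2), database, thresholds))
```

The loop makes the cost of the approach visible: each query performs one distance computation per stored object, regardless of how few objects actually match.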