It is sometimes desirable to query a very large database with respect to an input object (e.g., a text document or a photograph) in order to identify other objects represented within the database that are identical or very similar to the input object. Identifying objects that are exactly identical to an input object generally can be performed very quickly using straightforward techniques. Unfortunately, identifying only those data objects that are exactly identical to the query object often is not very useful, because highly relevant objects, even ones that differ from the input object in only a single slight respect, would not be detected.
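By way of illustration only, the limitation of exact matching can be sketched as follows. The sketch assumes that the "straightforward technique" is a lookup keyed on a cryptographic digest of each object's bytes; that choice, and the names `digest`, `build_index` and `exact_matches`, are hypothetical and are not taken from any referenced application.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a fixed-length fingerprint of the raw bytes (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def build_index(objects: dict) -> dict:
    """Map each digest to the identifiers of the objects that produce it."""
    index = {}
    for obj_id, data in objects.items():
        index.setdefault(digest(data), []).append(obj_id)
    return index


def exact_matches(index: dict, query: bytes) -> list:
    """Look up the query's digest; only byte-identical objects are returned."""
    return index.get(digest(query), [])


# Hypothetical two-object database.
database = {
    "doc1": b"the quick brown fox jumps over the lazy dog",
    "doc2": b"an entirely unrelated document",
}
index = build_index(database)

# A byte-identical query finds doc1.
identical = exact_matches(index, b"the quick brown fox jumps over the lazy dog")

# Appending a single period changes the digest, so nothing is found.
near_miss = exact_matches(index, b"the quick brown fox jumps over the lazy dog.")
```

In this sketch the lookup is a single dictionary access regardless of database size, but the query that differs by one character returns no results even though "doc1" is plainly relevant.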
As a result, certain techniques have been developed that are capable of identifying both exact and close matches. Examples include the techniques described in commonly assigned U.S. provisional patent application Ser. No. 61/024,630, filed Jan. 30, 2008 (the '630 application), which application is incorporated by reference herein as though set forth herein in full. Unfortunately, while the approaches described in the '630 application work well in many situations, they may not be optimal in all situations.
Other examples include the random hyperplane technique described, e.g., in Charikar, M. S., “Similarity estimation techniques from rounding algorithms,” In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing (Montreal, Quebec, Canada, May 19-21, 2002), STOC '02. ACM, New York, N.Y., 380-388, and the locality-sensitive hashing technique. While these techniques have certain benefits, improvements remain desirable.
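As a rough illustration of the random hyperplane technique cited above, the following minimal sketch reduces each object, assumed here to be already represented as a numeric feature vector, to a short bit signature. Each bit records on which side of a randomly oriented hyperplane the vector lies, and the fraction of differing bits between two signatures estimates the angle between the vectors divided by pi. The function names, the signature length and the use of Python's standard `random` module are assumptions made for illustration and are not drawn from the cited reference.

```python
import math
import random


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> list:
    """Draw num_bits random normal vectors, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector: list, hyperplanes: list) -> list:
    """Bit i is 1 if the vector lies on the non-negative side of hyperplane i."""
    return [
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]


def hamming_fraction(sig_a: list, sig_b: list) -> float:
    """Fraction of bit positions at which two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def estimated_angle(sig_a: list, sig_b: list) -> float:
    """Estimate the angle (radians) between the original vectors."""
    return math.pi * hamming_fraction(sig_a, sig_b)


# Hypothetical feature vectors; 64 bits is an arbitrary signature length.
planes = make_hyperplanes(num_bits=64, dim=4)
base = [1.0, 2.0, 0.5, -1.0]
scaled = [2.0 * x for x in base]   # same direction, different magnitude
other = [-3.0, 0.1, 4.0, 2.0]      # a different direction

sig_base = signature(base, planes)
sig_scaled = signature(scaled, planes)
sig_other = signature(other, planes)
```

Because only the sign of each projection is kept, vectors pointing in the same direction (such as `base` and `scaled`) receive identical signatures, whereas the estimate for `base` and `other` depends on the random hyperplanes drawn and becomes more reliable as the number of bits grows.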