Many methods have been proposed for content-based image searching, particularly where a database of images is large, and where a query image is a distorted version of a requested database image.
Many of the proposed methods of content-based image searching use feature vectors. A feature vector is an array of numbers that represents a portion of an image. When a new feature vector is received, the new feature vector can be used to retrieve similar feature vectors from a database. The similar feature vectors represent images similar to the image associated with the received feature vector.
When a database of images is small and a similarity function is fast to compute, an exhaustive search method can be used. An exhaustive search computes similarity between a query vector associated with a query image and each record in a database. Such an exhaustive search is too slow for many applications, particularly once the size of the database becomes large.
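By way of illustration only, an exhaustive search of the kind described above may be sketched as follows. The function name, the use of Euclidean distance as the similarity function, and the sample records are assumptions made for the purpose of the example.

```python
import math


def exhaustive_search(query, database, top_k=1):
    """Return the top_k (distance, record_id) pairs closest to the query."""
    scored = []
    for record_id, vector in database.items():
        # One distance computation per record, so cost grows linearly
        # with the number of records in the database.
        scored.append((math.dist(query, vector), record_id))
    scored.sort()
    return scored[:top_k]


# Hypothetical two-dimensional feature vectors.
database = {
    "img_a": [0.0, 0.0],
    "img_b": [1.0, 1.0],
    "img_c": [5.0, 5.0],
}
result = exhaustive_search([0.9, 1.2], database, top_k=2)
```

Every query touches every record, which is why such a search becomes impractical as the database grows.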
One of the problems with content-based image searching is how to quickly find, in a database, those feature vectors that match a feature vector of a query image.
Hash-based strategies provide image retrieval methods that are closest to being both fast and accurate. Hash-based methods involve computing a hash code for each vector in a database, and using the hash code to associate records within the database with entries in a hash table. At query time, a hash code is computed for a query vector and the hash code is used to quickly find matching records in the hash table. For such a method to be effective, a ‘locality sensitive’ hash function may be used. A locality sensitive hash function tends to return the same hash code for vectors that are close to each other. A locality sensitive hash function partitions a feature space into regions, where each region is associated with a particular hash code.
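By way of illustration only, the registration and query steps described above may be sketched using a simple grid quantiser as a stand-in for a locality sensitive hash function. The grid quantiser, the cell size, and all names below are assumptions made for the example and are not taken from any particular method.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical width of each region of the feature space


def grid_hash(vector, cell_size=CELL_SIZE):
    """Quantise each coordinate; vectors in the same cell share a hash code."""
    return tuple(math.floor(x / cell_size) for x in vector)


def build_table(database):
    """Associate each record with the hash table entry for its hash code."""
    table = defaultdict(list)
    for record_id, vector in database.items():
        table[grid_hash(vector)].append(record_id)
    return table


def lookup(table, query):
    """Return the records registered under the hash code of the query."""
    return table.get(grid_hash(query), [])


database = {"img_a": [0.2, 0.3], "img_b": [0.7, 0.6], "img_c": [3.1, 4.8]}
table = build_table(database)
matches = lookup(table, [0.5, 0.5])
```

In this sketch the query is answered with a single table lookup, independent of the number of records, at the cost of only finding records that fall in the same region as the query.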
One problem that exists with hash-based image retrieval methods is that for any hash function there will always be pairs of vectors that are close to each other but return different hash codes. This problem occurs when the two vectors are located on either side of a partition boundary, and leads to false-negative matches. False-negative matches occur when the image retrieval method fails to find similar vectors because the respective hash codes of the similar vectors are different. Hash perturbation methods reduce such false-negative matches by performing multiple probes per query. The multiple probes are performed by perturbing the hash code of the query point to nearby hash codes.
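By way of illustration only, the false-negative problem and a multiple-probe remedy may be sketched as follows. The grid quantiser and the choice to probe every adjacent cell are assumptions made for the example; practical hash perturbation methods typically select a smaller set of probes.

```python
import math
from itertools import product


def grid_hash(vector, cell_size=1.0):
    """Quantise each coordinate to the index of the grid cell containing it."""
    return tuple(math.floor(x / cell_size) for x in vector)


def probe_codes(vector, cell_size=1.0):
    """Return the query's own code plus the codes of all adjacent cells."""
    base = grid_hash(vector, cell_size)
    offsets = product((-1, 0, 1), repeat=len(base))
    return [tuple(b + o for b, o in zip(base, offset)) for offset in offsets]


# Two close vectors on either side of the partition boundary at x = 1.0.
registered = [0.98, 0.50]
query = [1.02, 0.50]

# A single probe misses the registered vector (a false negative).
single_probe_hit = grid_hash(query) == grid_hash(registered)

# Probing neighbouring hash codes recovers the match.
multi_probe_hit = grid_hash(registered) in probe_codes(query)
```

The number of probes in this sketch is three raised to the power of the dimensionality, which illustrates why the choice of probe set matters in high dimensions.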
A lattice-based hash generates multiple probes using lattice geometry. In a lattice-based hash, hash codes for registration are created from points in a high dimensional lattice. The query hash codes are determined by finding a Delaunay region containing the query point, and computing a hash code for each lattice point that is a vertex of the Delaunay region. The A* lattice is typically used for lattice-based hash methods.
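By way of illustration only, the structure of a lattice-based hash may be sketched using the integer lattice Z^n in place of the A* lattice. This substitution is an assumption made to keep the example short: in Z^n the Delaunay region containing a point is the unit hypercube around the point, whereas the Delaunay regions of the A* lattice are simplices. All names below are hypothetical.

```python
import math
from itertools import product


def registration_code_zn(point):
    """Nearest point of Z^n, used here as the registration hash code."""
    return tuple(math.floor(x + 0.5) for x in point)


def delaunay_vertices_zn(point):
    """Vertices of the unit hypercube of Z^n that contains the point."""
    floors = [math.floor(x) for x in point]
    return [
        tuple(f + b for f, b in zip(floors, bits))
        for bits in product((0, 1), repeat=len(floors))
    ]


# One hash code per vertex of the region surrounding the query point.
query_codes = delaunay_vertices_zn([0.3, 1.7])

# A nearby database point registers under one of those codes.
registered_code = registration_code_zn([0.4, 1.6])
```

Because the registration code of a nearby database point is one of the vertices of the region surrounding the query, probing every vertex finds the registered point.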
Methods exist for balancing hash codes used for registration by a lattice-based hash. Such methods determine a set of candidate hash codes and select the candidate hash code with the fewest existing registrations. The candidate hash codes are selected from the vertices of the Delaunay region surrounding a feature vector. A lattice point is selected as a candidate for registration only if the database point is sufficiently far from the plane containing all other lattice points in the Delaunay region.
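By way of illustration only, the balancing step described above may be sketched as follows. The sketch assumes that the distance from the database point to the plane opposite each vertex has already been computed, and that a fixed threshold decides whether a vertex is a candidate. The function name, the threshold, the fallback behaviour when no vertex qualifies, and the sample values are all assumptions made for the example.

```python
from collections import Counter


def choose_registration_code(candidates, bucket_counts, min_distance):
    """Pick the eligible hash code with the fewest existing registrations.

    candidates: list of (hash_code, distance_to_opposite_plane) pairs,
    one per vertex of the Delaunay region surrounding the feature vector.
    """
    eligible = [code for code, dist in candidates if dist >= min_distance]
    if not eligible:
        # Assumed fallback: keep the vertex farthest from its opposite plane.
        eligible = [max(candidates, key=lambda c: c[1])[0]]
    return min(eligible, key=lambda code: bucket_counts[code])


# Hypothetical registration counts and candidate vertices.
bucket_counts = Counter({"h1": 5, "h2": 1, "h3": 0})
candidates = [("h1", 0.6), ("h2", 0.3), ("h3", 0.05)]

chosen = choose_registration_code(candidates, bucket_counts, min_distance=0.1)
bucket_counts[chosen] += 1  # record the new registration
```

In this example the hash code with no registrations is excluded because the database point lies too close to the opposite plane, so the least-loaded remaining candidate is selected.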
Calculating the distance of a point to a plane defined by a set of points which lie on the plane is also a well-known problem. Typically, the problem is decomposed into two steps: first, a unit normal to the plane is calculated, then the dot product of the normal vector and a vector from a point on the plane to the point in question is calculated. However, while the dot product is easily performed in arbitrary dimensions, the determination of the normal is not. When the plane is in three (3) dimensions, the cross product of two vectors lying in the plane can be used. However, the cross product of two vectors does not generalise directly to four (4) or more dimensions.
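By way of illustration only, the two-step distance calculation in three dimensions may be sketched as follows. The function names and the sample points are assumptions made for the example.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(u, v):
    """Cross product of two three-dimensional vectors."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def distance_to_plane_3d(point, p0, p1, p2):
    """Distance from point to the plane through p0, p1 and p2."""
    # Step 1: a normal to the plane from two in-plane vectors.
    normal = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(normal, normal))
    # Step 2: project a vector from the plane to the point onto the normal.
    return abs(dot(normal, sub(point, p0))) / length


# Distance from the origin to the plane x + y + z = 1.
d = distance_to_plane_3d([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1])
```

The second step uses only a dot product and carries over unchanged to any number of dimensions; only the first step relies on a construction specific to three dimensions.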
One common method of calculating a normal in arbitrary dimensions is to use SVD (“Singular Value Decomposition”) to find the null space of a matrix formed using points on the plane. First, a matrix is formed in which each row is a point on the plane, expressed relative to a reference point on the plane. Next, the SVD of the matrix is calculated. Finally, the normal is obtained by reading out the last row of the V matrix, being the right singular vector that spans the null space.
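By way of illustration only, the SVD method may be sketched using NumPy as follows. The sketch assumes that the plane is defined by n points in n dimensions, that each row of the matrix is taken relative to the first point so that the rows span the directions within the plane, and that the points are in general position. The function names and the sample points are assumptions made for the example.

```python
import numpy as np


def hyperplane_normal_svd(points):
    """Unit normal to the hyperplane through n points in n dimensions."""
    pts = np.asarray(points, dtype=float)
    # Rows are in-plane direction vectors: an (n - 1) x n matrix.
    diffs = pts[1:] - pts[0]
    # With the default full_matrices=True, vh is n x n and its last row
    # is orthogonal to every row of diffs (the null space of diffs).
    _, _, vh = np.linalg.svd(diffs)
    return vh[-1]


def distance_to_hyperplane(point, points):
    """Distance from point to the hyperplane through the given points."""
    normal = hyperplane_normal_svd(points)
    offset = np.asarray(point, dtype=float) - np.asarray(points[0], dtype=float)
    return abs(float(np.dot(normal, offset)))


# Distance from the origin to the hyperplane x1 + x2 + x3 + x4 = 1.
plane_points = np.eye(4)
d = distance_to_hyperplane(np.zeros(4), plane_points)
```

The sign of the normal returned by the decomposition is arbitrary, which is why the absolute value of the dot product is taken.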
While the SVD method is applicable to any arrangement of points, in any number of dimensions, the SVD method is a costly operation. Efficient implementations have a computational complexity of O(n^3), where n is the dimensionality of the feature vectors. Therefore, the distance calculation is slow in high dimensions and, as a result, registrations are slow.
Thus, a need exists for an improved method and system for determining the distance from a point to the plane formed by A* lattice points.