Signal Comparison and Nearest Neighbor Methods
Comparing signals is one of the most essential and prevalent tasks in signal processing. A large number of applications fundamentally rely on determining the answers to the following two questions: (1) How should the signals be compared? (2) Given a set of signals and a query signal, which signals are the nearest neighbors of the query signal, i.e., which other signals in the database are most similar to the query signal?
Signal comparison is the fundamental building block for the nearest neighbor (NN) search problem, which is defined as follows: Given a set (often referred to as a database) containing signals and a query signal, find the signal in the database that is closest to the query signal. The problem can be extended to K-NN, i.e., determining the K nearest neighbors of the query signal. In this context, the signals in question can be images, videos, features extracted from images or videos, or other waveforms. The qualifier closest refers to a distance metric, such as the Euclidean distance or Manhattan distance between pairs of signals. This distance metric captures some notion of similarity between the signals being compared: if two signals are close according to the distance metric, the signals are considered similar.
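As an illustration of the K-NN problem defined above, the following is a minimal brute-force sketch in Python. The function name k_nearest_neighbors, the toy database, and the use of NumPy are assumptions made for this example only; practical systems generally use more efficient search structures.

```python
import numpy as np

def k_nearest_neighbors(database, query, k=1, metric="euclidean"):
    """Return indices and distances of the k database rows closest to query."""
    diff = database - query                      # broadcast (L, N) - (N,)
    if metric == "euclidean":
        dist = np.sqrt(np.sum(diff ** 2, axis=1))
    elif metric == "manhattan":
        dist = np.sum(np.abs(diff), axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    order = np.argsort(dist, kind="stable")[:k]  # nearest first
    return order, dist[order]

# Toy database of four 2-dimensional signals (illustrative values only).
database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])
query = np.array([1.0, 1.1])

idx, d = k_nearest_neighbors(database, query, k=2)
idx_l1, d_l1 = k_nearest_neighbors(database, query, k=1, metric="manhattan")
```

Here idx lists the two nearest database entries under the Euclidean distance, nearest first, and idx_l1 gives the single nearest entry under the Manhattan distance.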
Image Retrieval
In a typical image retrieval application, a client acquires a query image (in the form of data or a signal) of an unknown object or scene. The query image is compared to images of known objects or scenes stored in a database at a server to determine similar images. As described above, the similarity can be expressed as a distance between features in the unknown and known data. The performance of such applications can be improved significantly by efficient encoding of distances. The search should be quick and computationally efficient, while the transmission should be bandwidth-efficient.
Image descriptors based on the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), GIST (of an image), and related techniques enable fast searches using global image characteristics or local image details when bit rate is not an issue. To address communication complexity, several training-based methods are known. However, all of those methods require retraining whenever new database entries are added, because the additions change the signal statistics.
In augmented reality (AR) applications, retraining is undesirable. In addition to the complexity of training at the server, repeated retraining necessitates updating the client with the retrained parameters. Thus, methods that do not require training are preferable. These include compressed histogram of gradients (CHoG), in which the descriptors are explicitly designed to be compressed using vector quantization, and a compact projection, which uses locality sensitive hashing (LSH) on established descriptors.
Rate-Distortion
One aspect of coding theory deals with optimizing the rate-distortion (R-D) trade-off for encoding data, i.e., using the smallest number of bits to encode the data while incurring the least distortion in the data. As used herein, the terms data and signals are used interchangeably.
For example, during image or video encoding, an encoder attempts to reduce the rate for a given visual quality after decoding. Typically, the R-D is determined by an end user of the data, e.g., a viewer.
Randomized Embeddings
An embedding transforms high-dimensional data (or signals) to a lower dimension such that some aspects of the relative geometry of the data are preserved, e.g., distances in terms of similarity of the data. Because the geometry is preserved, distance computations can be performed directly on the low-dimensional, and often low-rate, embeddings rather than on the original high-dimensional data.
FIG. 1 shows example high-dimension L data points u and v, and a distance-preserving embedding function g(d(u, v)) that preserves the distance d in the lower dimension log L, where “^” indicates an approximation. As an advantage, the embedding can be transmitted at a lower rate.
According to the well-known Johnson-Lindenstrauss lemma, a small set of high-dimensional data points can be embedded into a low-dimensional Euclidean space such that the distances between the points are approximately preserved, see for example Johnson et al., “Extensions of Lipschitz mappings into a Hilbert space,” Conference in Modern Analysis and Probability, Contemporary Mathematics, American Mathematical Society, pp. 189-206, 1982.
As shown in FIG. 2, for Johnson-Lindenstrauss (J-L) embeddings, the function increases linearly. For universal quantized embeddings, the function is initially approximately linear for relatively small distances, and then quickly flattens for distances greater than a threshold distance D0.
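Well-known embeddings include the J-L embeddings, i.e., functions $f: \mathcal{S} \rightarrow \mathbb{R}^K$ from a finite set of signals $\mathcal{S} \subset \mathbb{R}^N$ to a K-dimensional vector space, such that, given two signals x and y in $\mathcal{S}$, their images satisfy
$$(1-\epsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\epsilon)\|x-y\|_2^2.$$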
In other words, the embeddings preserve the Euclidean (ℓ2) distances of point clouds to within a small error tolerance ε.
Johnson and Lindenstrauss demonstrated that an embedding as described above exists in a space of dimension
$$K = O\left(\frac{1}{\epsilon^2}\log L\right),$$
where L is the number of signals in S, i.e., its cardinality, and ε is the desired tolerance in the embedding. Remarkably, K is independent of the dimensionality N of the set of signals.
It is straightforward to determine such embeddings using a linear mapping. In particular, the function ƒ(x)=Ax, where A is a K×N matrix whose entries are drawn randomly from specific distributions, is a J-L embedding with overwhelming probability. Commonly used distributions include independent and identically distributed (i.i.d.) Gaussian, i.i.d. Rademacher, or i.i.d. uniform.
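The linear mapping ƒ(x)=Ax can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names jl_embed and pairwise_sq_dists, the sizes L, N, and K, the random seed, and the choice of entry variance 1/K (so that squared norms are preserved in expectation) are all assumptions of this example.

```python
import numpy as np

def jl_embed(X, K, rng):
    """Project the rows of X (L x N) to K dimensions using f(x) = A x."""
    N = X.shape[1]
    # Assumed scaling: i.i.d. Gaussian entries with variance 1/K,
    # so that E[||A x||^2] = ||x||^2.
    A = rng.normal(0.0, 1.0 / np.sqrt(K), size=(K, N))
    return X @ A.T

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all pairs of rows of Z."""
    G = Z @ Z.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G

rng = np.random.default_rng(0)
L, N, K = 50, 1000, 400          # illustrative sizes only
X = rng.normal(size=(L, N))      # L random signals of dimension N
Y = jl_embed(X, K, rng)

# Ratio of embedded to original squared distances for every distinct pair;
# the largest deviation from 1 is an empirical stand-in for epsilon.
iu = np.triu_indices(L, k=1)
ratio = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
max_distortion = np.max(np.abs(ratio - 1.0))
```

Increasing K in this sketch should shrink max_distortion, at the cost of a higher-dimensional embedding.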
The J-L embedding typically results in a significant dimensionality reduction. However, dimensionality reduction does not immediately produce rate reduction. First, the embeddings must be quantized for transmission and, if the quantization is not well designed, the accuracy of the embedding decreases.
In particular, the quantized embeddings satisfy
$$(1-\epsilon)\|x-y\| - \tau \le \|f(x)-f(y)\| \le (1+\epsilon)\|x-y\| + \tau,$$
where $\tau \propto 2^{-B}$ is the quantizer step size, which decreases exponentially with the number of bits B used per dimension, and ε is a function of the dimensionality K of the projection and scales approximately as $1/\sqrt{K}$. In the extreme case of 1-bit quantization, the embedding does not preserve the amplitudes of the signals and, therefore, their ℓ2 distances, although it does preserve their angle, i.e., their correlation coefficient.
When designing a quantized embedding, the total rate is determined by the dimensionality of the projection and the number of bits used per dimension, i.e., R=KB. At a fixed rate R, as the dimensionality K increases, the accuracy of the embedding before quantization, as reflected in ε, increases. To keep the rate fixed, however, the number of bits per dimension B must then decrease, which decreases the accuracy due to quantization, as reflected in τ. At a constant rate, a multibit quantizer outperforms a 1-bit quantizer.
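The trade-off between K and B at a fixed rate can be made concrete with a small sketch. The uniform mid-rise quantizer below, its clipping range y_max, the rate R=256, and the list of candidate bit depths are assumptions chosen for illustration; they are not taken from any specific quantizer design.

```python
import numpy as np

def quantize_uniform(y, B, y_max):
    """B-bit uniform (mid-rise) scalar quantizer on [-y_max, y_max]."""
    levels = 2 ** B
    step = 2.0 * y_max / levels                    # proportional to 2**(-B)
    idx = np.floor((np.clip(y, -y_max, y_max) + y_max) / step)
    idx = np.minimum(idx, levels - 1).astype(int)  # top edge -> last cell
    y_hat = -y_max + (idx + 0.5) * step            # cell midpoint
    return idx, y_hat

R = 256                                            # total rate in bits
configs = [(R // B, B) for B in (1, 2, 4, 8)]      # (K, B) with K * B == R

rng = np.random.default_rng(1)
y = rng.uniform(-1.0, 1.0, size=1000)              # stand-in coefficients

# Worst-case quantization error for each bit depth; it is at most half
# a step, i.e. 2**(-B) when y_max = 1.
max_err = {}
for K, B in configs:
    _, y_hat = quantize_uniform(y, B, y_max=1.0)
    max_err[B] = float(np.max(np.abs(y - y_hat)))
```

Each pair in configs spends the same R bits: moving toward larger B reduces the quantization error reflected in τ, while the correspondingly smaller K increases the projection error reflected in ε.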
Universal Quantization and Embeddings
Universal scalar quantization modifies scalar quantization by designing the quantizer to have non-contiguous quantization regions. This approach also relies on a Johnson-Lindenstrauss style projection, followed by scaling, dithering, and scalar quantization:
$$f(x) = Q\left(\Delta^{-1}(Ax + w)\right), \qquad (1)$$
where A is a random matrix with i.i.d., $\mathcal{N}(0, \sigma^2)$-distributed elements, $\Delta^{-1}$ is an element-wise (inverse) scaling factor, w is a dither vector with i.i.d. elements, uniformly distributed in [0,Δ], and Q(·) is a scalar quantizer operating element-wise on its input. The key feature of that method is the modified scalar quantizer.
As shown in FIG. 3, the method uses a modified scalar quantizer, which is a 1-bit quantizer designed to have non-contiguous quantization intervals. The quantizer can be thought of as a regular uniform quantizer, determining a multi-bit representation of a signal, and preserving only the least significant bit (LSB) of the representation. Thus, scalar values in [2l,2l+1) quantize to 1, and scalar values in [2l+1,2(l+1)), for any integer l, quantize to 0. Because Q(·) is a 1-bit quantizer, that method uses as many bits as there are rows of A, i.e., K bits, to encode a signal.
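A minimal sketch of the embedding in (1) with the 1-bit non-contiguous quantizer described above follows. The function name universal_embed, the parameter values, the random seed, and the normalization of the Hamming distance by K are assumptions of this example. The bit labeling follows the intervals stated in the text (1 on [2l,2l+1), 0 on [2l+1,2(l+1))); swapping the two labels would not change any Hamming distance.

```python
import numpy as np

def universal_embed(X, A, w, delta):
    """1-bit universal embedding f(x) = Q((A x + w) / delta), per row of X."""
    t = (X @ A.T + w) / delta
    # Q(t) = 1 when floor(t) is even, i.e. t in [2l, 2l+1); otherwise 0.
    return (np.floor(t).astype(np.int64) % 2 == 0).astype(np.uint8)

rng = np.random.default_rng(0)
N, K = 64, 2000                 # signal dimension, number of bits
sigma, delta = 1.0, 1.0         # illustrative embedding parameters
A = rng.normal(0.0, sigma, size=(K, N))   # i.i.d. N(0, sigma^2) entries
w = rng.uniform(0.0, delta, size=K)       # dither, uniform in [0, delta]

x = rng.normal(size=N)
u = rng.normal(size=N)
u /= np.linalg.norm(u)          # unit-norm direction
near = x + 0.1 * u              # l2 distance 0.1 from x
far = x + 10.0 * u              # l2 distance 10 from x

bits = universal_embed(np.stack([x, near, far]), A, w, delta)
d_near = float(np.mean(bits[0] != bits[1]))   # normalized Hamming distance
d_far = float(np.mean(bits[0] != bits[2]))
```

In this sketch, d_near is small and grows with the signal distance, whereas d_far saturates near one half: beyond some distance the embedding no longer distinguishes how far apart two signals are.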
As shown in FIG. 4, the modified quantizer enables efficient universal encoding of signals. Furthermore, that quantization method is also an embedding satisfying
$$g(\|x-y\|_2) - \tau \le d_H(f(x), f(y)) \le g(\|x-y\|_2) + \tau, \qquad (2)$$
where dH(·,·) is the Hamming distance of the embedded signals, and g(d) is the map
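$$g(d) = \frac{1}{2} - \sum_{i=0}^{+\infty} \frac{e^{-\left(\frac{\pi(2i+1)\sigma d}{\sqrt{2}\Delta}\right)^2}}{\left(\pi(i+1/2)\right)^2}, \qquad (3)$$
which can be bounded using the bounds
$$g(d) \ge \frac{1}{2} - \frac{1}{2}e^{-\left(\frac{\pi\sigma d}{\sqrt{2}\Delta}\right)^2}, \quad g(d) \le \frac{1}{2} - \frac{4}{\pi^2}e^{-\left(\frac{\pi\sigma d}{\sqrt{2}\Delta}\right)^2}, \quad g(d) \le \sqrt{\frac{2}{\pi}}\,\frac{\sigma d}{\Delta}. \qquad (4)$$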
The map is approximately linear for small d and approaches the constant ½ exponentially fast for large d, i.e., for d greater than a distance threshold D0. The slope of the linear section and the distance threshold D0 are determined by the embedding parameters Δ and A. In other words, the embedding ensures that the Hamming distance of the embedded signals is approximately proportional to the signals' ℓ2 distance, as long as that ℓ2 distance is smaller than D0.
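The behavior of the map in (3) and of the bounds in (4) can be checked numerically. The sketch below truncates the infinite series after a fixed number of terms, which is an assumption of this example, as are the function names g and bounds and the parameter values σ=Δ=1.

```python
import math

def g(d, sigma, delta, terms=200):
    """Truncated series for the map g(d) in (3)."""
    s = 0.0
    for i in range(terms):
        a = math.pi * (2 * i + 1) * sigma * d / (math.sqrt(2.0) * delta)
        s += math.exp(-a * a) / (math.pi * (i + 0.5)) ** 2
    return 0.5 - s

def bounds(d, sigma, delta):
    """Lower bound and tightest upper bound on g(d) from (4)."""
    e = math.exp(-(math.pi * sigma * d / (math.sqrt(2.0) * delta)) ** 2)
    lower = 0.5 - 0.5 * e
    upper = min(0.5 - (4.0 / math.pi ** 2) * e,
                math.sqrt(2.0 / math.pi) * sigma * d / delta)
    return lower, upper

sigma, delta = 1.0, 1.0
distances = [0.1, 0.5, 1.0, 3.0]
values = [g(d, sigma, delta) for d in distances]
limits = [bounds(d, sigma, delta) for d in distances]
```

For small d the values track the linear bound closely, and for large d they settle at one half, which matches the two regimes described above.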
A piecewise linear function with slope $\sqrt{2/\pi}\,\sigma/\Delta$ until d=D0 and slope equal to zero after that is a very good approximation to (3), in addition to being an upper bound.
The additive ambiguity τ in (2) scales as $\tau \propto 1/\sqrt{K}$, similar to the constant ε in the multiplicative (1±ε) factor in J-L embeddings. It should be noted, however, that universal embeddings use 1 bit per projection dimension, for a total rate of R=K. The trade-off between B and K under a constant rate R exhibited by quantized J-L embeddings does not exist under the 1-bit universal embeddings. Still, there is a performance trade-off, controlled by the choice of Δ in (1).
FIGS. 5 and 6 show experimentally, and provide intuition on, how the embedding behaves for a smaller (501) and a larger (502) scaling factor Δ and for higher (FIG. 5) and lower (FIG. 6) bitrates. The figures plot the embedding Hamming distance as a function of the signal distance for randomly generated pairs of signals. The thickness of the curve is quantified by τ, whereas the slope of the upward sloping part is determined by Δ.
In the related U.S. patent application Ser. No. 12/861,923, “Method for Hierarchical Signal Quantization and Hashing,” we described a method for hierarchically encoding a signal, specifically an image. We formed an inner product of the signal and a hashing vector, and added a dither scalar to the inner product. The result was quantized using a non-monotonic quantization function subject to a sensitivity parameter that changes hierarchically.
In the related U.S. patent application Ser. No. 13/291,384, “Method for Privacy Preserving Hashing of Signals with Binary Embeddings,” we also encoded a signal by dithering and scaling random projections of the signal, and using a non-monotonic scalar quantizer to form a hash. In that Application, privacy of the underlying signal was preserved by keeping the scaling, dithering, and projection parameters secret.
In the related U.S. patent application Ser. No. 13/525,222, “Method for Representing Images Using Quantized Embeddings of Scale-Invariant Image Features,” we encoded a signal, specifically an image, by extracting scale-invariant features from the image. The features were projected to a lower-dimensional space by multiplying the features by a matrix of random entries. The resulting matrix of random projections was quantized to produce a matrix of quantization indices, which formed a query vector for searching a database of images to retrieve metadata related to the image.