Human action analysis using computer vision techniques enables applications such as automatic surveillance, behavior analysis, and elderly care. However, the automatic analysis of human motion in videos is currently limited to relatively simple classes of predefined motions, small data sets, and simple human actions, such as a single person performing a single primitive action in a relatively short video.
In contrast, in a real-life surveillance scenario, video data are often continuously recorded and saved for later analysis. In a typical case, a search for a specific instance of an activity in the video data can result in days of watching video to find images of interest. Performing semantic queries such as “find all instances where a person is walking from left to right”, or “find instances where a person starts walking and then starts running” remains very difficult.
Approximate Nearest Neighbor
Approximate nearest neighbor (ANN) methods, such as variants of locality sensitive hashing (LSH), semantic hashing, and spectral hashing, are computationally efficient for finding objects similar to a query object in large datasets. Those methods have been used to quickly search images in web-scale datasets that can contain millions of images. Unfortunately, the key assumption in those methods is that data points in the dataset are in a Euclidean space and can only be compared using Euclidean distances.
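To make the Euclidean assumption concrete, the following is a minimal sketch of one common LSH variant for Euclidean distance, in which each hash is a quantized random projection. The function names (`make_euclidean_lsh`, `build_index`, `query`) and the parameters `bucket_width` and `seed` are illustrative assumptions and do not come from any particular library or from the methods cited above.

```python
import numpy as np
from collections import defaultdict

def make_euclidean_lsh(dim, n_hashes, bucket_width, seed=0):
    """Return a function mapping points in R^dim to integer hash keys.

    Each hash is floor((a . x + b) / bucket_width) with a Gaussian random
    direction a. The dot product is only meaningful for vectors in a
    Euclidean space, which is the assumption discussed above.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((dim, n_hashes))
    b = rng.uniform(0.0, bucket_width, size=n_hashes)

    def hash_points(X):
        X = np.atleast_2d(X)
        return np.floor((X @ A + b) / bucket_width).astype(np.int64)

    return hash_points

def build_index(X, hash_points):
    """Group the row indices of X by their hash key."""
    table = defaultdict(list)
    for i, key in enumerate(hash_points(X)):
        table[tuple(key)].append(i)
    return table

def query(q, table, hash_points):
    """Return indices of stored points that share the bucket of q."""
    return table.get(tuple(hash_points(q)[0]), [])
```

Only the points returned by `query` need to be compared exactly against the query point, which is where the speed-up over an exhaustive search comes from.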
This assumption is not always valid and poses a challenge to several computer vision applications where data are commonly defined on complex non-Euclidean manifolds. In particular, dynamic data, such as human activities, are usually represented as dynamical systems, which lie on non-Euclidean manifolds. Accordingly, the search for the nearest neighbor of a data point has to consider the geometry of the manifold.
Spectral Hashing
As shown in FIG. 2, spectral hashing (SH) is an example of a hashing method that maps data points lying on a Euclidean manifold 210 onto a Hamming space 220, such that neighboring data points 225 in the Hamming space correspond to neighboring data points 215 on the Euclidean manifold.
Accordingly, for the data points $\{x_i\}_{i=1}^{N} \in \mathbb{R}^d$, the goal of spectral hashing is to find k-bit binary vectors $\{y_i\}_{i=1}^{N} \in \{-1,1\}^k$ such that similar points in $\mathbb{R}^d$ under the similarity measure
$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\epsilon^2}\right)$$
map to binary vectors that are close to each other under the Hamming distance weighted by the weighting function $W$. If the data points $x_i$ are sampled from a probability distribution $p(x)$, then SH solves the following optimization problem:
$$
\begin{aligned}
\text{minimize} \quad & \int \|y(x_1) - y(x_2)\|^2 \, W(x_1, x_2) \, p(x_1) \, p(x_2) \, dx_1 \, dx_2 \\
\text{s.t.} \quad & y(x) \in \{-1,1\}^k, \\
& \int y(x) \, p(x) \, dx = 0, \\
& \int y(x) \, y(x)^T \, p(x) \, dx = I.
\end{aligned}
\tag{1}
$$
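As an illustration only, the similarity measure and a finite-sample counterpart of the objective in Equation (1) can be sketched as follows. The function names are hypothetical, and the integral is replaced by a plain sum over sampled points, which is an assumption of this sketch and not part of the formulation above.

```python
import numpy as np

def gaussian_affinity(X, eps):
    """W_ij = exp(-||x_i - x_j||^2 / eps^2) for the rows x_i of X."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip small negative round-off
    return np.exp(-sq_dists / eps**2)

def hashing_objective(Y, W):
    """Sample-based objective: sum over i, j of W_ij * ||y_i - y_j||^2."""
    diff = Y[:, None, :] - Y[None, :, :]
    return float(np.sum(W * np.sum(diff * diff, axis=2)))

def hamming_distance(y1, y2):
    """Number of positions in which two {-1, 1} codes differ."""
    return int(np.sum(y1 != y2))
```

For codes with entries in {-1, 1}, the squared Euclidean distance between two codes equals four times their Hamming distance, so a small objective value means that points with high similarity W receive codes that are close in Hamming distance.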
Relaxing the first constraint, which requires the codes to be binary, gives a solution y of Equation (1) as the first k eigenfunctions of the weighted Laplace-Beltrami operator on the manifold. If the distribution p is a multi-dimensional uniform distribution on the Euclidean space $\mathbb{R}^d$ and the weighting function W is defined as above, then there is a closed-form solution for these eigenfunctions.
If the distribution p is a Gaussian distribution on the Euclidean space $\mathbb{R}^d$, there exists an iterative solution.
The spectral hashing method is summarized into the following steps:
Determine the principal components of the data using principal component analysis (PCA);
Compute the k smallest single-dimension analytical eigenfunctions of the Laplace-Beltrami operator under the specified weighting function and probability distribution by using a rectangular approximation along every PCA direction; and
Threshold the analytical eigenfunctions computed for each data point at zero, to obtain binary codes.
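The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions and not a definitive implementation: the data are assumed to be uniformly distributed in a box along the PCA directions, the one-dimensional eigenfunctions are taken to be sinusoids of the form sin(pi/2 + omega * x) on that box, and the names `train_spectral_hash` and `spectral_hash` are hypothetical.

```python
import numpy as np

def train_spectral_hash(X, n_bits):
    """Fit PCA directions and select the n_bits lowest-frequency modes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step 1: principal components from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:min(n_bits, Vt.shape[0])].T          # shape (d, n_pc)
    P = Xc @ pcs
    # Step 2: rectangular approximation, one interval per PCA direction.
    lo = P.min(axis=0)
    span = np.maximum(P.max(axis=0) - lo, 1e-12)
    modes = np.arange(1, n_bits + 1)
    freqs = np.pi * modes[:, None] / span[None, :]  # shape (n_bits, n_pc)
    # Under the uniform assumption the eigenvalue grows monotonically with
    # the frequency, so the smallest eigenvalues are the smallest frequencies.
    order = np.argsort(freqs, axis=None)[:n_bits]
    mode_idx, dir_idx = np.unravel_index(order, freqs.shape)
    return {"mean": mean, "pcs": pcs, "lo": lo,
            "dirs": dir_idx, "omegas": freqs[mode_idx, dir_idx]}

def spectral_hash(X, model):
    """Evaluate the selected eigenfunctions and threshold them at zero."""
    P = (X - model["mean"]) @ model["pcs"] - model["lo"]
    phi = np.sin(np.pi / 2 + P[:, model["dirs"]] * model["omegas"])
    # Step 3: the sign of each eigenfunction gives one bit of the code.
    return phi > 0
```

Each row of the returned Boolean array is the binary code of one data point. Note that every operation in the sketch relies on inner products and coordinates in a Euclidean space.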
In theory, any probability distribution on a general manifold and any weighting function can be used to analytically compute the eigenfunctions of the corresponding Laplace-Beltrami operator. However, even for scalar Euclidean data, such a computation remains an open problem.
In the case of non-Euclidean data, such as data that represent human activities, such an analysis becomes extremely difficult. The distribution of the data points is usually unknown, and even if a form of the distribution is assumed, a closed-form representation for the distribution on a particular manifold might not exist. Moreover, the weighting function is no longer a simple exponential similarity function, because the function is based on geodesic or chord distances on the manifold. Finally, the exact computation of the solution of the minimization problem in Equation (1) for a general weighting function and a general probability distribution on an arbitrary manifold is extremely difficult.
The kernel spectral hashing (KSH) method uses kernel PCA instead of PCA to find the eigenfunctions. The method embeds the data points in a high-dimensional Euclidean space and finds the value of the eigenfunction at each data point. However, the KSH method computes the kernel of an input data point with all the data points in the training set that were used to compute the kernel PCA components. This is as computationally complex as performing an exact nearest-neighbor search using the kernel as an affinity measure. Even though a well-chosen kernel might give very good retrieval accuracy, the KSH method has a computational complexity of O(N) per query, where N is the number of data points in the training set, which could be in the millions.
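The per-query cost can be seen in the following sketch of the kernel PCA projection step alone. It is an illustration under assumptions and not the KSH method itself: the RBF kernel, the parameter `gamma`, and the function names are chosen here for the example, and the subsequent evaluation and thresholding of eigenfunctions is omitted.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all rows of A and B."""
    sq = (np.sum(A * A, axis=1)[:, None] + np.sum(B * B, axis=1)[None, :]
          - 2.0 * (A @ B.T))
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_pca(X, n_components, gamma):
    """Kernel PCA on the training set X of N points."""
    K = rbf_kernel(X, X, gamma)
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one      # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]     # largest eigenvalues
    alphas = vecs[:, idx] / np.sqrt(vals[idx])
    return {"X": X, "gamma": gamma, "alphas": alphas,
            "col_mean": K.mean(axis=0), "mean": K.mean()}

def project_query(q, model):
    """Project one query; this needs a kernel value for every training point."""
    k = rbf_kernel(q[None, :], model["X"], model["gamma"])[0]  # N evaluations
    kc = k - k.mean() - model["col_mean"] + model["mean"]
    return kc @ model["alphas"]
```

The line that builds `k` evaluates the kernel against all N stored training points, so the cost of hashing a single query grows linearly with the size of the training set, as stated above.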
Accordingly, it is desired to provide an efficient method for determining the nearest neighbor for the data points lying on a non-Euclidean manifold.