The amount of information available via computers has dramatically increased with the widespread proliferation of computer networks, the Internet and digital storage means. With the increased amount of information has come the need to manage, sort through and selectively access data to facilitate efficient utilization and manipulation of information.
Much of the information generated today can be organized into matrices or data tables. By way of example, online consumer transactions can be organized into a matrix, where rows of the matrix correspond to individual consumers and columns of the matrix correspond to consumer or transactional attributes (e.g., points of purchase, zip codes). Often, such information can be represented as a pointset in Euclidean space, where the dimensionality of the pointset corresponds to the number of coordinates (e.g., attributes) that identify or locate each point in the space.
Euclidean space is a type of metric space that can have an arbitrary number of dimensions. For example, common everyday space has three dimensions. On the other hand, Euclidean spaces, such as those representative of one or more data processing applications, can have hundreds of thousands of dimensions and many millions of corresponding data points. In such situations, it is often desirable to map the original set of points into a new set of equally many points residing in a lower dimensional Euclidean space. By mapping the original points to a lower dimensional space, a benefit of data compression is obtained since fewer attributes are required to represent each point. As such, storage and processing requirements can be significantly reduced. At the same time, though, it is understood that, in general, the new representation cannot perfectly capture all information present in the original, high-dimensional representation.
As an example, one common technique for mapping data to a lower dimensional space is to project the original data onto the subspace spanned by the singular vectors corresponding to the k largest singular values of the original data. While such projections have a number of useful properties, they may fail to preserve distances between data points, referred to as a pairwise distance property. That is, pairs of points represented in the lower dimensionality may have distances significantly different from their distances in the original dimensional space. Therefore, algorithms that take pairwise distances as input cannot benefit from this type of mapping, as inconsistent results may occur.
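The distance distortion described above can be illustrated with a small worked sketch. The four points, the choice k = 1, and the helper name `top_right_singular_vector` are assumptions made purely for illustration; power iteration is used here only as a simple stand-in for a full singular value decomposition.

```python
import math

# Four points in 2-D: two far apart along x, two close together along y.
A = [[10.0, 0.0], [-10.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

def top_right_singular_vector(rows, iters=50):
    """Power iteration on A^T A; approximates the unit vector
    associated with the largest singular value of the data."""
    n, d = len(rows), len(rows[0])
    v = [1.0] * d
    for _ in range(iters):
        Av = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(rows[i][j] * Av[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

v = top_right_singular_vector(A)

# Project every point onto the single dominant direction (k = 1).
projected = [sum(r[j] * v[j] for j in range(2)) for r in A]

orig_dist = math.dist(A[2], A[3])                # distance in the original space
proj_dist = abs(projected[2] - projected[3])     # distance after projection
print(orig_dist, proj_dist)
```

In this constructed case the dominant direction is the x-axis, so the two points that differ only in y collapse onto essentially the same projected coordinate, even though they were two units apart originally.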
As such, it may be desirable to maintain pairwise distance properties so that, for every pair of points, their distance in low dimensional space substantially approximates their distance in high dimensional space. Such a property may be important because many data processing algorithms are not concerned with structural properties of the data beyond interpoint distances. As a result, by applying a distance-preserving dimensionality reduction before applying such algorithms, a benefit of compression is obtained while the produced results remain consistent with the results that the algorithms would give if they were applied to the original high-dimensional data. Besides the compression benefit, many algorithms perform significantly faster in a lower dimensional space than if executed in the original higher dimensional space.
By way of example, such embeddings are useful in solving an ε-approximate nearest neighbor problem, where (after some preprocessing of a pointset P) an answer is given to queries such as: given an arbitrary point x, find a point $y \in P$ such that for every point $z \in P$, $\|x - z\| \geq (1 - \varepsilon)\,\|x - y\|$. Additionally, such embeddings are useful as part of an approximation algorithm for a version of clustering where it is sought to minimize the sum of squares of intra-cluster distances. Such embeddings can also be useful in "data-stream" computations, where there is limited memory and only a single pass over the data (stream) is allowed.
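The ε-approximate nearest neighbor condition stated above can be checked directly by brute force. The following is a minimal sketch, not an efficient query structure; the function name `is_eps_approx_nearest` and the sample points are hypothetical and chosen only to make the inequality concrete.

```python
import math

def is_eps_approx_nearest(x, y, P, eps):
    """Return True if every z in P satisfies
    ||x - z|| >= (1 - eps) * ||x - y||."""
    dxy = math.dist(x, y)
    return all(math.dist(x, z) >= (1.0 - eps) * dxy for z in P)

P = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
x = (0.45, 0.0)

# The exact nearest neighbor (0, 0) qualifies for any eps >= 0.
exact_ok = is_eps_approx_nearest(x, (0.0, 0.0), P, eps=0.0)
# (1, 0) is slightly farther, so it qualifies only with enough slack.
loose_ok = is_eps_approx_nearest(x, (1.0, 0.0), P, eps=0.2)
tight_ok = is_eps_approx_nearest(x, (1.0, 0.0), P, eps=0.1)
print(exact_ok, loose_ok, tight_ok)
```

A larger ε admits more candidate answers, which is what allows a query to be answered in a lower dimensional space where distances are only approximately preserved.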
One approach to performing a transformation that preserves the pairwise distance property is to represent the original data points as an input matrix and to multiply that matrix by a projection matrix R in order to generate a transformed matrix T representative of the transformed or mapped set of data points. The input matrix can be thought of as a set of n points in d dimensional Euclidean space represented as an n×d matrix A, where each data point is represented as a row (vector) having d attributes (coordinates). The transformed matrix has the same number n of data points as the input matrix, but has a reduced number of attributes (e.g., k attributes) and thus can be represented as an n×k matrix. Processes and/or algorithms can utilize the transformed matrix instead of the input matrix, thereby increasing computational efficiency.
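The shapes involved in forming T from A and R can be sketched as follows. The helper `matmul` and the small matrices are illustrative assumptions; in particular, the R used here is only a placeholder chosen to show the dimensions and is not a distance-preserving projection.

```python
def matmul(A, R):
    """Multiply an n x d matrix A by a d x k matrix R, giving an n x k matrix."""
    d = len(R)
    assert all(len(row) == d for row in A), "inner dimensions must match"
    k = len(R[0])
    return [[sum(row[j] * R[j][c] for j in range(d)) for c in range(k)]
            for row in A]

# n = 3 data points, each a row with d = 4 attributes.
A = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0],
     [2.0, 2.0, 2.0, 2.0]]

# Placeholder d x k matrix with k = 2 (chosen only to show the shapes).
R = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [0.0, 1.0]]

T = matmul(A, R)  # still n = 3 rows, but only k = 2 attributes per row
print(T)
```

Each row of T corresponds to the same data point as the matching row of A, so downstream processes can index the two matrices identically.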
However, establishing a suitable projection matrix R and multiplying the input matrix A by it can be non-trivial, particularly in many practical computational environments where a very large number of data points and corresponding attributes may exist. For instance, developing the projection matrix R typically includes generating a random number for each entry in the matrix (e.g., drawn from a Gaussian distribution with a mean of zero and a variance of one), truncating the entries to about five to ten digits, and applying a linear algebraic transformation to the entries to make the columns of the projection matrix orthonormal. This is often an arduous task since the projection matrix can be very large. Then, to perform the matrix multiplication of A by R, substantial computations have to be performed. For example, to transform a million data points in ten thousand dimensional space into a smaller dimensional space (e.g., one thousand dimensional space), a million rows, each having ten thousand columns, have to be multiplied by a matrix having ten thousand rows and one thousand columns.
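The construction steps described above, together with the cost of the example multiplication, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `gaussian_projection_matrix`, the default of eight digits, the fixed seed, and the use of Gram-Schmidt as the orthonormalizing transformation are all choices made here, since the text does not name a specific linear algebraic method.

```python
import math
import random

def gaussian_projection_matrix(d, k, digits=8, seed=0):
    """Build a d x k matrix: one N(0, 1) sample per entry, truncated to
    `digits` decimal places, then columns made orthonormal."""
    rng = random.Random(seed)
    scale = 10 ** digits
    cols = [[math.trunc(rng.gauss(0.0, 1.0) * scale) / scale for _ in range(d)]
            for _ in range(k)]
    # Gram-Schmidt (assumed method): subtract components along earlier
    # columns, then normalize to unit length.
    ortho = []
    for c in cols:
        for q in ortho:
            dot = sum(a * b for a, b in zip(c, q))
            c = [a - dot * b for a, b in zip(c, q)]
        norm = math.sqrt(sum(a * a for a in c))
        ortho.append([a / norm for a in c])
    # Return as d rows of k entries each.
    return [[ortho[j][i] for j in range(k)] for i in range(d)]

R = gaussian_projection_matrix(d=20, k=5)

# Multiply-add count for the example in the text: n * d * k.
n, d, k = 1_000_000, 10_000, 1_000
mult_adds = n * d * k
print(mult_adds)
```

For the sizes given in the text, the product requires on the order of 10^13 multiply-add operations, and R itself holds ten million randomly generated entries before any multiplication begins.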
Although the aforementioned approach preserves a pairwise distance property, such an approach has deficiencies (e.g., a sample of the Gaussian distribution is needed for each entry in R; linear algebra techniques are required to obtain the projection matrix R; and the resulting projection matrix R is a dense matrix composed of arbitrary floating point numbers, very few of which are 0, making computations numerous and complicated). Accordingly, a more elegant solution for generating a suitable projection matrix in a computationally efficient manner is desired.