Many aspects of machine learning and data mining are affected by what has become known as “the curse of dimensionality”. In order to find more sophisticated trends in data, potential correlations between larger and larger groups of variables must be considered. Unfortunately, the number of potential correlations generally increases exponentially with the number of input variables and, as a result, brute force approaches become infeasible.
A natural goal for machine learning is to identify and isolate the dimensions that characterize the data. We would like to simplify the data sufficiently that traditional machine learning techniques can be applied, yet we do not wish to oversimplify and leave out information crucial to understanding. A widely used method is to cast the data as a matrix A (indexed by <instance, attribute>) and compute a low rank approximation, D, of A. The idea is that the rank of a matrix corresponds roughly to the degrees of freedom of its entries. By constraining the rank of D we aim to capture the most pertinent characteristics of the data in A, leaving behind dimensions in which the data appears “random”.
Such low rank approximations are most often derived by computing the Singular Value Decomposition (SVD) of A and taking the rank k matrix, A_k, that corresponds to the k largest singular values.
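As a minimal sketch of this construction, assuming NumPy and a small dense matrix, the rank k matrix can be formed by truncating the SVD. The helper name `rank_k_approximation` and the toy matrix are illustrative assumptions and do not come from the original text.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return the rank-k truncation of the SVD of A (hypothetical helper)."""
    # Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # toy <instance, attribute> matrix (assumed data)
A_2 = rank_k_approximation(A, 2)  # same shape as A, but rank 2
```

Computing the full SVD this way is only practical for small matrices; it is shown here to make the definition concrete, not as an efficient method.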
Recall that for an arbitrary matrix A its Frobenius norm, |A|_F, satisfies

$$ |A|_F^2 = \sum_{i,j} A(i,j)^2 . $$
Perhaps the best-known property of A_k is that for any rank k matrix D,

$$ |A - D|_F \geq |A - A_k|_F , \qquad (1) $$

that is, A_k is the optimal rank k approximation of matrix A, since every other rank k matrix D is at least as “far” from A as measured by the Frobenius norm.
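Inequality (1) can be checked numerically on a small example. The following is a sketch under stated assumptions: it uses NumPy, a random toy matrix, and randomly sampled rank k candidates D, so it illustrates the property rather than proving it. All names below are illustrative.

```python
import numpy as np

def frobenius_norm(M):
    # Square root of the sum of squared entries of M.
    return np.sqrt(np.sum(M ** 2))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))   # toy data matrix (assumed)
k = 2

# Rank-k truncation of the SVD of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
err_svd = frobenius_norm(A - A_k)

# Sample arbitrary matrices of rank at most k, D = X @ Y, and record their errors.
errs = []
for _ in range(100):
    D = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))
    errs.append(frobenius_norm(A - D))

# Nonnegative whenever (1) holds for every sampled D.
smallest_gap = min(errs) - err_svd
```

The error of the truncated SVD also equals the square root of the sum of the squared discarded singular values, which gives a quick consistency check on `err_svd`.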
This method has met with significant empirical success in a number of different areas, including Latent Semantic Analysis (LSA) in Information Retrieval, as described in Berry et al., Matrices, Vector Spaces, and Information Retrieval, SIAM Rev. 41 (1999), no. 2, 335-362 and Berry et al., Using Linear Algebra for Intelligent Information Retrieval, SIAM Rev. 37 (1995), no. 4, 573-595, and Face Recognition, as described in Turk et al., Eigenfaces for Recognition, Journal of Cognitive Neuroscience 3 (1991), no. 1, 71-86.
Accordingly, this invention arose out of concerns associated with providing improved methods and systems for processing data in high dimensional space and, in particular, for computing low rank approximations to matrices using the Singular Value Decomposition.