Many natural phenomena can be faithfully modeled with multi-linear functions, or closely approximated as such. Multi-linearity means that a high-dimensional matrix of measurements of natural phenomena effects can be factored into low-rank matrices indicating presumed causes or characterizations of the phenomena.
Example applications include the combination of lighting and pose, see Tenenbaum et al., “Separating style and content with bilinear models,” Neural Computation, 12:1247-1283, 2000, shape and motion, see Irani et al., “Factorization with uncertainty, Proc. European Conf. Computer Vision, 2000, image formation, mixing of sources in acoustic recordings, see Casey, “MPEG-7 sound-recognition tools,” IEEE Transactions on Circuits and Systems for Video Technology, 11(6), June 2001 and U.S. Pat. No. 6,321,200 issued to Casey on Nov. 20, 2001, “Method for extracting features from a mixture of signals,” consumer purchasing data, Sarwar et al., “Application of dimensionality reduction in recommender system—a case study,” ACM WebKDD 2000 Web Mining for E-Commerce Workshop. ACM Press, 2000, and word associations in collections of documents, see Berry et al., “Large scale singular value computations,” International Journal of Supercomputer Applications, 6:13-49, 1992, Zha et al., “On updating problems in latent semantic indexing,” SIAM Journal on Scientific Computing, 21(2):782-791), 1999.
As shown in FIG. 1, singular value decomposition (SVD) 100 is the best known and widely used factoring method, see Golub et al., “Matrix Computations,” Johns Hopkins U. Press, 1996. The prior art SVD method 100 provides a bi-linear factoring of a multi-linear data matrix Mpxq 101, into a subspace Upxr 110, diagonal singular values srxl 111, and an encoding VTrxq 112, where p is the number of samples (columns) in the matrix, and q is the number of values (rows) in each sample, thus, p and q are the dimensions of the matrix M 101, r is the rank of the decomposition, and T is the transform. This decomposition is generally expressed as:
                                                        U                              p                ×                r                                      ⁢                          diag              ⁡                              (                                  s                                      r                    ×                    1                                                  )                                      ⁢                          V                              r                ×                q                            T                                ⁢                      ←                          SVD              r                                ⁢                      M                          p              ×              q                                      ,                  r          ≤                      min            ⁡                          (                              p                ,                q                            )                                                          (        1        )            where U and V are orthogonal matrices whose columns and rows, give a linear basis for M's columns and rows, respectively, and s is a diagonal matrix, in short: Udiag(s)VT←M.
FIG. 2 shows the relation of a three dimensional vector 210 to a two-dimensional subspace 200 having axes 201-202. The vector 210 is decomposed into a component 211 within the subspace 200, and remainder component orthogonal (perpendicular) to the subspace 200. The subspace component 211 gives the location of the vector 210 in the subspace. When the modeler has high confidence in the subspace, the subspace component is understood to be the informative part of the signal and the orthogonal component is considered to be noise. Otherwise, the SVD can be updated to incorporate the vector. The subspace component 211 causes the singular vectors to be rotated, while the orthogonal remainder component 212 increases the dimensionality of the subspace (rank of the SVD).
For low-rank phenomena, rtrue<<min(p, q), implying a parsimonious explanation of the measured data. Because rtrue is often unknown, it is common to wastefully determine a large rapprox>>rtrue SVD, and estimate an appropriate smaller value rempirical from the distribution of singular values in s. All but rempirical of the smallest singular values in s are then zeroed to give a “thin” truncated SVD that closely approximates the measured data. This forms the basis of a broad range of data analysis, dimensionality reduction, compression, noise-suppression, and extrapolation methods.
In the prior art, the SVD of is usually determined by a batch method that consumes O(pq2+p2q+q3) amount of time, meaning that all the data must be processed at once, and SVDs of very large sets of data are essentially unfeasible. The well known Lanczos method yield thin SVDs in O(pqr2) time, however, rtrue needs to be known in advance because the Lanczos method is known to be inaccurate for lower ranked singular values.
A more pressing problem is that prior art SVD methods requires complete data, whereas in many real-world applications, some parts of measured data are not available, uncertain, contaminated, otherwise unreliable or non-stationary, generally “incomplete data.” Consequently, a single missing value forces the modeler to discard an entire row or column of the data in the matrix before applying the SVD.
The incomplete data may be imputed from neighboring values, but such imputations typically mislead the SVD away from the most parsimonious, i.e., low-rank, decompositions.
SVD updating is generally based on the Lanczos method, symmetric eigenvalue perturbations, or identities similar to equation 2 below, see Berry et al., Businger, “Updating a singular value decomposition,” BIT, 10:376-385, 1970, Chandrasekaran et al., “An eigenspace update algorithm for image analysis,” Graphical models and image processing: GMIP, 59(5):321-332, 1997, Gu et al., “A stable and fast algorithm for updating the singular value decomposition,” Tech. Report YALEU/DCS/RR-966, Department of Computer Science, Yale University, New Haven, Conn., 1993, and Zha et al. Zha et al. use such an identity. However, their update is approximate and requires a dense SVD. Chandrasekaran et al. begin similarly, but their update is limited to one set of single vectors and is vulnerable to loss of orthogonality.
None of the prior art methods contemplates incomplete data, except insofar as they can be treated as zeros, e.g., see Berry, which is inappropriate for many applications. In batch-SVD contexts, incomplete data are usually handled via subspace imputation. A parameterization of the space of SVDs that allows global marginalization over all missing values is not yet known.
There is a widely accepted expectation-maximization procedure for imputation. The SVD is applied on all complete columns, and regressing on incomplete columns against the SVD to impute incomplete or missing data. Then, the “imputed” complete data are re-factored and re-imputed until a fixed point is reached, see e.g., Troyanskaya et al., “Missing value estimation methods for DNA microarrays,” BioInformatics, 17:1-6, 2001. That procedure is extremely slow operating in quartic time, and only works when a small amount of data are incomplete. The procedure has the further problem that the imputation does not minimize the effective rank.
Other procedures simply fill incomplete data with zeros, or with column means times plus or minus 1, e.g., see Sawar et al.
In the special case where the matrix M is nearly dense, its normalized scatter matrix Σm,n<Mi,mMi,n>i can be fully dense due to fill-in. In that case, Σ's eigenvectors are M's right singular vectors, see Jackson, “A user's guide to principal components,” Wiley, 1991. However, that method does not lead to the left singular vectors, and it often does not work at all because the Σs are frequently incomplete as well, with undefined eigenvectors.
Therefore, it is desired to update an SVD by adding rows or columns of data, which are unavailable, uncertain, contaminated with correlated or colored noise, non-stationary, or otherwise incomplete. In addition, it is desired to handle incomplete data in a manner that minimizes rank. Furthermore, it is desired to perform the SVD in less time with less memory than prior art full-data SVD methods so that SVD updating can be done on-line on streaming data, while still producing informative results leading to a more parsimonious factoring of incomplete data.