In many large datasets the relevant information often lies in a low-dimensional subspace of the ambient space, leading to a large interest in representing data with low-rank approximations. A common formulation for this problem is as a regularized loss problem of the form
                                                        min              x                        ⁢                          l              ⁡                              (                                  Y                  ,                  X                                )                                              +                      λ            ⁢                                                  ⁢                          R              ⁡                              (                X                )                                                    ,                            (        1        )            where Yϵt×p is the data matrix, Xϵt×p is the low-rank approximation, (⋅) is a loss function that measure how well X approximates Y, and R(⋅) is a regularization function that promotes various desired properties in X (low-rank, sparsity, group-sparsity, etc.). When l and R are convex functions of X, and the dimensions of Y are not too large, the above problem can be solved efficiently using existing algorithms, which have achieved impressive results. However, when t is the number of frames in a video and p is the number of pixels, for example, optimizing over O(tp) variables can be prohibitive.
To address this, one can exploit the fact that if X is low-rank, then there exist matrices Aϵt×r and Zϵp×r (which will be referred to as the column and row spaces of X, respectively) such that Y≈X=AZT and r«min(t, p). This leads to the following matrix factorization problem, in which an A and Z that minimize are needed
                                                        min                              A                ,                Z                                      ⁢                          l              ⁡                              (                                  Y                  ,                                      AZ                    T                                                  )                                              +                      λ            ⁢                                                  ⁢                                          R                ~                            ⁡                              (                                  A                  ,                  Z                                )                                                    ,                            (        2        )            
where {tilde over (R)}(⋅, ⋅) is now a regularizer on the factors A and Z. By working directly with a factorized formulation such as (2), the size of the optimization problem is reduced from O(tp) to O(r(t+p)). Additionally, in many applications of low-rank modeling the factors obtained from the factorization often contain information relevant to the problem and can be used as features for further analysis, such as in classical PCA. Placing regularization directly on the factors thus allows one to promote additional structure on the factorized matrices A and Z beyond simply being a low-rank approximation, e.g. in sparse dictionary learning the matrix Z should be sparse. However, the price to be paid for these advantages is that the resulting optimization problems are typically not convex due to the product of A and Z, which poses significant challenges.
Despite the growing availability of tools for low-rank recovery and approximation and the utility of deriving features from low-rank representations, many techniques fail to incorporate additional information about the underlying row and columns spaces which are often known a priori. In computer vision, for example, a collection of images of an object taken under different illuminations has not only a low-rank representation, but also significant spatial structure relating to the statistics of the scene, such as sparseness on a particular wavelet basis or low total variation.
Accordingly, there is a need in the art for a low-rank matrix factorization that exploits these additional structures in the data and can be efficiently applied to large datasets, such as images and videos.