Various embodiments of this disclosure relate to CUR decompositions and, more particularly, to improved techniques for finding a CUR decomposition.
CUR decompositions are often applied in the field of data compression. If data is in the form of an m×n matrix A, then O(mn) storage is required to store it. A CUR decomposition compresses the data by removing some of its redundancy and replacing A with an approximation of lower rank: storing C, U, and R explicitly requires O(mc + cr + rn) entries, which is much smaller than mn when c and r are small.
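As a rough illustration of the storage argument, the sketch below counts entries under the assumption that C, U, and R are stored as dense arrays of values. Sparse storage formats, or storing only the selected column and row indices while keeping A, would change these counts. The function names `dense_storage` and `cur_storage` are hypothetical.

```python
def dense_storage(m: int, n: int) -> int:
    """Number of entries needed to store the full m x n matrix A."""
    return m * n


def cur_storage(m: int, n: int, c: int, r: int) -> int:
    """Entries needed to store C (m x c), U (c x r) and R (r x n) as dense arrays."""
    return m * c + c * r + r * n


if __name__ == "__main__":
    m, n, c, r = 1000, 1000, 50, 50
    print(dense_storage(m, n))       # 1000000
    print(cur_storage(m, n, c, r))   # 102500
```

With m = n = 1000 and c = r = 50, the CUR factors need 50,000 + 2,500 + 50,000 = 102,500 entries instead of 1,000,000, roughly a tenfold reduction.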
Given as inputs a matrix $A \in \mathbb{R}^{m \times n}$ and integers $c < n$ and $r < m$, the CUR decomposition, or factorization, of A finds $C \in \mathbb{R}^{m \times c}$ with c columns of A, $R \in \mathbb{R}^{r \times n}$ with r rows of A, and $U \in \mathbb{R}^{c \times r}$, such that $A = CUR + E$. The matrix $E = A - CUR$ is the residual error matrix.
Generally, where matrix A represents some data set, the CUR decomposition of A (i.e., the product CUR) approximates A, and this approximation can be used to compress the data in A by removing some of its redundancy. From an algorithmic perspective, the challenge is to construct C, U, and R quickly and in such a way as to minimize the approximation error $\|A - CUR\|_F^2$.
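A minimal sketch of one way to build C, U, and R, assuming the columns and rows have already been chosen. The selection rule used here (columns and rows with the largest Euclidean norms) is a hypothetical placeholder for illustration only and is not the technique of this disclosure. Once C and R are fixed, the standard choice $U = C^{+} A R^{+}$ (Moore–Penrose pseudo-inverses, computed with `np.linalg.pinv`) minimizes $\|A - CUR\|_F$ over all $U$. The helpers `cur_from_indices` and `top_norm_indices` are hypothetical names.

```python
import numpy as np


def cur_from_indices(A: np.ndarray, col_idx, row_idx):
    """Build C, U, R from chosen column/row indices of A.

    For these fixed C and R, U = pinv(C) @ A @ pinv(R) minimizes ||A - C U R||_F.
    """
    C = A[:, col_idx]                                # m x c, actual columns of A
    R = A[row_idx, :]                                # r x n, actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # c x r
    return C, U, R


def top_norm_indices(A: np.ndarray, c: int, r: int):
    """Hypothetical selection rule: the c columns and r rows of largest norm."""
    col_idx = np.argsort(np.linalg.norm(A, axis=0))[::-1][:c]
    row_idx = np.argsort(np.linalg.norm(A, axis=1))[::-1][:r]
    return np.sort(col_idx), np.sort(row_idx)


rng = np.random.default_rng(0)
# Illustrative data only: a rank-5 matrix plus small noise.
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal(A.shape)

col_idx, row_idx = top_norm_indices(A, c=10, r=10)
C, U, R = cur_from_indices(A, col_idx, row_idx)
err = np.linalg.norm(A - C @ U @ R, "fro") ** 2     # squared Frobenius error
print(C.shape, U.shape, R.shape, err)
```

Because C and R are copied directly from A, only the computation of U involves any arithmetic; the quality of the approximation then depends on which columns and rows are selected, which is where the algorithmic difficulty lies.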
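More precisely, the “CUR Problem,” as it will be referred to herein, is as follows: given $A \in \mathbb{R}^{m \times n}$ of rank $\rho = \operatorname{rank}(A)$, a rank parameter $k < \rho$, and an accuracy parameter $0 < \varepsilon < 1$, construct $C \in \mathbb{R}^{m \times c}$ with c columns from A, $R \in \mathbb{R}^{r \times n}$ with r rows from A, and $U \in \mathbb{R}^{c \times r}$, with c, r, and $\operatorname{rank}(U)$ being as small as possible, in order to reconstruct A within relative error:
$$\|A - CUR\|_F^2 \le (1 + \varepsilon)\,\|A - A_k\|_F^2,$$
where $A_k$ denotes the best rank-k approximation of A, obtained from its truncated SVD.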
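The relative-error criterion can be checked directly, as in the following sketch. It computes $\|A - A_k\|_F^2$ as the sum of the squared singular values beyond the k-th and compares it with the CUR error. The function names are hypothetical, and the check covers only the error bound, not the requirement that c, r, and rank(U) be small.

```python
import numpy as np


def best_rank_k_residual(A: np.ndarray, k: int) -> float:
    """||A - A_k||_F^2, computed from the trailing singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
    return float(np.sum(s[k:] ** 2))


def meets_relative_error(A, C, U, R, k: int, eps: float) -> bool:
    """Check ||A - CUR||_F^2 <= (1 + eps) * ||A - A_k||_F^2."""
    cur_err = np.linalg.norm(A - C @ U @ R, "fro") ** 2
    return bool(cur_err <= (1.0 + eps) * best_rank_k_residual(A, k))


# Worked example: A = diag(3, 2, 1) and k = 1, so ||A - A_1||_F^2 = 2^2 + 1^2 = 5.
A = np.diag([3.0, 2.0, 1.0])
C = A[:, [0]]                                   # first column of A (c = 1)
R = A[[0], :]                                   # first row of A (r = 1)
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)   # 1 x 1 matrix equal to [[1/3]]
print(best_rank_k_residual(A, 1))
print(meets_relative_error(A, C, U, R, k=1, eps=0.1))
```

Here CUR reproduces the leading entry 3 exactly, so $A - CUR = \operatorname{diag}(0, 2, 1)$ and the CUR error equals the optimal rank-1 error of 5, which lies within the bound $(1 + 0.1) \cdot 5 = 5.5$.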
In contrast to the above is the singular value decomposition (SVD) factorization: for $k < \operatorname{rank}(A)$, $A = U_k \Sigma_k V_k^T + A_{\rho-k}$, where $A_k = U_k \Sigma_k V_k^T$. The matrices $U_k \in \mathbb{R}^{m \times k}$ and $V_k \in \mathbb{R}^{n \times k}$ contain the top k left and right singular vectors of A, while the diagonal matrix $\Sigma_k \in \mathbb{R}^{k \times k}$ contains the k largest singular values of A. The SVD residual error $A_{\rho-k} = A - A_k$ is the best possible among all rank-k approximations: by the Eckart–Young theorem, $\|A - A_k\|_F \le \|A - B\|_F$ for every matrix B of rank at most k.
In CUR, by contrast, C and R contain actual columns and rows of A, a property desirable for feature selection and data interpretation when CUR is used as an approximation of A. Because C and R are actual columns and rows of A, any sparsity in the input matrix A is also preserved in C and R. Thus, CUR is attractive in a wide range of applications.
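The contrast between the two factorizations can be seen in a small sketch, assuming NumPy's `np.linalg.svd` as the SVD routine. It forms $A_k = U_k \Sigma_k V_k^T$, confirms numerically that $\|A - A_k\|_F^2$ equals the sum of the squared trailing singular values, and shows that columns copied into C keep exactly the zero pattern of A. The helper `truncated_svd`, the random sparse test matrix, and the chosen column indices are illustrative assumptions.

```python
import numpy as np


def truncated_svd(A: np.ndarray, k: int):
    """Return U_k (m x k), the k largest singular values, and V_k (n x k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :].T


rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
A[rng.random(A.shape) < 0.7] = 0.0      # zero out most entries (illustrative sparse data)
k = 2

Uk, sk, Vk = truncated_svd(A, k)
A_k = Uk @ np.diag(sk) @ Vk.T           # best rank-k approximation of A
residual = A - A_k                      # A_{rho-k} in the notation above
s_all = np.linalg.svd(A, compute_uv=False)

# Eckart-Young: the squared residual equals the sum of squared trailing singular values.
print(np.linalg.norm(residual, "fro") ** 2, np.sum(s_all[k:] ** 2))

# A CUR factor C is a copy of columns of A, so their zeros are kept exactly;
# the singular vectors in U_k and V_k need not be sparse.
C = A[:, [0, 3]]
print(np.count_nonzero(C), C.size, np.count_nonzero(Uk), Uk.size)
```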