A variety of digital information is represented in the form of a matrix (i.e., a two-dimensional array) or, more generally, in the form of a tensor (i.e., a multi-dimensional array). Examples of matrix-formed digital information may include: a traffic matrix (which describes the volumes of traffic associated with different source-destination pairs in a network at different times), a delay matrix (which describes the round-trip delays between nodes in a network), a social proximity matrix (which specifies the social proximity/closeness between users in a social network), a digital image (which specifies the color at different coordinates), a spatio-temporal signal (which specifies the value of the signal at different spatial locations and different times), and the like. One example of tensor-formed digital information is a digital video (which specifies the pixel value at different X-Y coordinates at different times). The traffic matrix may also be naturally represented as a three-dimensional array (resulting in a tensor), with the three dimensions being traffic source, traffic destination, and time. Such matrix-formed or tensor-formed digital information may be essential input for a wide range of applications. For example, traffic matrices (TMs), which specify the traffic volumes between origin and destination pairs in a network, are critical inputs to many network engineering tasks, such as traffic engineering (see B. Fortz and M. Thorup, “Optimizing OSPF/IS-IS weights in a changing world,” IEEE JSAC Special Issue on Advances in Fundamentals of Network Management, Spring 2002 (Fortz et al. 2002); M. Roughan, M. Thorup, and Y. Zhang, “Traffic engineering with estimated traffic matrices,” Proc. of Internet Measurement Conference (IMC), 2003 (Roughan et al. 2003)), capacity planning, and anomaly detection.
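By way of a non-limiting illustration, the relationship between the tensor form and the matrix form of traffic data may be sketched as follows. The function names `pair_index` and `unfold_traffic_tensor`, the row ordering of source-destination pairs, and the sample values are assumptions made for this example only; they are not taken from any particular system.

```python
def pair_index(src, dst, n_dst):
    """Row index assigned to the (src, dst) pair in the unfolded matrix (assumed ordering)."""
    return src * n_dst + dst


def unfold_traffic_tensor(tensor):
    """Flatten tensor[src][dst][t] into a matrix with one row per
    source-destination pair and one column per time bin."""
    n_src = len(tensor)
    n_dst = len(tensor[0])
    return [list(tensor[s][d]) for s in range(n_src) for d in range(n_dst)]


# Hypothetical 2-node network observed over 3 time bins.
# tensor[src][dst][t] is the traffic volume from src to dst in time bin t.
tensor = [
    [[0.0, 0.0, 0.0], [5.0, 6.0, 7.0]],
    [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]],
]

matrix = unfold_traffic_tensor(tensor)
for row in matrix:
    print(row)
```

In this sketch the same measurements are available either as a three-dimensional array indexed by source, destination, and time, or as a two-dimensional array in which each row holds the time series of one source-destination pair.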
In practice, it may be challenging to measure the data of interest directly, completely, accurately, and at all times. A significant challenge faced by many applications that require such matrix-formed or tensor-formed digital information is therefore how to cope with the missing values that frequently arise in real-world datasets. Since many of these applications are either intolerant of or highly sensitive to missing data, it is important to accurately reconstruct the missing values based on partial and/or indirect measurements. Interpolation is the mathematical term for filling in these missing values.
Compressive sensing (also known as compressed sensing) is a generic methodology for dealing with missing values that leverages the presence of certain types of structure and redundancy in data from many real-world systems. Compressive sensing has recently attracted considerable attention in statistics, approximation theory, information theory, and signal processing. Several effective heuristics have been proposed to exploit the sparse or low-rank nature of data (see E. Candes and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, 9:717-772, 2009 (Candes et al. 2009); E. Candes and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. on Information Theory, 52(12):5406-5425, 2006 (Candes et al. 2006); D. Donoho, “Compressed sensing,” IEEE Trans. on Information Theory, 52(4):1289-1306, 2006 (Donoho 2006); B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed Minimum Rank Solutions to Linear Matrix Equations via Nuclear Norm Minimization,” Preprint, 2007 (Recht et al. 2007); B. Recht, W. Xu, and B. Hassibi, “Necessary and sufficient conditions for success of the nuclear norm heuristic for rank minimization,” Proc. of 47th IEEE Conference on Decision and Control, 2008 (Recht et al. 2008)). Meanwhile, the mathematical theory of compressive sensing has also advanced to the point where the optimality of many of these heuristics has been proven under certain technical conditions on the matrices of interest.
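As a non-limiting illustration of how low-rank structure may be exploited to fill in missing values, the following sketch fits a rank-1 model to the observed entries of a small matrix by alternating least squares and uses the fitted model to fill the gaps. This is a deliberately simple stand-in and is not the nuclear norm minimization described in the cited references; the function name `complete_rank1`, the use of `None` to mark missing entries, the iteration count, and the sample data are all assumptions made for this example.

```python
def complete_rank1(matrix, iterations=100):
    """Fill None entries of `matrix` with u[i] * v[j], where u and v are
    fitted to the observed entries by alternating least squares."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:  # a row with no observed entries keeps its old factor
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical exactly rank-1 matrix (outer product of [1, 2, 3] and
# [2, 4, 6]) with three entries removed.
observed = [
    [2.0, 4.0, None],
    [4.0, None, 12.0],
    [None, 12.0, 18.0],
]

completed = complete_rank1(observed)
for row in completed:
    print([round(x, 3) for x in row])
```

The example is favorable by construction: the underlying matrix is exactly rank-1 and every row and column retains at least two observed entries. Real data that is only approximately low-rank, or that loses whole rows or columns, would not be recovered this cleanly by such a sketch.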
Despite much progress in the area of compressive sensing, the existing compressive sensing algorithms often do not perform well for missing value interpolation on real-world data matrices or tensors, especially under structured, high data loss (e.g., see Section 3 for results in the context of traffic matrices). The main reason is that real-world data often exhibit characteristics that violate the mathematical conditions under which existing compressive sensing algorithms are designed to operate and are provably optimal. Specifically, the optimality results for existing compressive sensing algorithms often assume that (i) the matrix elements are drawn from a Gaussian or Gaussian-like distribution, (ii) the matrix is exactly low-rank, (iii) data loss is independent for different matrix elements, and (iv) the measurement constraints on the matrix satisfy certain technical conditions, e.g., the restricted isometry property (see Recht et al. 2007). Unfortunately, these conditions may not hold for real data. For example, real traffic matrix elements often exhibit a highly skewed distribution, where the largest and smallest elements often differ in size by several orders of magnitude. Moreover, real traffic matrices are only approximately low-rank, and data loss tends to be highly structured: data may be missing spatially (entire rows or columns of the matrix may be missing), temporally (matrix elements over entire segments in time may be missing), or in some combination of the two. Finally, there is no guarantee that the constraints arising from real-world matrix measurements satisfy the required technical conditions.
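The difference between independent loss and structured loss may be illustrated, in a non-limiting manner, by constructing the corresponding observation masks. In the sketch below, `True` marks an observed element and `False` a missing one; the function names, the loss rate, the seed, and the matrix dimensions are hypothetical and chosen only for this example.

```python
import random


def independent_loss_mask(n_rows, n_cols, loss_rate, seed=0):
    """Each element is dropped independently with probability `loss_rate`."""
    rng = random.Random(seed)
    return [[rng.random() >= loss_rate for _ in range(n_cols)]
            for _ in range(n_rows)]


def row_loss_mask(n_rows, n_cols, missing_rows):
    """Spatially structured loss: the listed rows are missing entirely."""
    missing = set(missing_rows)
    return [[r not in missing for _ in range(n_cols)] for r in range(n_rows)]


def time_segment_loss_mask(n_rows, n_cols, start, stop):
    """Temporally structured loss: columns start..stop-1 are missing in every row."""
    return [[not (start <= c < stop) for c in range(n_cols)]
            for _ in range(n_rows)]


def fully_missing_rows(mask):
    """Indices of rows with no observed element."""
    return [r for r, row in enumerate(mask) if not any(row)]


def fully_missing_cols(mask):
    """Indices of columns with no observed element."""
    n_cols = len(mask[0])
    return [c for c in range(n_cols) if not any(row[c] for row in mask)]


random_mask = independent_loss_mask(4, 6, loss_rate=0.3)
spatial_mask = row_loss_mask(4, 6, missing_rows=[1, 3])
temporal_mask = time_segment_loss_mask(4, 6, start=2, stop=4)

print("rows lost entirely (spatial):", fully_missing_rows(spatial_mask))
print("columns lost entirely (temporal):", fully_missing_cols(temporal_mask))
```

Under structured loss, entire rows or columns may have no observed element at all, so a method that relies only on the observed entries of a given row or column has nothing from which to infer it. This is one way in which assumption (iii) above may fail on real data.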
Therefore, there is a need to develop effective techniques to accurately reconstruct missing values in real-world digital information represented in matrix or tensor form.