The theory of compressive sampling, also known as compressed sensing or CS, is regarded as having been developed by Emmanuel Candès, Justin Romberg, Terence Tao and David Donoho. The following overview of CS is largely drawn from Emmanuel J. Candès and Michael B. Wakin, "An Introduction to Compressive Sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30 (March 2008).
CS is a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals from far fewer samples or measurements than traditional methods that follow Shannon's theorem: the sampling rate must be at least twice the maximum frequency present in the signal (the so-called Nyquist rate). To make this possible, CS relies on two principles: sparsity, which pertains to the signals of interest, and incoherence, which pertains to the sensing modality.
Sparsity expresses the idea that the “information rate” of a continuous time signal may be much smaller than suggested by its bandwidth, or that a discrete-time signal depends on a number of degrees of freedom that is much smaller than its (finite) length. More precisely, CS exploits the fact that many natural signals are sparse or compressible in the sense that they have concise representations when expressed in an appropriate basis.
Incoherence extends the duality between time and frequency and expresses the idea that objects that have a sparse representation in one domain must be spread out in the domain in which they are acquired, just as a Dirac or spike in the time domain is spread out in the frequency domain. Put differently, incoherence says that unlike the signal of interest, the sampling/sensing waveforms have an extremely dense representation in the appropriate domain.
According to the theory of CS, one can design efficient sensing or sampling protocols that capture the useful information content embedded in a sparse signal and condense it into a small amount of data. These protocols are nonadaptive and simply involve correlating the signal with a small number of fixed waveforms that are incoherent with the sparsifying basis. What is most remarkable about these sampling protocols is that they allow a sensor to very efficiently capture the information in a sparse signal without trying to comprehend the signal. Further, there is a way to use numerical optimization to reconstruct the full-length signal from the small amount of collected data. In other words, systems based on CS principles can sample—in a signal-independent fashion—at a low rate and later use computational power for reconstruction from what appears to be an incomplete set of measurements. Effectively, such systems sense and compress data simultaneously (thus the name compressed sensing).
Sparsity
Systems that perform CS typically are faced with the problem in which information about a signal f(t) is obtained by linear functionals recording the values

yk = ⟨f, φk⟩, k = 1, . . . , M.
The objects that the system wishes to acquire can simply be correlated with the waveforms φk(t). This is a standard configuration. If the sensing waveforms are Dirac delta functions (spikes), for example, then y is a vector of sampled values of f in the time or space domain. If the sensing waveforms are sinusoids, then y is a vector of Fourier coefficients; this is the sensing modality used in magnetic resonance imaging (MRI).
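The two sensing modalities above can be made concrete in a few lines. The following is an illustrative sketch (the signal f, the size N, and the basis constructions are assumptions, not taken from the text): correlating f with spike waveforms simply returns its samples, while correlating it with sinusoids returns Fourier coefficients.

```python
import numpy as np

N = 8
t = np.arange(N)
f = np.cos(2 * np.pi * t / N)  # a simple test signal (an assumption for this sketch)

# Spike (identity) sensing: each phi_k is a Dirac delta,
# so the measurements y_k = <f, phi_k> are just samples of f.
Phi_spike = np.eye(N)
y_spike = Phi_spike @ f

# Fourier sensing: each row of Phi is a sinusoid, so y collects Fourier
# coefficients (the modality used in MRI). Rows are scaled by 1/sqrt(N)
# to keep the sensing basis orthonormal.
Phi_fourier = np.exp(-2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)
y_fourier = Phi_fourier @ f
```

With the spike basis, y_spike reproduces f exactly; the Fourier sensing matrix here is unitary, so no information is lost in either full-basis case.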
Systems can apply CS to recover information in undersampled situations. Undersampling refers to a circumstance in which the number M of available measurements is much smaller than the dimension N of the signal f. Such problems are extremely common for a variety of reasons. For instance, the number of sensors may be limited, the measurements can be extremely expensive (e.g. certain imaging processes via neutron scattering), or the sensing process may be slow so that one can only measure the object a few times (e.g. in an MRI).
In undersampled situations, a CS system is faced with the task of solving an underdetermined linear system of equations. Letting A denote the M×N sensing or measurement matrix with the vectors φ1*, . . . , φM* as rows (a* is the complex transpose of a), the process of recovering f ∈ ℝ^N from y = Af ∈ ℝ^M is ill-posed in general when M < N: there are infinitely many candidate signals for f. Shannon's theory indicates that, if f(t) has low bandwidth, then a small number of (uniform) samples will suffice for recovery. CS, by contrast, makes signal recovery possible for a much broader class of signals.
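The ill-posedness can be seen directly: when M < N, the matrix A has a nontrivial null space, so adding any null-space vector to f leaves the measurements unchanged. A small sketch (the sizes and random matrices here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 10                      # far fewer measurements than unknowns
A = rng.standard_normal((M, N))   # an arbitrary M x N measurement matrix
f = rng.standard_normal(N)
y = A @ f

# The last N - M right singular vectors of A span its null space.
_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                        # one null-space direction: A @ v = 0

f_alt = f + 3.0 * v               # a genuinely different signal...
y_alt = A @ f_alt                 # ...yet it yields exactly the same measurements
```

Since y_alt equals y, no amount of algebra alone can distinguish f from f_alt; it is the additional sparsity assumption that singles out the right candidate.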
Many natural signals have concise representations when expressed in a convenient basis. Mathematically speaking, a vector f ∈ ℝ^N can be expanded in an orthonormal basis Ψ = [ψ1 ψ2 . . . ψN] as follows:
f(t) = ∑_{i=1}^{N} xi ψi(t)

where x is the coefficient sequence of f, xi = ⟨f, ψi⟩.
It can be convenient to express f as Ψx (where Ψ is the N×N matrix with ψ1, . . . , ψN as columns). The implication of sparsity is now clear: when a signal has a sparse expansion, the small coefficients can be discarded without much perceptual loss. Formally, consider fS(t) obtained by keeping only the terms corresponding to the S largest values of (xi). By definition fS := ΨxS, where xS is the vector of coefficients (xi) with all but the largest S set to zero. This vector is sparse in a strict sense, since all but a few of its entries are zero. Since Ψ is an orthonormal basis, ∥f − fS∥ℓ2 = ∥x − xS∥ℓ2, and if x is sparse or compressible in the sense that the sorted magnitudes of the (xi) decay quickly, then x is well approximated by xS and, therefore, the error ∥f − fS∥ℓ2 is small. In plain terms, one can “throw away” a large fraction of the coefficients without much loss. As can be appreciated, sparsity is a fundamental modeling tool which permits efficient fundamental signal processing; e.g., accurate statistical estimation and classification, efficient data compression, etc. Sparsity has a more surprising and far-reaching implication, however: it has significant bearing on the acquisition process itself. Sparsity determines how efficiently one can acquire signals nonadaptively.
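The best-S-term approximation described above can be sketched in a few lines. The decay profile of the coefficients and the choice Ψ = I (the identity, trivially orthonormal) are assumptions made for illustration:

```python
import numpy as np

N, S = 64, 8
rng = np.random.default_rng(0)

# A compressible coefficient vector: sorted magnitudes decay roughly as 1/i^2.
x = rng.standard_normal(N) / np.arange(1, N + 1) ** 2

Psi = np.eye(N)        # orthonormal representation basis (identity for simplicity)
f = Psi @ x

# Keep only the S largest-magnitude coefficients; zero out the rest.
idx = np.argsort(np.abs(x))[::-1][:S]
x_S = np.zeros(N)
x_S[idx] = x[idx]
f_S = Psi @ x_S

# Because Psi is orthonormal, ||f - f_S||_2 equals ||x - x_S||_2,
# and both are small when the discarded coefficients decay quickly.
err = np.linalg.norm(f - f_S)
```

Here 56 of the 64 coefficients are thrown away, yet the approximation error remains a small fraction of the signal's energy.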
Incoherent Sampling
Consider a pair (Φ, Ψ) of orthonormal bases, or orthobases, of ℝ^N. The first basis Φ is used for sensing the object f and the second Ψ is used to represent f. The coherence between the sensing basis Φ and the representation basis Ψ is
μ(Φ, Ψ) = √N · max_{1 ≤ k, j ≤ N} |⟨φk, ψj⟩|.
In plain English, coherence measures the largest correlation between any two elements of Φ and Ψ. If Φ and Ψ contain correlated elements, the coherence is large. Otherwise, it is small. As for how large and how small, it follows from linear algebra that μ(Φ, Ψ) ∈ [1, √N].
Compressive sampling is mainly concerned with low-coherence pairs of bases. Such pairs include the time-frequency pair, where Φ is the canonical or spike basis and Ψ is the Fourier basis, and wavelet bases for Ψ paired with noiselet bases for Φ. Random matrices are largely incoherent with any fixed basis Ψ: if one selects an orthobasis Φ uniformly at random, then with high probability the coherence between Φ and Ψ is about √(2 log N). In terms of hardware cost and complexity, it is desirable if the signal basis, Ψ, does not need to be known a priori in order to determine a viable sensing matrix Φ. Fortunately, random sensing matrices with sufficient sample size exhibit low coherence with any fixed basis. This means that a random sensing matrix can acquire sufficient measurements to enable signal reconstruction of a sparse signal without knowing a priori the proper basis Ψ for the signal.
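The coherence formula can be evaluated directly for the spike/Fourier pair mentioned above, which attains the minimum possible value μ = 1. A small sketch (the size N is an illustrative assumption):

```python
import numpy as np

N = 16
Phi = np.eye(N)                 # spike (canonical) sensing basis
t = np.arange(N)
# Orthonormal Fourier representation basis: columns are sinusoids.
Psi = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)

# mu(Phi, Psi) = sqrt(N) * max over k, j of |<phi_k, psi_j>|.
mu = np.sqrt(N) * np.max(np.abs(Phi.conj().T @ Psi))
```

Every entry of the Fourier basis has magnitude 1/√N, so the maximum correlation is 1/√N and μ = 1: a spike is maximally spread out in the frequency domain, exactly the incoherence that CS exploits.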
Undersampling and Sparse Signal Recovery
Ideally, the N coefficients of f are observed, but in reality a CS system can only observe a subset of these and collect the data

yk = ⟨f, φk⟩, k ∈ M,
where M ⊂ {1, . . . , N} is a subset of cardinality M < N.
With this information, a conventional approach is to recover the signal by ℓ1-norm minimization. Essentially, among all objects consistent with the data, find the one whose coefficient sequence minimizes the ℓ1-norm. The use of the ℓ1-norm as a sparsity-promoting function traces back several decades. A leading early application was reflection seismology, in which a sparse reflection function (indicating meaningful changes between subsurface layers) was sought from bandlimited data. However, ℓ1-norm minimization is not the only way to recover sparse solutions; other methods, such as greedy algorithms like Orthogonal Matching Pursuit (OMP), can also be utilized.
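Of the recovery methods just mentioned, Orthogonal Matching Pursuit is the simplest to sketch without an optimization solver. The following is a minimal, illustrative implementation; the problem sizes, the random Gaussian sensing matrix, and the assumption that the sparsity level S is known are all choices made for this sketch:

```python
import numpy as np

def omp(A, y, S):
    """Greedy sparse recovery: pick S columns of A that best explain y."""
    residual = y.copy()
    support = []
    for _ in range(S):
        # Select the column most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual)))
        support.append(k)
        # Least-squares fit on the chosen support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
M, N, S = 32, 64, 3
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)          # normalize columns of the random sensing matrix

x0 = np.zeros(N)
support_true = rng.choice(N, S, replace=False)
x0[support_true] = np.array([1.5, -2.0, 1.0])   # an S-sparse signal

x_hat = omp(A, A @ x0, S)               # recover from M < N incoherent measurements
```

Because the random sensing matrix is incoherent with the (here canonical) sparsity basis, the S-sparse vector is recovered from far fewer than N measurements, in line with the CS protocol described above.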
In view of the above, CS suggests a very concrete acquisition protocol: sample nonadaptively in an incoherent domain and invoke linear programming after the acquisition step. Following this protocol enables the acquisition of a signal in a compressed form. A decoder can then “decompress” this data.