1. Field of the Invention
Many applications in communications and signal processing require that the parameters of a signal be estimated and tracked over time.
One particularly important example in radio communications is frequency and phase tracking of a complex-valued sinusoidal signal

s(t) = A e^(j(2πft+θ)),  (1)

or a real-valued sinusoidal signal

s(t) = A cos(2πft+θ),  (2)

where A is the amplitude, j = √(−1), f is the frequency of the sinusoid in Hz, t represents time in seconds, and θ is the phase.
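By way of illustration, the two signal models in equations (1) and (2) can be generated numerically. This is a minimal sketch; the amplitude, frequency, phase, and sampling rate below are illustrative values, not values from this specification:

```python
import numpy as np

# Illustrative parameters (assumptions for demonstration only)
A, f, theta = 1.0, 50.0, np.pi / 4   # amplitude, frequency (Hz), phase (rad)
fs = 1000.0                           # sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)       # one second of time samples

s_complex = A * np.exp(1j * (2 * np.pi * f * t + theta))   # equation (1)
s_real = A * np.cos(2 * np.pi * f * t + theta)             # equation (2)

# The real-valued sinusoid is the real part of the complex one
assert np.allclose(s_real, s_complex.real)
```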
It is often useful to estimate the frequency f and phase θ of a received version x(t) of such a sinusoid that has been corrupted by some linear or nonlinear transformation, interference, and/or noise. This estimation plays an important role in many radio receivers, for example, to estimate the carrier frequency of a radio transmission for subsequent demodulation of some message signal (see John G. Proakis and Masoud Salehi, "Communication Systems Engineering, Second Edition," Prentice Hall, New Jersey, 2002). Additional applications of such parameter estimation and tracking include but are not limited to: ultrasound imaging (see C. Bartoletti, M. Caciotta, and F. Leccese (Italy), "New PLL System to High Accuracy Ultrasound Measurement," in Circuits, Signals, and Systems (449)-2004), satellite communication (see Seki Kazuhiko and Kato Shuzo, "A Self Frequency Preset PLL Synthesizer (Special Issue on Satellite Communications Networking and Applications)," in IEICE Transactions on Communications, Vol. E76-B, No. 5, pp. 473-479, May 1993), and radar (see Jian Zhang, Minghui Yang, Shuqi Zheng, and Xiaowei Sun, "A low-cost Doppler radar system with 24 GHz PLL for remote detection of heart beat," in Microwave and Optical Technology Letters, Vol. 50, No. 12, pp. 3139-3142, September 2008).
Moreover, it is often useful to track changes in the frequency and phase as they become functions of time, that is, f(t) and θ(t). Without loss of generality, one typically considers tracking only θ(t), since changes in f(t) can be incorporated into θ(t): changing the frequency by Δf changes the phase argument by 2πΔf·t. This combined estimation and tracking plays an important role in many applications, such as the demodulation of frequency modulated (FM) communication signals.
Estimation and tracking of analog signals can be performed either in the analog domain or, after first sampling via an analog-to-digital converter, in the digital domain.
2. Brief Description of the Related Art
A standard and widely used method and apparatus for tracking the frequency and phase of sinusoidal and other signals is the phase locked loop (PLL) (see S. Goldman, Phase-Locked Loop Engineering Handbook for Integrated Circuits. Artech House, 2007). The PLL is an analog or digital (we will use the term discrete-time interchangeably) device that uses a feedback loop to continuously update an estimate of the frequency or phase (or both quantities) of the signal. One representative embodiment of a discrete-time PLL for real-valued sinusoidal signals like in equation (2) is shown in FIG. 1.
Complex-valued PLLs operate similarly. For example, to track a complex signal, instead of using an oscillator that generates a sine or cosine wave (sin(2πft+θ(t)), for example), the oscillator can attempt to match the phase with a complex exponential signal, generating e^(−jθ(t)). Sithamparanathan Kandeepan and Sam Reisenfeld, "Phase Detector Models and Their Performances for IF/Baseband Frequency Recovery for Complex Envelope Based DSP Implemented PLL," IEEE ICCS, 2002, offers a solid explanation of operating in the complex domain and the tradeoffs for several different models of phase detection.
Consider an oscillatory digital (discrete-time) signal x[n] whose phase one wishes to track. The PLL tracks the signal phase by generating an estimated signal u[n] (using an estimated frequency and phase) and then comparing the input and estimated signals. If the phase of the estimated signal u[n] lags behind the input signal x[n], then the PLL feedback loop advances the phase of u[n], and vice versa. The goal of the feedback loop in this case is for the estimated signal to be orthogonal to the input. For example, if the input is sin(α), we want our oscillator to be at cos(β), where β≈α, due to the trigonometric relation,
sin(α) cos(β) = (1/2) sin(α+β) + (1/2) sin(α−β).  (3)

If we multiply a sinusoidal input sin(2πf0t+θ1(t)) by cos(2πf1t+θ2(t)), then we obtain a high frequency component

(1/2) sin(2π(f0+f1)t + (θ1(t)+θ2(t)))

and a low frequency component

(1/2) sin(2π(f0−f1)t + (θ1(t)−θ2(t))).

The high frequency component is removed by a loop filter. If the estimate is approximately equal to the input, then the argument of sin(2π(f0−f1)t+(θ1(t)−θ2(t))) is small, and the small-angle approximation sin(θ)≈θ returns the phase difference directly. This phase difference then provides the necessary information to update the oscillator, in either continuous or discrete time.
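The mixing step described above can be checked numerically. The following sketch, using illustrative frequencies and phases (all parameter values are assumptions), verifies the trigonometric identity and shows that the product contains exactly the sum and difference frequency components:

```python
import numpy as np

fs = 10_000.0                      # sampling rate (Hz); illustrative value
t = np.arange(0, 1.0, 1 / fs)      # one second of samples
f0, f1 = 1000.0, 900.0             # input and oscillator frequencies (Hz)
th1, th2 = 0.4, 0.1                # input and oscillator phases (rad)

product = np.sin(2*np.pi*f0*t + th1) * np.cos(2*np.pi*f1*t + th2)

# identity (3): the product equals a sum of two sinusoids
expected = 0.5*np.sin(2*np.pi*(f0+f1)*t + (th1+th2)) \
         + 0.5*np.sin(2*np.pi*(f0-f1)*t + (th1-th2))
assert np.allclose(product, expected)

# the spectrum shows only the difference and sum frequencies
spectrum = np.abs(np.fft.rfft(product))
freqs = np.fft.rfftfreq(t.size, 1/fs)
peaks = freqs[spectrum > spectrum.max()/2]
print(peaks)   # components at f0-f1 = 100 Hz and f0+f1 = 1900 Hz
```

The low-pass loop filter would retain only the 100 Hz component, whose argument carries the phase difference.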
The classical PLL of FIG. 1 has three fundamental components in its feedback loop (see Floyd M. Gardner, "Phaselock Techniques, Third Edition," Wiley, New Jersey, 2005):
- Phase-Locked Oscillator: The phase-locked oscillator 110 uses the current phase estimate θ[n] to generate an estimated signal to be compared to the input.
- Phase Detector: The phase detector 120 estimates the phase difference between the input signal and the estimated output from the oscillator and feeds the output to the loop filter. This component is often implemented as a combination of a multiplier and a low-pass filter to approximate an inner product between the two signals.
- Loop Filter: The loop filter 130 smoothes the phase estimate of the phase detector. It provides robustness to noise but also limits the ability of the PLL to track rapid changes in phase.
In the phase detector, the phase difference between x[n] and u[n] is estimated by multiplying the two signals and then low-pass filtering the product
θ̂[n] = Σ_k x[k] u[k] h[n−k],  (4)

where h[n−k] is the impulse response of the low-pass filter. The output θ̂[n] can be interpreted as the correlation or the inner product of x[n] with u[n] using h[−n] as the inner product kernel (see T. Jebara, R. Kondor, and A. Howard, "Probability Product Kernels," in Journal of Machine Learning Research, Vol. 5, pp. 819-844, 2004). This interpretation is the key insight that enables the generalization of the PLL in our invention.
The loop filter 130 operates on θ̂[n]. One of many possibilities is an exponentially decaying moving average filter. Its output is the phase estimate θ[n] that controls the oscillator. In the case of a simple first-order moving average filter, we have

θ[n] = θ[n−1] + μ θ̂[n],  (5)

where μ is the decay rate of the exponential moving average. This filtering is usually performed to smooth the phase estimate θ[n] and stabilize the PLL in the presence of noise.
Note that, although the moving average is often described in the literature as a separate component of the PLL, its action can be incorporated in the impulse response of the low-pass filter h[n]. That is, defining the impulse response of the loop filter in equation (5) as g[n], we can compute that the estimated phase θ[n] corresponds to the inner product of x[n] with u[n] using h[−n]*g[−n] as the inner product kernel, where * denotes convolution (p[n]*z[n] = Σ_k p[k] z[n−k]).
The oscillator then uses the phase estimate θ[n] to produce the oscillating estimated signal u[n] = cos(2πf0n+θ[n]). Here f0 is the frequency of the oscillator, which is provided to the PLL and should be near the frequency of the signal being tracked. The ability to lock on to an oscillation frequency within a large frequency band is one performance characteristic considered when designing and analyzing a PLL. If the PLL tracks both frequency and phase properly, then f0 needs only to be a rough estimate of the frequency of the input signal.
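The feedback loop described above (oscillator, multiplier phase detector, low-pass filter, and loop update of equation (5)) can be sketched as a minimal discrete-time simulation. All parameter values here are illustrative assumptions, and the one-pole low-pass filter stands in for the generic h[n] of equation (4); this is not the exact embodiment of FIG. 1:

```python
import numpy as np

fs = 8000.0                 # sampling rate (Hz); illustrative
f0 = 200.0                  # oscillator frequency, assumed known
true_theta = 0.8            # unknown input phase to be recovered (rad)
mu = 0.05                   # loop gain, as in equation (5)
alpha = 0.01                # one-pole low-pass smoothing factor

N = 4000
n = np.arange(N)
x = np.sin(2*np.pi*f0/fs*n + true_theta)       # input signal

theta = 0.0                 # phase estimate
lp = 0.0                    # low-pass filter state (phase detector output)
for k in range(N):
    u = np.cos(2*np.pi*f0/fs*k + theta)        # oscillator output
    lp = (1 - alpha)*lp + alpha * x[k]*u       # multiply and low-pass: eq. (4)
    theta = theta + mu * 2*lp                  # loop update: eq. (5); the
                                               # factor 2 cancels the 1/2
                                               # from the mixing identity (3)

print(theta)   # converges toward the input phase, approximately 0.8
```

After the loop settles, the high-frequency mixing product is suppressed by the low-pass filter and the estimate locks onto the input phase.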
Several variations of this basic design exist; for example, components such as a multiplier or a divider can be inserted between the loop filter and the oscillator in order to allow tracking of a frequency that is at a multiple of the oscillator frequency (see D. Banerjee, PLL Performance, Simulation, and Design, 4th ed. Dog Ear Publishing, LLC, 2006).
A PLL can be implemented in analog hardware, software, or a mixture of both. For the mixed analog/digital PLL, the input is sampled from analog to digital and provided to a discrete-time processor. Adjustments to the design can be made to handle delays in the loop if necessary. There are also some all-digital PLLs that rely on counting clock periods; these perform a similar tracking function but cannot handle as broad a range of input signals.
For mixed analog/digital (analog input, digital processing) and all-digital (digital input, digital processing) PLLs, the delay of the loop should be considered when designing loop components such as filters and integrators, but the fundamental ideas of phase detection, filtering, and signal generation are the same. Each component has a digital counterpart: signal generation can become sin(2π(f/fs)n+θ[n]), where fs is the sampling frequency, filters can be designed digitally, and phase detection can be implemented with an ideal multiplier.
There has been some previous work on implementations of random sampling and simple non-uniform sampling schemes in phase-locked loops. Theoretical results of random sampling on loops such as a digital lock-in amplifier were addressed in M. O. Sonnaillon, R. Urteaga, F. J. Bonetto, and M. Ordonez, "Implementation of a high-frequency digital lock-in amplifier," in Electrical and Computer Engineering, 2005, Canadian Conference on, pp. 1229-1232, May 2005; M. O. Sonnaillon, R. Urteaga, F. J. Bonetto, and M. Ordonez, "Random sampling in high-frequency digital Lock-In amplifiers," in XI Reunion de Trabajo en Procesamiento de la Informacion y Control, 752, September 2005; and M. O. Sonnaillon, R. Urteaga, and F. J. Bonetto, "High-Frequency Digital Lock-In Amplifier Using Random Sampling," in Instrumentation and Measurement, IEEE Transactions on, vol. 57, no. 3, pp. 616-621, March 2008. M. O. Sonnaillon and F. J. Bonetto, "FPGA Implementation of a Phase Locked Loop Based on Random Sampling," in Programmable Logic, 2007, SPL '07, 3rd Southern Conference on, pp. 1-6, February 2007, follows up with a simple FPGA implementation of a phase-locked loop for quantized additive random sampling (ARS).
The basic effects of non-uniform sampling in PLLs have been analyzed, often in the context of sampling jitter, and PLLs have been shown to work under these circumstances. For example, A. Weinberg and B. Liu, "Discrete Time Analyses of Nonuniform Sampling First- and Second-Order Digital Phase Lock Loops," in Communications, IEEE Transactions on, vol. 22, no. 2, pp. 123-137, February 1974, addresses the issue.
Some digital PLLs attempt to sample the signal at or near zero-crossings (see Saleh R. Al-Araji, Zahir M. Hussain, and Mahmoud A. Al-Qutayri, "Digital Phase Lock Loops: Architectures and Applications," Springer, The Netherlands, 2006). Additionally, the digital tan-lock loop uses the sampling from the output of the loop to adjust the new sampling time.
Compressive sensing and other demodulators differ from these schemes by taking random linear combinations of the signal inputs. The fundamental difference in underlying principle is that the measurements preserve weighted inner products, and thus do not destroy signal phase information.
Compressive Sensing and the Restricted Isometry Property
In the standard CS framework, we acquire a signal x ∈ ℝ^N via the linear measurements

y = Φx,  (6)

where Φ is an M×N matrix representing the sampling system and y ∈ ℝ^M is the vector of measurements acquired. For simplicity, we deal with real-valued rather than quantized measurements y. Classical sampling theory dictates that, to ensure that there is no loss of information, the number of samples M should be at least the signal dimension N. The CS theory, on the other hand, permits the acquisition of significantly fewer samples than N, as long as the signal x is sparse or compressible in some basis (see E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489-509, 2006; E. J. Candès, "Compressive sampling," in Proc. International Congress of Mathematicians, Madrid, Spain, 2006, vol. 3, pp. 1433-1452; D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289-1306, September 2006; and E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005).
To understand precisely how many measurements are required to enable the recovery of a signal x, we must first examine the properties of Φ that guarantee satisfactory performance of the sensing system. Candès and Tao (see E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005) introduced the restricted isometry property (RIP) of a matrix Φ and established its important role in CS. First we define Σ_K to be the set of all K-sparse signals in ℝ^N, i.e.,

Σ_K = {x ∈ ℝ^N : ∥x∥_0 ≤ K},

where ∥·∥_0 denotes the ℓ_0 quasi-norm, which simply counts the number of non-zero entries of a vector. We say that a matrix Φ satisfies the RIP of order K if there exists a constant δ ∈ (0, 1) such that

(1−δ)∥x∥_2^2 ≤ ∥Φx∥_2^2 ≤ (1+δ)∥x∥_2^2  (7)

holds for all x ∈ Σ_K. In other words, Φ acts as an approximate isometry on the set of vectors that are K-sparse.
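The near-isometry of (7) can be observed empirically. The following sketch, with dimensions chosen purely for illustration, draws a random Gaussian matrix with entry variance 1/M and checks that it approximately preserves the norm of a random K-sparse signal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not requirements of the invention)
N, M, K = 1000, 200, 10
Phi = rng.normal(0.0, 1.0/np.sqrt(M), size=(M, N))   # E(phi_ij^2) = 1/M

# generate a random K-sparse signal x
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.normal(size=K)

ratio = np.linalg.norm(Phi @ x)**2 / np.linalg.norm(x)**2
print(ratio)   # near 1, i.e. (1-δ) ≤ ratio ≤ (1+δ) for a modest δ
```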
It is clear that if we wish to be able to recover all K-sparse signals x from the measurements y, then a necessary condition on Φ is that Φx_1 ≠ Φx_2 for any pair x_1, x_2 ∈ Σ_K with x_1 ≠ x_2. Equivalently, we require ∥Φ(x_1−x_2)∥_2^2 > 0, which is guaranteed if Φ satisfies the RIP of order 2K with constant δ < 1. Furthermore, the RIP also ensures that a variety of practical algorithms can successfully recover any compressible signal from noisy measurements. The following theorem, a slight modification of Theorem 1.2 from E. J. Candès, "The restricted isometry property and its implications for compressed sensing," in Compte Rendus de l'Academie des Sciences, Paris, Series I, 2008, vol. 346, pp. 589-592, makes this precise by bounding the recovery error of x with respect to the sampling noise and with respect to the ℓ_1-distance from x to its best K-term approximation, denoted x_K:
x_K = arg min_{x′ ∈ Σ_K} ∥x − x′∥_1.

Theorem 1. Suppose that Φ satisfies the RIP of order 2K with isometry constant satisfying δ < √2 − 1. Given measurements of the form y = Φx + e, where ∥e∥_2 < ε, the solution to

x̂ = arg min_{x′ ∈ ℝ^N} ∥x′∥_1 subject to ∥Φx′ − y∥_2 ≤ ε

obeys

∥x̂ − x∥_2 ≤ C_0 ε + C_1 ∥x − x_K∥_1/√K,  (8)

where

C_0 = 4√(1+δ)/(1 − (1+√2)δ),  C_1 = 2(1 − (1−√2)δ)/(1 − (1+√2)δ).
Note that in practice we may wish to acquire signals that are sparse or compressible with respect to a certain sparsity basis Ψ, i.e., x = Ψα, where Ψ is represented as an N×N matrix and α ∈ Σ_K. In this case we would require instead that ΦΨ satisfy the RIP, and the performance guarantee would be on ∥α̂ − α∥_2.
Before we discuss how one can actually obtain a matrix Φ that satisfies the RIP, we observe that we can restate the RIP in a slightly more general form. If δ ∈ (0, 1) and D, F ⊂ ℝ^N, we will say that a mapping Φ is a δ-stable embedding of (D, F) if

(1−δ)∥d−f∥_2^2 ≤ ∥Φd−Φf∥_2^2 ≤ (1+δ)∥d−f∥_2^2  (9)

for all d ∈ D and f ∈ F. A mapping satisfying this property is also commonly called bi-Lipschitz. Observe that for a matrix Φ, satisfying the RIP of order 2K is equivalent to being a δ-stable embedding of (Σ_K, Σ_K) or of (Σ_2K, {0}). (In general, if Φ is a δ-stable embedding of (D, F), this is equivalent to it being a δ-stable embedding of (D̃, {0}), where D̃ = {d−f : d ∈ D, f ∈ F}. This formulation can sometimes be more convenient.) Furthermore, if the matrix ΦΨ satisfies the RIP of order 2K, then Φ is a δ-stable embedding of (Ψ(Σ_K), Ψ(Σ_K)) or of (Ψ(Σ_2K), {0}), where Ψ(Σ_K) = {x = Ψα : α ∈ Σ_K}.
A key consequence of the RIP is that inner products are approximately preserved between any two sparse or compressible signals:

⟨Φx_1, Φx_2⟩ ≈ ⟨x_1, x_2⟩.  (10)

This property ensures that the signal geometry between the signals of interest is preserved by the sampling process, i.e., (10) is an equality up to a small difference, ±η, that can be controlled by the number of measurements.
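Property (10) can likewise be checked numerically. In this sketch (dimensions and the sparse-signal generator are illustrative assumptions), the inner product of two random sparse signals is compared before and after projection by a random Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumptions for demonstration only)
N, M, K = 1000, 400, 5
Phi = rng.normal(0.0, 1.0/np.sqrt(M), size=(M, N))

def sparse_signal(rng, N, K):
    # random K-sparse vector with Gaussian nonzero entries
    x = np.zeros(N)
    x[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
    return x

x1, x2 = sparse_signal(rng, N, K), sparse_signal(rng, N, K)

# the projected inner product approximates the ambient inner product
print(np.dot(Phi @ x1, Phi @ x2), np.dot(x1, x2))
```

The gap between the two printed values shrinks as M grows, consistent with the ±η control mentioned above.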
Random Matrix Constructions
We now turn to the more general question of how to construct linear mappings Φ that satisfy (9) for particular sets D and F. While it is possible to obtain deterministic constructions of such operators, in order to obtain Φ with the fewest number of rows possible, we often turn to random matrix constructions. As one of several possibilities, we will construct our random matrices as follows: given M and N, generate random M×N matrices Φ by choosing the entries φ_ij as independent and identically distributed (i.i.d.) random variables. For our embodiment, we will impose two conditions on the random distribution. First, we use a distribution that will yield a matrix that is norm-preserving, which will require that
E(φ_ij^2) = 1/M.  (11)

Second, we utilize a distribution that is sub-Gaussian, meaning that there exists a constant C > 0 such that

E(e^(φ_ij t)) ≤ e^(C^2 t^2/2)  (12)

for all t ∈ ℝ. This says that the moment-generating function of our distribution is dominated by that of a Gaussian distribution, which is also equivalent to requiring that the tails of our distribution decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions include the Gaussian distribution, the Rademacher distribution, and the uniform distribution. In general, any distribution with bounded support is sub-Gaussian (see V. V. Buldygin and Yu. V. Kozachenko, Metric Characterization of Random Variables and Random Processes, American Mathematical Society, Providence, R.I., 2000).
The key property of sub-Gaussian random variables that will be of use in our discussion is that for any x ∈ ℝ^N, the random variable ∥Φx∥_2^2 is strongly concentrated about its expected value; that is, there exists a constant c > 0 that depends only on the constant C in (12) such that

Pr(|∥Φx∥_2^2 − ∥x∥_2^2| ≥ δ∥x∥_2^2) ≤ 2e^(−cMδ^2),  (13)

where the probability is taken over all M×N matrices Φ (see D. Achlioptas, "Database-friendly random projections," in Proc. Symp. Principles of Database Systems, 2001).
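The concentration behavior in (13) can be observed for a Rademacher ensemble, one of the sub-Gaussian examples above. In this sketch (all dimensions, the deviation δ, and the trial count are illustrative assumptions), entries of ±1/√M satisfy E(φ_ij^2) = 1/M, and large deviations of ∥Φx∥_2^2 from ∥x∥_2^2 are empirically rare:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions for demonstration only)
N, M, delta = 500, 200, 0.3
x = rng.normal(size=N)
x /= np.linalg.norm(x)            # unit-norm test vector, ||x||_2^2 = 1

trials = 300
deviations = 0
for _ in range(trials):
    # Rademacher entries scaled so that E(phi_ij^2) = 1/M
    Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
    if abs(np.linalg.norm(Phi @ x)**2 - 1.0) >= delta:
        deviations += 1

print(deviations / trials)   # small empirical failure probability
```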
Stable Embeddings
We now provide a number of results that we will use extensively in the sequel to ensure the stability of our compressive detection, classification, estimation, and filtering algorithms.
We will start by considering the simple case where we want to have a δ-stable embedding of (D, F), where D = {d_i}_{i=1}^{|D|} and F = {f_j}_{j=1}^{|F|} are finite sets of points in ℝ^N. In the case where D = F, this is essentially the Johnson-Lindenstrauss (JL) lemma (see W. B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," in Proc. Conf. in Modern Analysis and Probability, 1984, pp. 189-206; S. Dasgupta and A. Gupta, "An elementary proof of the Johnson-Lindenstrauss lemma," Tech. Rep. TR-99-006, Berkeley, Calif., 1999; and D. Achlioptas, "Database-friendly random projections," in Proc. Symp. Principles of Database Systems, 2001). Here |D| denotes the number of points in D.
Lemma 1. Let D and F be sets of points in ℝ^N. Fix δ, β ∈ (0, 1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (13). If
M ≥ (ln(|D| |F|) + ln(2/β)) / (c δ^2),  (14)

then with probability exceeding 1−β, Φ is a δ-stable embedding of (D, F).

Proof. To prove the result we apply (13) to the |D| |F| vectors corresponding to all possible d_i − f_j. By applying the union bound, we obtain that the probability of (9) not holding, β, is bounded by

β ≤ 2|D| |F| e^(−cMδ^2).

By solving for M we obtain the desired result.
We now consider the case where D = R(Ψ_J) and F = {0}, where Ψ_J is an orthonormal basis for a K-dimensional subspace of ℝ^N, and R(·) denotes the range, or column span, of an operator. Thus, we wish to obtain a Φ that preserves the norm of any vector x ∈ R(Ψ_J). At first glance, this might seem very different than the setting for Lemma 1, since the former involves an uncountable point set, and the latter deals only with embedding a finite number of points. However, we will see that the dimension K bounds the complexity of this space and thus it can be characterized in terms of a finite number of points. The following lemma follows from Lemma 5.1 in R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253-263, December 2008.
Lemma 2. Suppose that Ψ_J is an orthonormal basis for a K-dimensional subspace of ℝ^N. Fix δ, β ∈ (0, 1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (13). If
M ≥ 2 (K ln(42/δ) + ln(2/β)) / (c δ^2),  (15)

then with probability exceeding 1−β, Φ is a δ-stable embedding of (R(Ψ_J), {0}).
We now observe that we can extend this result beyond a single K-dimensional subspace to all possible subspaces of K-sparse signals, i.e., ΣK (see R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253-263, December 2008).
Lemma 3. Let Ψ be an orthonormal basis and fix δ, β ∈ (0, 1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (13). If
M ≥ 2(K ln(42eN/(δK)) + ln(2/β))/(cδ²),  (16)
with e denoting the base of the natural logarithm, then with probability exceeding 1−β, Φ is a δ-stable embedding of (Ψ(Σ_K), {0}).
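By way of illustration only, the bound in (16) can be evaluated numerically. The concentration constant c from (13) is left unspecified in the lemma, so the sketch below assumes c = 1; the function name and example parameters are hypothetical.

```python
import math

def sparse_embedding_measurements(N, K, delta, beta, c=1.0):
    """Evaluate the right-hand side of (16); c is the unspecified
    concentration constant from (13), set to 1.0 for illustration."""
    return 2.0 * (K * math.log(42.0 * math.e * N / (delta * K))
                  + math.log(2.0 / beta)) / (c * delta ** 2)

# Example: N = 1024-dimensional signals, sparsity K = 10
M_bound = sparse_embedding_measurements(N=1024, K=10, delta=0.1, beta=0.01)
```

Note that the bound grows only logarithmically in N, which is the key to sub-Nyquist acquisition.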
A similar technique has recently been used to demonstrate that random projections also provide a stable embedding of nonlinear manifolds (see R. G. Baraniuk and M. B. Wakin, “Random projections of smooth manifolds,” Foundations of Computational Mathematics, vol. 9, no. 1, pp. 51-77, February 2009): under certain assumptions on the curvature and volume of a K-dimensional manifold ℳ ⊂ ℝ^N, a random sensing matrix with
M = O(K log(N)/δ²)
will with high probability provide a δ-stable embedding of (ℳ, ℳ).
We will make further use of these connections in the following sections to aid in our analysis of solving inference problems using compressive measurements.
Compressive Samplers
Compressive sensing (CS) theory opens the door to alternative acquisition and sampling systems. In particular, CS allows us to achieve sub-Nyquist sampling rates through new practical sampling techniques and implementations. In this document, we discuss one way in which the discrete CS framework can be extended to the analog domain, along with several new acquisition modalities that exploit the theory. Such sampling modalities include random demodulation (see J. N. Laska, S. Kirolos, M. F. Duarte, T. Ragheb, R. G. Baraniuk, and Y. Massoud, “Theory and implementation of an analog-to-information conversion using random demodulation,” in Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS), New Orleans, La., May 2007, and S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. Baraniuk, “Analog-to-information conversion via random demodulation,” in Proc. IEEE Dallas Circuits and Systems Workshop (DCAS), 2006), an architecture based on a wideband pseudorandom modulator and a low-rate sampler; random sampling (see A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss, “Improved time bounds for near-optimal sparse Fourier representations,” in Proc. Wavelets XI at SPIE Optics and Photonics, San Diego, Calif., August 2005), an architecture based on pseudo-random non-uniform time samples; and coset sampling (see R. Shenoy, “Nonuniform sampling of signals and applications,” in Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS), vol. 2, pp. 181-184, May-June 1994). These systems can efficiently acquire a large class of compressible signals. Additionally, we discuss sampling from zero-crossings, which can result in very low cost and low power acquisition systems, and recovery from samples that are quantized to 1 bit.
Analog Signal Model for CS
CS theory is framed in terms of discrete vectors and dictionaries, but the concepts can also be extended to continuous-time signals. One possible model is an analog signal x(t) that is periodic or of finite length, so that it can be represented by its Fourier series. When the signal is also bandlimited, its samples at the Nyquist rate suffice to represent it. Under these assumptions, the discrete Fourier transform (DFT) coefficients of the regular samples of the signal are the same as the Fourier series coefficients. We refer to such signals as Fourier-sparse or Fourier-compressible if the vector of DFT coefficients is sparse or compressible, respectively. Thus, we can assume and operate on a discretized signal x, which consists of samples of x(t) taken at or faster than the Nyquist rate. While this model is used in the systems introduced in this document, similar models may be applied for other acquisition systems.
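The discretized Fourier-sparse model can be sketched in a few lines of Python/NumPy (an illustrative construction, not part of any claimed system); the block size N = 256, sparsity K = 5, and unit amplitudes are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 256, 5  # Nyquist-rate samples per block, number of active tones

# K-sparse vector of DFT coefficients (support chosen pseudo-randomly;
# the unit amplitudes are arbitrary illustrative values)
support = rng.choice(N, size=K, replace=False)
alpha = np.zeros(N, dtype=complex)
alpha[support] = 1.0 + 1.0j

# x = Psi alpha, with Psi the orthonormal (unitary) inverse DFT
x = np.fft.ifft(alpha) * np.sqrt(N)

# Taking the DFT of the discretized signal recovers the sparse coefficients
coeffs = np.fft.fft(x) / np.sqrt(N)
```

Because Ψ is unitary here, the signal and its coefficient vector have the same norm, consistent with the stable-embedding framework above.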
Random Demodulator
The architecture of the random demodulator is depicted in FIG. 2. The analog input x(t) is mixed with a pseudo-random square pulse of ±1s, called the chipping sequence pc(t), which alternates between values at or faster than the Nyquist rate Nα Hz of the input signal. The mixed signal is integrated over a time period 1/Mα and sampled by the back-end ADC at Mα Hz << Nα Hz. In practice, data is processed in time blocks of period T, and we define N = NαT as the number of elements in the chipping sequence and M = MαT as the number of measurements. In terms of the discretized model, this is equivalent to multiplying the signal x by a random sequence of ±1s and then summing every N/M sequential coefficients. The key observation is that the modulator and chipping sequence operate at a fast rate, while the back-end ADC operates at a low rate. In hardware, it is easier to build a high-rate modulator/chipping-sequence combination than a high-rate ADC. In fact, many systems already use components of this front end for binary phase shift keying demodulation, as well as for other conventional communication schemes such as CDMA.
In our simulations with real data, it is possible to sample bandwidths of 6 MHz using 1/10 of the Nyquist rate and recover the signal with blocks as small as N=256. This system can be represented by a banded matrix Φ containing N/M pseudo-random ±1s per row, which operates on x. For example, with N=9 and M=3, such a Φ is expressed as
Φ = [ −1   1  −1   0   0   0   0   0   0
       0   0   0  −1  −1   1   0   0   0
       0   0   0   0   0   0   1   1  −1 ].
There are many ways to design a practical random demodulator system. For instance, the integrate-and-dump feature can be replaced by any number of analog or digital filters, the chipping sequence can be chosen to have values other than ±1, and the ADC can be non-uniform.
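The construction of such a banded ±1 matrix can be sketched as follows (an illustrative Python/NumPy sketch; the helper name random_demodulator_matrix is ours, and the sketch assumes N is divisible by M).

```python
import numpy as np

def random_demodulator_matrix(N, M, rng):
    """Banded random demodulator matrix: row m holds N/M consecutive
    chips of a pseudo-random +/-1 chipping sequence (assumes M | N)."""
    assert N % M == 0
    L = N // M
    chips = rng.choice([-1.0, 1.0], size=N)
    Phi = np.zeros((M, N))
    for m in range(M):
        # Row m sums (with chip signs) samples m*L, ..., (m+1)*L - 1
        Phi[m, m * L:(m + 1) * L] = chips[m * L:(m + 1) * L]
    return Phi

rng = np.random.default_rng(0)
Phi = random_demodulator_matrix(9, 3, rng)  # matches the N=9, M=3 example
```

Applying Phi to a length-N vector x then models one block of the mix/integrate/sample chain.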
Random Sampling
One of the fundamental requirements in CS is that signals which are sparse in a dictionary Ψ should be sampled with projections onto a set of functions Φ that are incoherent with Ψ. It is well known that the Fourier basis is maximally incoherent with the canonical basis; this has been applied to CS recovery of time-sparse signals from random subsets of their Fourier coefficients (see E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489-509, February 2006). Equivalently, random subsets of the rows of the identity matrix, i.e., non-uniform random time samples, provide enough information to recover Fourier-sparse signals using CS. For example, with N=9 and M=3, one such resulting matrix Φ is
Φ = [ 1 0 0 0 0 0 0 0 0
      0 0 0 1 0 0 0 0 0
      0 0 0 0 0 1 0 0 0 ].
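Building such a row-subsampled identity matrix and applying it to Nyquist-rate samples can be sketched as follows (illustrative only; the helper name random_sampling_matrix is ours):

```python
import numpy as np

def random_sampling_matrix(N, M, rng):
    """M pseudo-randomly chosen rows of the N x N identity: keeps M of
    the N Nyquist-rate samples at sorted random time instants."""
    rows = np.sort(rng.choice(N, size=M, replace=False))
    return np.eye(N)[rows]

rng = np.random.default_rng(1)
Phi = random_sampling_matrix(9, 3, rng)
x = np.arange(9.0)
y = Phi @ x  # y is simply 3 of the original samples of x
```

In practice one would never form the dense matrix; measuring reduces to reading off the selected sample indices.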
Such a system can be implemented in several ways. One implementation may include a high-rate Nyquist sampler that does not transmit or store all of the samples. Alternatively, random sampling can be implemented as a bank of parallel low-rate samplers potentially at different rates and out of phase with each other.
Non-uniform pseudo-random sampling has been studied in other contexts outside of the CS framework. For example, there exist specialized fast algorithms for the recovery of extremely large Fourier-sparse signals. These algorithms use samples obtained from a structured non-uniform scheme based on random seeds, and they provide guarantees similar to those available from standard CS (see A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss, “Improved time bounds for near-optimal sparse Fourier representations,” in Proc. Wavelets XI at SPIE Optics and Photonics, San Diego, Calif., August 2005).
The Democracy of Compressive Sensing
Among the advantages of random measurements is a property commonly referred to as democracy. While it is not usually rigorously defined in the literature, democracy is usually taken to mean that each measurement contributes a similar amount of information about the signal x to the compressed representation y (see A. R. Calderbank and I. Daubechies, “The pros and cons of democracy,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1721-1725, June 2002, S. Güntürk, “Harmonic analysis of two problems in signal compression,” Ph.D. dissertation, Program in Applied and Computational Mathematics, Princeton University, Princeton, N.J., September 2000, and E. Candès, “Integration of sensing and processing,” IMA annual program year workshop, December 2005). Others have described democracy to mean that each measurement is equally important (or unimportant) (see J. Romberg and M. Wakin, “Compressed sensing: A tutorial,” IEEE Stat. Signal Proc. Workshop, August 2007). Despite the fact that democracy is so frequently touted as an advantage of random measurements, it has received little analytical attention in the CS context. Perhaps more surprisingly, the property has not been explicitly exploited in applications until recently (see J. N. Laska, P. Boufounos, M. A. Davenport, and R. G. Baraniuk, “Democracy in action: Quantization, saturation, and compressive sensing,” Preprint, 2009).
The fact that random measurements are democratic seems intuitive; when using random measurements, each measurement is a randomly weighted sum of a large fraction (or all) of the entries of x, and since the weights are chosen independently at random, no preference is given to any particular entries. More concretely, suppose that the measurements y1, y2, . . . , yM are independent and identically distributed (i.i.d.) according to some distribution fY, as is the case for the Φ considered in this report. Now suppose that we select M̃ < M of the yi at random (or according to some procedure that is independent of y). Then clearly, we are left with a length-M̃ measurement vector ỹ such that each ỹi ~ fY. Stated another way, if we set D = M − M̃, then there is no difference between collecting M̃ measurements and collecting M measurements and then deleting D of them, provided that the deletion is done independently of the actual values of y.
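This equivalence — deleting D measurements independently of y versus measuring with the corresponding row-subsampled matrix from the start — can be checked numerically (an illustrative sketch with arbitrary example dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, D = 12, 32, 4
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = rng.standard_normal(N)
y = Phi @ x

# Delete D measurements using an index set chosen independently of y
keep = np.sort(rng.choice(M, size=M - D, replace=False))
y_tilde = y[keep]

# Identical to having measured with the row-subsampled matrix directly
assert np.allclose(y_tilde, Phi[keep] @ x)
```

The interesting questions, addressed next, arise when the deleted set is allowed to depend on y.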
However, following this line of reasoning ultimately leads to a rather weak definition of democracy. To see this, consider the case where the measurements are deleted by an adversary. By adaptively deleting entries of y, the adversary can change the distribution of ỹ; for example, deleting the D largest elements of y skews the distribution of ỹ. In many cases, especially if the same matrix Φ will be used repeatedly with different measurements being deleted each time, it would be far better to know that any M̃ measurements will be sufficient to reconstruct the signal. This is a significantly stronger requirement.
In order to formally define this stronger notion of democracy, we must first describe the properties that a matrix must satisfy to ensure stable reconstruction. Towards that end, we recall the definition of the restricted isometry property (RIP) for the matrix Φ (see E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005).
Definition 1. A matrix Φ satisfies the RIP of order K with constant δ ∈ (0, 1) if
(1−δ)∥x∥₂² ≤ ∥Φx∥₂² ≤ (1+δ)∥x∥₂²  (17)
holds for all x such that ∥x∥₀ ≤ K.
Much is known about matrices that satisfy the RIP, but for our purposes it suffices to note that if we draw a random M×N matrix Φ whose entries φij are i.i.d. sub-Gaussian random variables, then provided that
M = O(K log(N/K)),  (18)
with high probability Φ will satisfy the RIP of order K with constant δ (see R. G. Baraniuk, M. A. Davenport, R. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253-263, December 2008, and R. DeVore, G. Petrova, and P. Wojtaszczyk, “Instance-optimality in probability with an l1-minimization decoder,” Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 275-288, 2009).
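For intuition only, the RIP constant of a small matrix can be computed exactly by enumerating every K-column submatrix; this brute-force check is exponential in N and is purely illustrative (the helper name rip_constant is ours, and verifying the RIP is intractable at realistic sizes):

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    """Smallest delta for which Phi satisfies the RIP of order K,
    computed exactly by enumerating every K-column submatrix."""
    N = Phi.shape[1]
    delta = 0.0
    for cols in itertools.combinations(range(N), K):
        # Extreme squared singular values bound ||Phi_S x||^2 / ||x||^2
        s = np.linalg.svd(Phi[:, list(cols)], compute_uv=False)
        delta = max(delta, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return delta

# A matrix with orthonormal columns is a perfect isometry: delta = 0
assert rip_constant(np.eye(5), 2) < 1e-10

# A small Gaussian matrix, normalized so columns have unit expected norm
rng = np.random.default_rng(3)
Phi = rng.standard_normal((50, 8)) / np.sqrt(50)
delta = rip_constant(Phi, 2)
```

The randomized constructions above sidestep this verification problem by guaranteeing the RIP with high probability.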
When it is satisfied, the RIP for a matrix Φ provides a sufficient condition to guarantee successful sparse recovery using a wide variety of algorithms (see E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005; E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Compte Rendus de l'Academie des Sciences, Paris, Series I, vol. 346, pp. 589-592, 2008; T. Cai, X. Guangwu, and J. Zhang, “On recovery of sparse signals via l1 minimization,” IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3388-3397, 2009; M. Davenport and M. Wakin, “Analysis of Orthogonal Matching Pursuit using the restricted isometry property,” Preprint, 2009; D. Needell and R. Vershynin, “Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit,” Found. Comput. Math., vol. 9, no. 3, pp. 317-334, 2009; D. Needell and R. Vershynin, “Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit,” to appear in IEEE J. Selected Topics in Signal Processing; D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comp. Harmonic Anal., 2008; W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2230-2249, 2009; A. Cohen, W. Dahmen, and R. DeVore, “Instance optimal decoding by thresholding in compressed sensing,” Preprint, 2008; T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressive sensing,” Appl. Comp. Harmonic Anal., vol. 27, no. 3, pp. 265-274, November 2009; R. Chartrand and V. Staneva, “Restricted isometry properties and nonconvex compressive sensing,” Inverse Problems, vol. 24, no. 035020, pp. 1-14, 2008).
As an example, the RIP of order 2K (with isometry constant δ < √2 − 1) is a sufficient condition to permit l1-minimization (the canonical convex optimization problem for sparse approximation) to exactly recover any K-sparse signal and to approximately recover those that are nearly sparse (see E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Compte Rendus de l'Academie des Sciences, Paris, Series I, vol. 346, pp. 589-592, 2008). The same assumption is also a sufficient condition for robust recovery in noise using a modified l1-minimization (see E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Compte Rendus de l'Academie des Sciences, Paris, Series I, vol. 346, pp. 589-592, 2008).
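A minimal sketch of l1-minimization (basis pursuit) is given below, using the standard linear-programming reduction x = u − v with u, v ≥ 0 and SciPy's generic LP solver. This is illustrative only: it is one of many possible solvers, the helper name basis_pursuit is ours, and the test signal and dimensions are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """min ||x||_1 subject to Phi x = y, posed as a linear program via
    the split x = u - v with u, v >= 0 (a standard reduction)."""
    M, N = Phi.shape
    c = np.ones(2 * N)              # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])   # equality constraint: Phi (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(4)
M, N = 20, 40
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[[3, 17, 29]] = [1.0, -2.0, 0.5]  # a K = 3 sparse test signal
x_hat = basis_pursuit(Phi, Phi @ x)
```

By LP optimality, x_hat satisfies the measurement constraints and has l1 norm no larger than that of the true (feasible) x; under RIP conditions such as the one above, the two coincide.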
The RIP also provides us with a way to quantify our notion of democracy. To do so, we first establish some notation that will prove useful throughout this report. Let Γ ⊂ {1, 2, . . . , M}. By ΦΓ we mean the |Γ|×N matrix obtained by selecting the rows of Φ indexed by Γ. Alternatively, if Λ ⊂ {1, 2, . . . , N}, then we use ΦΛ to indicate the M×|Λ| matrix obtained by selecting the columns of Φ indexed by Λ. Following J. N. Laska, P. Boufounos, M. A. Davenport, and R. G. Baraniuk, “Democracy in action: Quantization, saturation, and compressive sensing,” Preprint, 2009, we now formally define democracy as follows.
Definition 2. Let Φ be an M×N matrix, and let M̃ ≤ M be given. We say that Φ is (M̃, K, δ)-democratic if for all Γ such that |Γ| ≥ M̃ the matrix ΦΓ satisfies the RIP of order K with constant δ.
Theorem 2. Let Φ be an M×N matrix with entries Φij drawn according to N(0, 1/M), and let M̃ ≤ M, K ≤ M̃, and δ ∈ (0, 1) be given. Define D = M − M̃. If
M = C1(K + D) log((N + M)/(K + D)),  (19)
then with probability exceeding 1 − 3e^(−C2 M) we have that Φ is (M̃, K, δ/(1−δ))-democratic, where C1 is arbitrary and C2 = (δ/8)² − log(42e/δ)/C1.
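Since M appears on both sides of (19), one illustrative way to find a consistent value is fixed-point iteration. The sketch below assumes C1 = 1 (the theorem leaves C1 arbitrary) and uses hypothetical example parameters; the helper name is ours.

```python
import math

def democratic_measurements(N, K, D, C1=1.0, iters=100):
    """Iterate M <- C1 (K + D) log((N + M)/(K + D)) to a fixed point.
    C1 = 1.0 is an illustrative stand-in for the arbitrary constant in (19)."""
    M = float(K + D)  # starting guess; the map is a contraction for M << N
    for _ in range(iters):
        M = C1 * (K + D) * math.log((N + M) / (K + D))
    return M

M_dem = democratic_measurements(N=1024, K=10, D=5)  # robust to losing any 5
M_rip = democratic_measurements(N=1024, K=10, D=0)  # plain RIP-style count
```

Comparing M_dem with M_rip makes concrete the roughly O(D log N) extra measurements discussed next.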
Observe that we require roughly O(D log N) additional measurements to ensure that Φ is (M̃, K, δ)-democratic, compared to the number of measurements required simply to ensure that Φ satisfies the RIP of order K. This seems intuitive; if we wish to be robust to the loss of any D measurements while retaining the RIP of order K, then we should expect to take at least D additional measurements. This is not unique to the CS framework. For instance, by oversampling, i.e., sampling faster than the minimum required Nyquist rate, uniform sampling systems can also improve robustness with respect to the loss of measurements. Reconstruction can in principle be performed on the remaining non-uniform grid, as long as the remaining samples satisfy the Nyquist rate on average.
However, linear reconstruction in such cases is known to be unstable. Furthermore, the linear reconstruction kernels are difficult to compute. Under certain conditions stable non-linear reconstruction is possible, although this poses further requirements on the subset of samples that can be lost, and the computation can be expensive. For example, dropping contiguous groups of measurements can be a challenge for the stability of the reconstruction algorithms. Instead, the democratic principle of CS allows an arbitrary subset of D measurements to be dropped without compromising reconstruction stability, independent of how these measurements are chosen.
In some applications, this difference may have significant impact. For example, in finite dynamic range quantizers, the measurements saturate when their magnitude exceeds some level. Thus, when uniformly sampling with a low saturation level, if one sample saturates, then the likelihood that any of the neighboring samples will saturate is high, and significant oversampling may be required to ensure any benefit. However, in CS, if many adjacent measurements were to saturate, then for only a slight increase in the number of measurements we can mitigate this kind of error by simply rejecting the saturated measurements; the fact that Φ is democratic ensures that this strategy will be effective.
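The saturation-rejection strategy can be sketched as follows (illustrative only; the saturation level G and the dimensions are arbitrary example values). Measurements whose magnitudes reach the clip level are simply discarded, and the retained rows of Φ remain exactly consistent with x:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 24, 16
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = rng.standard_normal(N)
y = Phi @ x

G = 0.8 * np.max(np.abs(y))   # hypothetical saturation level of the quantizer
y_q = np.clip(y, -G, G)       # finite-dynamic-range measurement

# Reject saturated measurements instead of using their clipped values
keep = np.abs(y_q) < G
y_kept, Phi_kept = y_q[keep], Phi[keep]

# Retained measurements are unclipped, hence still consistent with x
assert np.allclose(y_kept, Phi_kept @ x)
```

Democracy guarantees that (Phi_kept, y_kept) still supports stable reconstruction, provided enough measurements survive.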
Theorem 2 further guarantees graceful degradation due to loss of samples. Specifically, the theorem implies that reconstruction from a subset of the CS measurements is stable even to the loss of a potentially larger number of measurements than anticipated. To see this, suppose that an M×N matrix Φ is (M−D, K, δ)-democratic, but consider the situation where D + D̃ measurements are dropped. It is clear from the proof of Theorem 2 that if D̃ < K, then the resulting matrix ΦΓ will satisfy the RIP of order K − D̃ with constant δ. Thus, from E. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure and Appl. Math., vol. 59, no. 8, pp. 1207-1223, 2006, if we define K̃ = (K − D̃)/2, then the reconstruction error is bounded by
∥x − x̂∥₂ ≤ C3 ∥x − x_K̃∥₁ / √K̃,  (20)
where x_K̃ denotes the best K̃-term approximation of x and C3 is an absolute constant depending on Φ that can be bounded using the constants derived in Theorem 2. Thus, if D̃ is small, then the additional error caused by dropping too many measurements will also be relatively small. To our knowledge, there is simply no analog to this kind of graceful degradation result for uniform sampling with linear reconstruction. When the number of dropped samples exceeds D, there are no guarantees as to the accuracy of the reconstruction.
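The quantity x_K̃ in (20) is the best K̃-term approximation of x, i.e., x with all but its K̃ largest-magnitude entries zeroed. A minimal sketch (the helper name best_k_term is ours, and the vector is an arbitrary example):

```python
import numpy as np

def best_k_term(x, k):
    """Best k-term approximation: zero all but the k largest-magnitude entries."""
    x_k = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest magnitudes
    x_k[idx] = x[idx]
    return x_k

x = np.array([5.0, -0.1, 3.0, 0.2, -4.0, 0.05])
x3 = best_k_term(x, 3)  # keeps the entries 5.0, 3.0, and -4.0
```

For a signal that is exactly K̃-sparse, x = x_K̃ and the right-hand side of (20) vanishes, recovering exact reconstruction.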