1.1 Field of the Invention
The present invention relates to methods and apparatus for processing signals to remove an undesired component or components while preserving a desired component or components. The invention further relates to methods that exploit the processed signal measurements for the purpose of performing a further processing step including but not limited to detection, classification, estimation, reconstruction, or other information exploitation. The invention is applicable to all types of “signals” and data, including but not limited to signals, images, video and other higher-dimensional data.
1.2 Brief Description of the Related Art
Interference cancellation is a widely explored topic with extensive applications in radar signal processing and both wired and wireless communications. Very often the characteristics of the interference are known and can be removed with an analog or digital linear time-invariant filter. For example, in wired communication such as DSL modems, analog or digital filters are used to remove the voice signal, which is of known bandwidth, from the input to the DSL modem. Notch filters are also used to remove the 60 Hz hum due to the power line interference.
In many applications, the characteristics of the signal are not known at system design time, but are estimated while the system is operating. In these applications, the system adapts its filters to remove the interference as necessary. For example, in radar signal processing, interference arrives from different spatial directions. The radar system detects this interference and modifies the array response pattern such that the array is “blind” to that particular direction, i.e., places a null in the array response function at that angle (see H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part IV: Optimum Array Processing, John Wiley & Sons, 2002). In most of these applications, the interference cancellation is adaptive and is performed using digital processing.
In other applications the system has some control over the source of the interference. For example, in multiuser wireless communication each user is a source of interference to the other users of the system. The system has control over each of the users through the communication protocol. In this case the system design goal is to reduce interference as much as possible.
In contrast, the present invention emphasizes the value of simple, general-purpose hardware that captures and processes all signals of interest together with the interference in a small set of compressive samples. The only known efforts towards interference cancellation with compressive measurements are noted in US Patent Application Publication 20080228446—Method and Apparatus for Signal Detection, Classification and Estimation from Compressive Measurements. This publication describes a technique for interference cancellation that is similar in spirit to the classical techniques described above, in which the signal processing system obtains an estimate of the interference and then removes this contribution from the measurements. In contrast, the present invention considers a very different form of interference cancellation in that it does not obtain a complete estimate of the interfering signal and then subtract out its contribution, but rather directly removes the interference while operating entirely in the compressive domain.
1.3 Compressive Sensing Background
1.3.1 Compressive Sensing and the Restricted Isometry Property
In the standard CS framework, a signal x ∈ ℝ^N is acquired via the linear measurements

y = Φx,  (1)

where Φ is an M×N matrix representing the sampling system and y ∈ ℝ^M is the vector of measurements acquired. In practice, this signal might actually be a continuous signal or image, and the measurements need not be linear. For instance, the measurements could be adaptive or be processed by a nonlinear system such as a quantizer. However, for the sake of simplicity, the CS framework is described for linear, real-valued measurements. The CS theory permits the acquisition of significantly fewer samples than N, as long as the signal x is sparse or compressible in some basis (see E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489-509, 2006; E. J. Candès, “Compressive sampling,” in Proc. International Congress of Mathematicians, vol. 3, pp. 1433-1452, Madrid, Spain, 2006; D. L. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory, vol. 52, no. 4, pp. 1289-1306, September 2006; and E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005).
To understand precisely how many measurements are required to enable the recovery of a signal x, the properties of Φ that guarantee satisfactory performance of the sensing system are examined. Candès and Tao introduced the restricted isometry property (RIP) of a matrix Φ and established its important role in CS (see E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, pp. 4203-4215, December 2005). First, ΣK is defined to be the set of all K-sparse signals in ℝ^N, i.e., ΣK = {x ∈ ℝ^N: ∥x∥0 ≦ K}, where ∥•∥0 denotes the l0 quasi-norm, which simply counts the number of non-zero entries of a vector. A matrix Φ satisfies the RIP of order K if there exists a constant δ ∈ (0,1) such that

(1−δ)∥x∥₂² ≦ ∥Φx∥₂² ≦ (1+δ)∥x∥₂²  (2)

holds for all x ∈ ΣK. In other words, Φ acts as an approximate isometry on the set of vectors that are K-sparse.
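As an illustrative numerical sketch (not part of the claimed apparatus), the approximate-isometry behavior in (2) can be observed empirically for a Gaussian random matrix; all dimensions and the constant δ below are arbitrary choices made solely for illustration.

```python
import numpy as np

# Illustrative sketch: check empirically that a random Gaussian matrix acts as
# an approximate isometry on K-sparse vectors, as in (2). The sizes M, N, K and
# the constant delta are arbitrary choices, not values taken from the text.
rng = np.random.default_rng(0)
M, N, K = 256, 1024, 5
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))  # E[phi_ij^2] = 1/M

delta = 0.5
ratios = []
for _ in range(200):
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)  # random K-sparse support
    x[support] = rng.normal(size=K)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

ratios = np.array(ratios)
print(ratios.min(), ratios.max())  # observed to lie within [1 - delta, 1 + delta]
```

For a Gaussian Φ with entries of variance 1/M, the ratio ∥Φx∥₂²/∥x∥₂² concentrates sharply around 1, so the observed extremes stay well inside the RIP bounds.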
It is clear that if one wishes to be able to recover all K-sparse signals x from the measurements y, then a necessary condition on Φ is that Φx1 ≠ Φx2 for any pair x1, x2 ∈ ΣK with x1 ≠ x2. Equivalently, we require ∥Φ(x1−x2)∥₂² > 0, which is guaranteed if Φ satisfies the RIP of order 2K with constant δ < 1. Furthermore, the RIP also ensures that a variety of practical algorithms can successfully recover any compressible signal from noisy measurements. The following theorem, a slight modification of Theorem 1.2 from E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” in Compte Rendus de l'Academie des Sciences, Paris, Series I, vol. 346, pp. 589-592, 2008, makes this precise by bounding the recovery error of x with respect to the sampling noise and with respect to the l1-distance from x to its best K-term approximation, denoted xK:
xK = arg min_{x′ ∈ ΣK} ∥x − x′∥1.

Theorem 1. Suppose that Φ satisfies the RIP of order 2K with isometry constant satisfying δ < √2 − 1. Given measurements of the form y = Φx + e, where ∥e∥2 ≦ ε, the solution to

x̂ = arg min_{x′ ∈ ℝ^N} ∥x′∥1 subject to ∥Φx′ − y∥2 ≦ ε

obeys

∥x̂ − x∥2 ≦ C0ε + C1 ∥x − xK∥1/√K,  (3)

where

C0 = 4√(1+δ) / (1 − (1+√2)δ) and C1 = 2(1 + (√2−1)δ) / (1 − (1+√2)δ).
Note that in practice one may wish to acquire signals that are sparse or compressible with respect to a certain sparsity basis or sparsity dictionary Ψ, i.e., x = Ψα, where Ψ is represented as an N×N matrix (for the case of a basis) and α ∈ ΣK. In this case one would instead require that ΦΨ satisfy the RIP, and the performance guarantee would be on ∥α̂ − α∥2.
While l1-minimization techniques are a powerful method for CS signal recovery, there also exist a variety of greedy algorithms that are commonly used in practice and for which performance guarantees similar to that of Theorem 1 can be established. In particular, a greedy algorithm called CoSaMP was recently shown to satisfy similar performance guarantees under slightly stronger assumptions on the RIP constants (see D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, May 2009). Furthermore, it is often possible to demonstrate that an algorithm will be successful without requiring that Φ satisfy the RIP, but instead requiring that Φ satisfy some other property or properties. Thus, while the present discussion focuses primarily on the RIP in a preferred embodiment for ease of presentation, this should not be construed as to limit the scope of the invention.
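As an illustrative sketch of the greedy-recovery family to which CoSaMP belongs, the following implements orthogonal matching pursuit (OMP), a simpler relative of CoSaMP; it is offered only as a non-limiting example, and all dimensions below are arbitrary.

```python
import numpy as np

# Illustrative sketch: orthogonal matching pursuit (OMP), a simple greedy
# recovery algorithm in the same family as CoSaMP. Not the patent's method;
# sizes are arbitrary.
def omp(Phi, y, K):
    """Greedily select K columns of Phi and least-squares fit y on them."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    for _ in range(K):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on the selected columns, then update the residual
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x_hat = np.zeros(N)
    x_hat[support] = coeffs
    return x_hat

rng = np.random.default_rng(1)
M, N, K = 64, 256, 4
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
x_hat = omp(Phi, Phi @ x, K)
print(np.linalg.norm(x_hat - x))  # small in the noiseless case
```

In the noiseless setting with M well above the sparsity level, OMP typically recovers the support exactly, after which the least-squares step reproduces the coefficients.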
Before discussing how one can actually obtain a matrix Φ that satisfies the RIP, observe that one can restate the RIP in a slightly more general form. If δ ∈ (0,1) and U, V ⊂ ℝ^N, we will say that a mapping Φ is a δ-stable embedding of (U,V) if

(1−δ)∥u−v∥₂² ≦ ∥Φu−Φv∥₂² ≦ (1+δ)∥u−v∥₂²  (4)

for all u ∈ U and v ∈ V. A mapping satisfying this property is also commonly called bi-Lipschitz. Observe that for a matrix Φ, satisfying the RIP of order 2K is equivalent to being a δ-stable embedding of (ΣK, ΣK) or of (Σ2K, {0}).¹ Furthermore, if the matrix ΦΨ satisfies the RIP of order 2K, then Φ is a δ-stable embedding of (Ψ(ΣK), Ψ(ΣK)) or of (Ψ(Σ2K), {0}), where Ψ(ΣK) = {x = Ψα: α ∈ ΣK}.

¹ In general, if Φ is a δ-stable embedding of (U,V), this is equivalent to it being a δ-stable embedding of (Ũ, {0}), where Ũ = {u−v: u ∈ U, v ∈ V}. This formulation can sometimes be more convenient.

1.3.2 Random Matrix Constructions for Stable Embeddings
We now turn to the more general question of how to construct linear mappings Φ that satisfy (4) for particular sets U and V. While it is possible to obtain deterministic constructions of such operators, without loss of generality and for a simple proof of concept, it is often useful to consider random matrix constructions. The random matrices will be constructed as follows: given M and N, generate random M×N matrices Φ by choosing the entries φi,j as independent and identically distributed (i.i.d.) random variables. Two conditions on the random distribution are considered. First, for simplicity (in order to ensure a preservation of norms up to constants 1±δ) it will be supposed that the distribution will yield a matrix that is norm-preserving, i.e.,
E(φij²) = 1/M.  (5)

Note that in general, one could replace this with any variance to yield measurements that embed the sets with constants ME(φij²)(1±δ). Second, although there are many other possible choices, it is supposed that the distribution is a sub-Gaussian distribution, meaning that there exists a constant C > 0 such that

E(e^(φij t)) ≦ e^(C²t²/2)  (6)

for all t ∈ ℝ. This says that the moment-generating function of the distribution is dominated by that of a Gaussian distribution, which is also equivalent to requiring that the tails of the distribution decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions include the Gaussian distribution, the Rademacher distribution, and the uniform distribution. In general, any distribution with bounded support is sub-Gaussian (see V. V. Buldygin and Yu. V. Kozachenko, Metric Characterization of Random Variables and Random Processes, American Mathematical Society, Providence, R.I., 2000).
The key property of sub-Gaussian random variables that will be of use in this discussion is that for any x ∈ ℝ^N, the random variable ∥Φx∥₂² is strongly concentrated about its expected value; that is, there exists a constant c > 0 that depends only on the constant C in (6) such that

Pr(|∥Φx∥₂² − ∥x∥₂²| ≧ δ∥x∥₂²) ≦ 2e^(−cMδ²),  (7)

where the probability is taken over all M×N matrices Φ (see D. Achlioptas, “Database-friendly random projections,” in Proc. Symp. Principles of Database Systems, 2001).
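As a non-limiting numerical illustration, the concentration behavior in (7) can be checked by Monte Carlo simulation for a Rademacher (±1/√M) matrix; the parameters below are arbitrary choices.

```python
import numpy as np

# Illustrative Monte Carlo check of the concentration inequality (7) for a
# Rademacher (+/- 1/sqrt(M)) matrix. Parameters are arbitrary.
rng = np.random.default_rng(2)
M, N, delta = 100, 400, 0.5
x = rng.normal(size=N)
x /= np.linalg.norm(x)  # unit-norm test vector, so E[||Phi x||^2] = 1

trials = 500
deviations = 0
for _ in range(trials):
    Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
    if abs(np.linalg.norm(Phi @ x) ** 2 - 1.0) >= delta:
        deviations += 1

failure_rate = deviations / trials
print(failure_rate)  # small, consistent with the 2*exp(-c*M*delta^2) bound
```

The empirical deviation probability decays rapidly as M or δ grows, mirroring the exponential bound in (7).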
A number of results are now presented that will be used extensively in the sequel to ensure the stability of the compressive filtering and interference rejection method and apparatus of the present invention. While we state the results for random matrices, in practice a variety of pseudo-random and deterministic matrices will provide similar results.
Start by considering the simple case of wanting a δ-stable embedding of (U,V), where U={ui}i=1^|U| and V={vj}j=1^|V| are finite sets of points in ℝ^N. In the case where U=V, this is essentially the Johnson-Lindenstrauss (JL) lemma (see W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Proc. Conf. in Modern Analysis and Probability, pp. 189-206, 1984; S. Dasgupta and A. Gupta, “An elementary proof of the Johnson-Lindenstrauss lemma,” Tech. Rep. TR-99-006, Berkeley, Calif., 1999; and D. Achlioptas, “Database-friendly random projections,” in Proc. Symp. Principles of Database Systems, 2001).
Lemma 1. Let U and V be finite sets of points in ℝ^N. Fix δ, β ∈ (0,1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (7). If
M ≧ (ln(|U||V|) + ln(2/β)) / (cδ²)  (8)

then with probability exceeding 1−β, Φ is a δ-stable embedding of (U,V).
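As an illustrative helper (not part of the claimed apparatus), the measurement bound (8) can be evaluated numerically. The concentration constant c is not specified by the lemma, so the value c = 0.1 below is a placeholder assumption for illustration only.

```python
import math

# Hypothetical helper: evaluate the measurement bound (8) from Lemma 1.
# The constant c is unspecified in the lemma; c = 0.1 is an assumed placeholder.
def jl_measurement_bound(num_u, num_v, delta, beta, c=0.1):
    """Smallest integer M with M >= (ln(|U||V|) + ln(2/beta)) / (c*delta^2)."""
    return math.ceil((math.log(num_u * num_v) + math.log(2.0 / beta))
                     / (c * delta ** 2))

# e.g. embedding two sets of 1000 points each with delta = 0.1, beta = 0.01
print(jl_measurement_bound(1000, 1000, 0.1, 0.01))
```

Note the logarithmic dependence on the set sizes: doubling |U| or |V| adds only a constant number of measurements.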
Now consider the case where U = R(ΨJ) and V = {0}, where ΨJ is an orthonormal basis for a K-dimensional subspace of ℝ^N, and R(•) denotes the range, or column span, of an operator. We wish to obtain a Φ that preserves the norm of any vector x ∈ R(ΨJ). At first glance, this might seem very different from the setting of Lemma 1, since the former involves an uncountable point set, while the latter deals only with embedding a finite number of points. However, the dimension K bounds the complexity of this space, and thus it can be characterized in terms of a finite number of points. The following lemma follows from Lemma 5.1 in R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253-263, December 2008.
Lemma 2. Suppose that ΨJ is an orthonormal basis for a K-dimensional subspace of ℝ^N. Fix δ, β ∈ (0,1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (7). If
M ≧ 2(K ln(42/δ) + ln(2/β)) / (cδ²)  (9)

then with probability exceeding 1−β, Φ is a δ-stable embedding of (R(ΨJ), {0}).
Now observe that one can extend this result beyond a single K-dimensional subspace to the union of all such subspaces of K-sparse signals, i.e., to Ψ(ΣK) (see R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253-263, December 2008).
Lemma 3. Let Ψ be an orthonormal basis and fix δ, β ∈ (0,1). Let Φ be an M×N random matrix with i.i.d. entries chosen from a distribution that satisfies (7). If
M ≧ 2(K ln(42eN/(δK)) + ln(2/β)) / (cδ²)  (10)

with e denoting the base of the natural logarithm, then with probability exceeding 1−β, Φ is a δ-stable embedding of (Ψ(ΣK), {0}).
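As a non-limiting illustration, the bounds (9) and (10) can be compared numerically; as in the previous sketch, the constant c is unspecified in the lemmas, so c = 0.1 below is an assumed placeholder.

```python
import math

# Hypothetical helpers evaluating the bounds (9) and (10). The constant c is
# unspecified in the lemmas; c = 0.1 is an assumed placeholder value.
def subspace_bound(K, delta, beta, c=0.1):
    """Lemma 2: M >= 2*(K*ln(42/delta) + ln(2/beta)) / (c*delta^2)."""
    return math.ceil(2 * (K * math.log(42.0 / delta) + math.log(2.0 / beta))
                     / (c * delta ** 2))

def sparse_bound(K, N, delta, beta, c=0.1):
    """Lemma 3: M >= 2*(K*ln(42*e*N/(delta*K)) + ln(2/beta)) / (c*delta^2)."""
    return math.ceil(2 * (K * math.log(42.0 * math.e * N / (delta * K))
                          + math.log(2.0 / beta)) / (c * delta ** 2))

# embedding all K-sparse signals costs roughly an extra K*ln(N/K) term
# relative to a single fixed K-dimensional subspace
print(subspace_bound(10, 0.2, 0.01), sparse_bound(10, 1000, 0.2, 0.01))
```

The gap between the two bounds quantifies the price of not knowing the sparse support in advance.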
A similar technique has recently been used to demonstrate that random projections also provide a stable embedding of nonlinear manifolds (see R. G. Baraniuk and M. B. Wakin, “Random projections of smooth manifolds,” Foundations of Computational Mathematics, vol. 9, no. 1, pp. 51-77, February 2009): under certain assumptions on the curvature and volume of a K-dimensional manifold embedded in ℝ^N, a random sensing matrix with

M = O(K log(N)/δ²)

will with high probability provide a δ-stable embedding of that manifold.
Further use will be made of these connections in the following sections in order to filter out unwanted signal components directly from the compressive measurements.
1.3.3 Compressive Samplers
Compressive sensing (CS) theory opens the door for alternative acquisition and sampling systems. In particular, CS allows one to achieve sub-Nyquist sampling rates and to design new practical sampling techniques or implementations. In this section, one way in which the discrete CS framework can be extended to the analog domain is discussed along with several new acquisition modalities which exploit the theory. Such sampling modalities include random demodulation (see J. N. Laska, S. Kirolos, M. F. Duarte, T. Ragheb, R. G. Baraniuk, and Y. Massoud, “Theory and implementation of an analog-to-information conversion using random demodulation,” in Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS), New Orleans, La., May 2007 and S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. Baraniuk, “Analog-to-information conversion via random demodulation,” in Proc. IEEE Dallas Circuits and Systems Workshop (DCAS), 2006), an architecture based on a wideband pseudorandom modulator and a low-rate sampler; random sampling (see A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss, “Improved time bounds for near-optimal sparse Fourier representations,” in Proc. Wavelets XI at SPIE Optics and Photonics, San Diego, Calif., August 2005), an architecture based on pseudo-random non-uniform time samples; and coset sampling (see R. Shenoy, “Nonuniform sampling of signals and applications,” IEEE International Symposium on Circuits and Systems (ISCAS), vol. 2, pp. 181-184, 1994). These systems can efficiently acquire a large class of compressible signals. Additionally, sampling from zero-crossings, which can result in very low cost and low power acquisition systems, and recovery from samples that are quantized to 1-bit are discussed.
CS theory is framed in terms of discrete vectors and dictionaries, but the concepts can also be extended to continuous-time signals. The assumed model for this section is that of an analog signal x(t) that is a periodic or a finite-length signal which can be represented by its Fourier series. When the signal is also bandlimited, its samples at the Nyquist rate suffice to represent it. Under these assumptions, the discrete Fourier transform (DFT) coefficients of the regular samples of the signal are the same as the Fourier series coefficients. Such signals are referred to as Fourier-sparse or Fourier-compressible if the vector of DFT coefficients is sparse or compressible, respectively. Thus, one can assume and operate on a discretized signal, x, which consists of samples of x(t) at or faster than the Nyquist rate. While this model is used in the systems introduced in this document, similar models may be applied for other acquisition systems.
The architecture of a random demodulator is depicted in FIG. 1. The analog input x(t) is mixed with a pseudo-random square pulse of ±1s, called the chipping sequence pc(t), which alternates between values at or faster than the Nyquist rate Nα Hz of the input signal. The mixed signal is integrated over a time period 1/Mα and sampled by the back-end analog-to-digital converter (ADC) at Mα Hz ≪ Nα Hz. In practice, data is processed in time blocks of period T, and N=NαT is defined as the number of elements in the chipping sequence, and M=MαT as the number of measurements. In terms of the discretized model, this is equivalent to multiplying the signal x with a random sequence of ±1s and then summing every N/M sequential coefficients. The key observation is that the modulator and chipping sequence operate at a fast rate, while the back-end ADC operates at a low rate. In hardware it is easier to build a high-rate modulator/chipping-sequence combination than a high-rate ADC (see J. N. Laska, S. Kirolos, M. F. Duarte, T. Ragheb, R. G. Baraniuk, and Y. Massoud, “Theory and implementation of an analog-to-information conversion using random demodulation,” in Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS), New Orleans, La., May 2007). In fact, many systems already use components of this front end for binary phase shift keying demodulation, as well as for other conventional communication schemes such as CDMA.
In simulations with real data, it is possible to sample bandwidths of 6 MHz using 1/10 of the Nyquist rate and recover the signal with blocks as small as N=256. This system can be represented by a banded matrix Φ containing N/M pseudo-random ±1 s per row, which operates on x. For example, with N=9 and M=3, such a Φ is expressed as
Φ = [ −1   1  −1   0   0   0   0   0   0
       0   0   0  −1  −1   1   0   0   0
       0   0   0   0   0   0   1   1  −1 ].
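As an illustrative (non-limiting) sketch, the banded random-demodulator matrix described above, with N/M pseudo-random ±1s per row, can be constructed as follows; the sizes below are arbitrary.

```python
import numpy as np

# Illustrative sketch: build the banded random-demodulator matrix described
# above, with N/M pseudo-random +/-1 chips in a disjoint band per row.
def random_demodulator_matrix(N, M, rng):
    assert N % M == 0, "this sketch assumes N is an integer multiple of M"
    chips = rng.choice([-1.0, 1.0], size=N)  # chipping sequence p_c
    Phi = np.zeros((M, N))
    L = N // M
    for m in range(M):
        # each row integrates (sums) L consecutive chipped samples
        Phi[m, m * L:(m + 1) * L] = chips[m * L:(m + 1) * L]
    return Phi

rng = np.random.default_rng(3)
Phi = random_demodulator_matrix(9, 3, rng)
print(Phi)  # 3 rows, each with three +/-1 entries in a disjoint band
```

Applying this Φ to a Nyquist-rate sample vector x reproduces the discretized model: chip, then sum every N/M sequential coefficients.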
A detailed analysis has shown that the random demodulator can recover K-Fourier-sparse signals with M ≧ CK log(N/K+1) measurements, where C ≈ 1.7 (see J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, “Beyond Nyquist: Efficient sampling of sparse, bandlimited signals,” to appear in IEEE Transactions on Information Theory, 2010).
One of the fundamental requirements in CS is that signals which are sparse in a dictionary Ψ should be sampled with projections onto a set of functions Φ that are incoherent with Ψ. It is well known that the Fourier basis is maximally incoherent with the canonical basis; this has been applied to recover time-sparse signals from random subsets of their Fourier coefficients (see E. J. Candès and T. Tao, “Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Info. Theory, vol. 52, no. 12, pp. 5406-5425, 2006). Equivalently, random subsets of the rows of the identity matrix, i.e., non-uniform random time samples, provide enough information to recover Fourier-sparse signals using CS. For example, with N=9 and M=3, one such resulting matrix Φ is
Φ = [ 1 0 0 0 0 0 0 0 0
      0 0 0 1 0 0 0 0 0
      0 0 0 0 0 1 0 0 0 ].
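As a non-limiting sketch, such a non-uniform random time-sampling matrix is simply a random subset of M rows of the N×N identity, as the following illustrates with arbitrary sizes.

```python
import numpy as np

# Illustrative sketch: a non-uniform random time-sampling matrix is a random
# subset of M rows of the N x N identity matrix. Sizes are arbitrary.
def random_sampling_matrix(N, M, rng):
    rows = np.sort(rng.choice(N, size=M, replace=False))  # sampling instants
    return np.eye(N)[rows]

rng = np.random.default_rng(4)
Phi = random_sampling_matrix(9, 3, rng)
# applying Phi to a length-9 signal keeps exactly the 3 selected samples
x = np.arange(9.0)
print(Phi @ x)
```

Because each row has a single unit entry, applying Φ costs no arithmetic at all; the hardware simply records (or retains) the selected samples.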
Such a system can be implemented in several ways. One implementation may include a high-rate Nyquist sampler that does not transmit or store all of the samples. Alternatively, random sampling can be implemented as a bank of parallel low-rate samplers potentially at different rates and out of phase with each other.
Non-uniform pseudo-random sampling has been studied in other contexts outside of the CS framework. For example, there exist specialized fast algorithms for the recovery of extremely large Fourier-sparse signals. These algorithms use samples obtained from a structured non-uniform scheme based on random seeds, and they provide guarantees similar to those available from standard CS (see A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss, “Improved time bounds for near-optimal sparse Fourier representations,” in Proc. Wavelets XI at SPIE Optics and Photonics, San Diego, Calif., August 2005 and A. C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. J. Strauss, “Near-Optimal Sparse Fourier Representations via Sampling,” in Proc. ACM Symposium on Theory of Computing, Montreal, Canada, May 2002).