1.1 Field of the Invention
The present invention relates to methods and apparatus for the acquisition of time-varying signals such as video sequences using compressive measurements and a dynamical system model for the evolution of the data. The invention further relates to methods that exploit the signal measurements and the dynamical system model for the purpose of performing a further processing step including but not limited to detection, classification, estimation, reconstruction, or other information exploitation. The invention is applicable to all types of signals and data, including but not limited to signals, images, video and other higher-dimensional data. In particular, the invention is applicable to highly correlated data that exhibits subspace structure such as hyper-spectral data and reflectance fields.
1.2 Brief Description of the Related Art
1.2.1 Compressive Sensing
Consider a signal $y \in \mathbb{R}^N$ that is K-sparse in a basis $\Psi$; that is, $s \in \mathbb{R}^N$, defined as $s = \Psi^T y$, has at most K non-zero components. The signal y could be of any dimension, e.g., a one-dimensional (1D) time signal, a 2D image, a 3D video sequence, a 3D hyperspectral data cube, a 4D hyperspectral video sequence, and so on. Compressive sensing (CS) (see E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”, in IEEE Transactions on Information Theory, vol. 52 (2006) 489-509; D. Donoho, “Compressed sensing,” in IEEE Transactions on Information Theory, vol. 52 (2006) 1289-1306) deals with the recovery of y from dimensionality-reduced linear measurements of the form $z = \Phi y = \Phi \Psi s$, where $\Phi \in \mathbb{R}^{M \times N}$ is the measurement matrix. For $M < N$ (corresponding to dimensionality reduction), estimating y from the measurements z is an ill-posed (underdetermined) problem. By exploiting the sparsity of s, the CS theory demonstrates that the signal y can be recovered exactly from $M = O(K \log(N/K))$ measurements provided the matrix $\Phi \Psi$ satisfies the so-called restricted isometry property (RIP) (see R. Baraniuk, M. Davenport, R. DeVore and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” in Constructive Approximation, vol. 28 (2008) 253-263).
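The measurement model above can be illustrated with a minimal numerical sketch. The sizes N, K and M, the random orthonormal basis standing in for Ψ, and the Gaussian choice of Φ are all assumptions made for illustration; they are not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 256, 8, 64  # illustrative sizes (assumptions)

# Orthonormal basis Psi; a random orthonormal matrix obtained via QR
# is used here purely as a stand-in for a wavelet or DCT basis.
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# K-sparse coefficient vector s and the corresponding signal y = Psi s.
s = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s[support] = rng.standard_normal(K)
y = Psi @ s

# Random Gaussian measurement matrix Phi with M < N rows (assumption),
# and the compressive measurements z = Phi y = Phi Psi s.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
z = Phi @ y

# Because Psi is orthonormal, s = Psi^T y returns the sparse coefficients.
s_check = Psi.T @ y
print(z.shape, np.allclose(s_check, s))
```

Here z has only M entries while y has N, which is the dimensionality reduction that makes recovery of y depend on the sparsity of s.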
In practical scenarios, where there is noise in the signal y or the measurements z, the signal s (or equivalently, y) can be recovered from z by solving a convex problem of the form

$$\min \|s\|_1 \quad \text{subject to} \quad \|z - \Phi \Psi s\| \le \epsilon \qquad (1)$$

with $\epsilon$ a bound on the measurement noise. It can be shown that the solution to (1) is with high probability the K-sparse solution that we seek. The theoretical guarantees of CS have been extended to compressible signals (see J. Haupt, and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Transactions on Information Theory, vol. 52 (2006) 4036-4048). In a compressible signal, the sorted coefficients decay rapidly according to a power-law.
There exists a wide range of algorithms that solve (1) under various approximations or reformulations (see E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”, in IEEE Transactions on Information Theory, vol. 52 (2006) 489-509; E. van den Berg, and M. P. Friedlander, “Probing the pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31 (2008) 890-912). It is also possible to solve (1) efficiently using greedy techniques such as Orthogonal Matching Pursuit (see Y. C. Pati, R. Rezaiifar and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Asilomar Conference on Signals, Systems and Computers, Volume 1. (1993) 40-44) and CoSaMP (see D. Needell, and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” in Applied and Computational Harmonic Analysis, vol. 26 (2009) 301-321). In particular, CoSaMP is an attractive alternative to convex optimization methods given its strong convergence properties and low computational complexity. It is also easy to impose structural constraints such as block sparsity in CoSaMP, giving variants such as model-based CoSaMP (see R. G. Baraniuk, V. Cevher, M. F. Duarte and C. Hegde, “Model-based compressive sensing,” CoRR vol. abs/0808.3572 (2008)).
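As an illustration of the greedy approach, the following is a minimal sketch of Orthogonal Matching Pursuit written with NumPy. It is a textbook-style version, not the implementation of any cited work; the function name `omp`, the stopping tolerance, and the problem sizes in the demonstration are assumptions.

```python
import numpy as np

def omp(A, z, K, tol=1e-12):
    """Greedy estimate of a K-sparse s from z ~ A s, where A stands for Phi Psi.

    Hypothetical helper for illustration; expects K >= 1.
    """
    N = A.shape[1]
    residual = z.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        if np.linalg.norm(residual) < tol:
            break  # measurements already explained
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of z on the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    s_hat = np.zeros(N)
    if support:
        s_hat[support] = coef
    return s_hat

# Small noise-free demonstration with assumed sizes.
rng = np.random.default_rng(1)
M, N, K = 80, 128, 4
A = rng.standard_normal((M, N)) / np.sqrt(M)
s_true = np.zeros(N)
idx = rng.choice(N, size=K, replace=False)
s_true[idx] = rng.choice([-1.0, 1.0], size=K) * (1.0 + rng.random(K))
z = A @ s_true
s_hat = omp(A, z, K)
print("recovery error:", np.linalg.norm(s_hat - s_true))
```

Each iteration adds one index to the support and refits by least squares, so the estimate has at most K non-zero entries and the residual norm never exceeds the norm of z.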
1.2.2 Video Compressive Sensing
In video CS, we are interested in acquiring and recovering a video sequence (without loss of generality with two spatial dimensions and one time dimension) of a scene that has dynamic elements. Existing methods for video CS work under the assumption of the availability of multiple measurements at each time instant. To date, such measurements have been obtained using a snapshot imager (see A. Wagadarikar, R. John, R. Willett and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” in Applied Optics, vol. 47 (2008) 44-51) or by stacking consecutive measurements from a single pixel camera (SPC) (see M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, “Single-pixel imaging via compressive sampling,” in IEEE Signal Processing Magazine, vol. 25 (2008) 83-91). Given such a sequence of compressive measurements, reconstruction of the video can be achieved in multiple ways. Wakin et al. (see M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, R. Baraniuk, “Compressive imaging for video representation and coding,” in Picture Coding Symposium, (2006)) use a 3D wavelet transform as the sparsifying basis Ψ for recovering videos from snapshots of compressive measurements. Park and Wakin (see J. Park and M. Wakin, “A multiscale framework for compressive sensing of video,” in Picture Coding Symposium, (2009)) use a coarse-to-fine estimation framework wherein the video, reconstructed at a coarse level, is used to estimate motion vectors that are subsequently used to design dictionaries for reconstruction at a finer level. Vaswani (see N. Vaswani, “Kalman filtered compressed sensing,” in IEEE International Conference on Image Processing, (2008)) and Vaswani and Lu (see N. Vaswani and W. Lu, “Modified-CS: Modifying compressive sensing for problems with partially known support,” in Intl. Symposium on Information Theory, (2009)) propose a sequential framework that exploits the similarity, between adjacent frames of a video, of the support and of the values the signal takes on this support. A frame of video is reconstructed using a linear inversion over the support at the previous time instant, and a small-scale CS recovery over the residue. All of these algorithms require a large number of measurements at each time instant, and in most cases the number of measurements is proportional to the sparsity of an individual frame. This can be a limiting factor in applications where sensing is costly.
Video CS is related to the background subtraction problem, where the idea is to estimate only the dynamic components of a scene. Cevher et al. (see V. Cevher, A. Sankaranarayanan, M. Duarte, D. Reddy, R. Baraniuk and R. Chellappa, “Compressive sensing for background subtraction,” in European Conference on Computer Vision, Springer (2008) 12-18) and Zheng and Jacobs (see J. Zheng and E. Jacobs, “Video compressive sensing using spatial domain sparsity”, in Optical Engineering, vol. 48 (2009) 087006) model a video as a static scene with canonically sparse innovations. Veeraraghavan et al. (see A. Veeraraghavan, D. Reddy, and R. Raskar, “Coded strobing photography: Compressive sensing of high-speed periodic events,” in IEEE Trans. on Pattern Analysis and Machine Intelligence (to appear), URL: http://www.cfar.umd.edu/users/vashok) propose a compressive sensing framework for periodic scenes using coded strobing techniques.
1.2.3 Dynamic Textures and Linear Dynamical Systems
Linear dynamical systems (LDS) are a class of parametric models for time-series data of any dimension. A wide variety of spatio-temporal data have often been modeled as realizations of LDS. In particular, they have been used to model, synthesize and classify dynamic textures (see G. Doretto, A. Chiuso, Y. Wu, and S. Soatto, “Dynamic textures,” in International Journal of Computer Vision, vol. 51 (2003) 91-109), traffic scenes (see A. B. Chan and N. Vasconcelos, “Probabilistic kernels for the classification of auto-regressive visual processes,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2005) 846-851), and human activities (see A. Veeraraghavan, A. K. Roy-Chowdhury, and R. Chellappa, “Matching shape sequences in video with applications in human movement analysis,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27 (2005) 1896-1909; P. Turaga, A. Veeraraghavan, and R. Chellappa, “Unsupervised view and rate invariant clustering of video sequences,” Computer Vision and Image Understanding vol. 113 (2009) 353-371). Let $\{y_t, t = 0, \ldots, T\}$ be a sequence of frames/observations indexed by time t. The LDS model parameterizes the evolution of $y_t$ as follows:

$$y_t = C x_t + w_t, \quad w_t \sim N(0, R), \quad R \in \mathbb{R}^{N \times N} \qquad (2)$$

$$x_{t+1} = A x_t + v_t, \quad v_t \sim N(0, Q), \quad Q \in \mathbb{R}^{d \times d} \qquad (3)$$

where $x_t \in \mathbb{R}^d$ is the hidden state vector, $A \in \mathbb{R}^{d \times d}$ the transition matrix, and $C \in \mathbb{R}^{N \times d}$ the observation matrix.
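The model in (2) and (3) can be simulated directly. The sketch below is illustrative only: the helper name `simulate_lds`, the isotropic noise covariances (R and Q taken as scaled identities), and the particular A, C and sizes are assumptions rather than part of the model description.

```python
import numpy as np

def simulate_lds(A, C, x0, T, q_std=0.0, r_std=0.0, seed=0):
    """Draw frames y_0..y_T from the LDS of (2)-(3).

    Hypothetical helper; assumes R = r_std^2 I and Q = q_std^2 I.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    N = C.shape[0]
    x = np.asarray(x0, dtype=float)
    Y = np.empty((N, T + 1))
    X = np.empty((d, T + 1))
    for t in range(T + 1):
        X[:, t] = x
        # Observation equation (2): y_t = C x_t + w_t.
        Y[:, t] = C @ x + r_std * rng.standard_normal(N)
        # State equation (3): x_{t+1} = A x_t + v_t.
        x = A @ x + q_std * rng.standard_normal(d)
    return Y, X

# Example: a slowly decaying rotation in a d = 2 state space,
# observed through a random N x d matrix C (all values assumed).
theta = 0.2
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
N, T = 50, 100
C = np.random.default_rng(3).standard_normal((N, 2))
x0 = np.array([1.0, 0.0])
Y_noisy, X_noisy = simulate_lds(A, C, x0, T, q_std=0.01, r_std=0.05)
print(Y_noisy.shape, X_noisy.shape)
```

Although each frame has N entries, every noise-free frame lies in the d-dimensional column space of C, which is the subspace structure the model captures.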
Given the observations $\{y_t\}$, the truncated SVD of the matrix $[y]_{1:T} = [y_1, y_2, \ldots, y_T]$ can be used to recover both C and A. In particular, an estimate of the observation matrix C is given as $\hat{C} = U$, where $[y]_{1:T} \approx U \Sigma V^T$ is the rank-d approximation (truncated SVD). Note that the choice of C is unique only up to a $d \times d$ linear transformation. That is, given $[y]_{1:T}$, we can define $\hat{C} = UL$, where L is an invertible $d \times d$ matrix. This represents our choice of coordinates in the subspace defined by the columns of C.
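The SVD-based estimate can be sketched as follows. The estimate of C follows the description above; the text does not spell out how A is obtained, so the sketch assumes the common choice of taking the state estimates as the rows of ΣV^T and fitting A by least squares between consecutive states. The helper name `estimate_lds` and the test system are hypothetical.

```python
import numpy as np

def estimate_lds(Y, d):
    """Estimate (C, A) from the frame matrix Y = [y_1, ..., y_T] (N x T).

    Hypothetical helper; the least-squares fit of A is an assumed choice.
    """
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C_hat = U[:, :d]                    # C_hat = U from the rank-d truncated SVD
    X_hat = S[:d, None] * Vt[:d, :]     # state estimates, Sigma V^T (d x T)
    # Fit A so that X_hat[:, 1:] ~ A X_hat[:, :-1] in the least-squares sense.
    A_hat = X_hat[:, 1:] @ np.linalg.pinv(X_hat[:, :-1])
    return C_hat, A_hat, X_hat

# Noise-free frames from an assumed d = 2 system, for checking the estimate.
rng = np.random.default_rng(4)
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
C_true = rng.standard_normal((20, 2))
x = np.array([1.0, 0.0])
frames = []
for _ in range(50):
    frames.append(C_true @ x)
    x = A_true @ x
Y = np.stack(frames, axis=1)

C_hat, A_hat, X_hat = estimate_lds(Y, d=2)
# C_hat and A_hat match C_true and A_true only up to an invertible d x d
# change of coordinates L, so compare coordinate-free quantities instead.
print(np.abs(np.linalg.eigvals(A_hat)), np.abs(np.linalg.eigvals(A_true)))
```

Because of the coordinate ambiguity, the estimates are compared through the eigenvalues of A and the subspace spanned by C rather than entry by entry.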