1. Field of the Invention
The present invention relates to a system for computing sparse representations of data, i.e., where the data can be fully represented in terms of a small number of non-zero code elements, and for reconstructing compressively sensed images.
2. Brief Description of the Related Art
Natural signals can be well-approximated by a small subset of elements from an over complete dictionary. The process of choosing a good subset of dictionary elements along with the corresponding coefficients to represent a signal is known as sparse approximation. Sparse approximation is a difficult non-convex optimization problem that is at the center of much research in mathematics and signal processing. Existing sparse approximation algorithms suffer from one or more of the following drawbacks: 1) they are not implementable in parallel computational architectures; 2) they have difficulty producing exactly sparse coefficients in finite time; 3) they produce coefficients for time-varying stimuli that contain inefficient fluctuations, making the stimulus content more difficult to interpret; or 4) they only use a heuristic approximation to minimizing a desired objective function.
Given an N-dimensional stimulus sε, we seek a representation in terms of a dictionary D composed of M vectors {φm} that span the space . Define the lP norm of the vector x to be ∥x∥p=(Σ|xm|p)1/p and the inner product (projection) between x and y to be <x, y>=Σxmym. Without loss of generality, assume the dictionary vectors are unit-norm, ∥φm∥2=1. When the dictionary is overcomplete (M>N), there are an infinite number of ways to choose coefficients {am} such that
  s  =            ∑              m        =        1            M        ⁢                  a        m            ⁢                        ϕ          m                .            In optimal sparse approximation, we seek the coefficients having the fewest number of non-zero entries by solving the minimization problem
                                                        min              a                        ⁢                                                                              a                                                  0                            ⁢                                                          ⁢              subject              ⁢                                                          ⁢              to              ⁢                                                          ⁢              s                                =                                    ∑                              m                =                1                            M                        ⁢                                          a                m                            ⁢                              ϕ                m                                                    ,                            (        1        )            where the l0 “norm” denotes the number of non-zero elements of a=[a1, a2, . . . , aM]. While this clearly is not a norm in the mathematical sense, the term here will be used as it is prevalent in the literature. Unfortunately, this combinatorial optimization problem is NP-hard.
In the signal processing community, two approaches are typically used on digital computers to find acceptable suboptimal solutions to this intractable problem. The first general approach substitutes an alternate sparsity measure to convexify the l0 norm. One well-known example is Basis Pursuit (BP) (Chen et al., 2001), which replaces the l0 norm with the l1 norm
                                          min            a                    ⁢                                                                    a                                            1                        ⁢                                                  ⁢            subject            ⁢                                                  ⁢            to            ⁢                                                  ⁢            s                          =                              ∑                          m              =              1                        M                    ⁢                                    a              m                        ⁢                                          ϕ                m                            .                                                          (        2        )            
Despite this substitution, BP has the same solution as the optimal sparse approximation problem (Donoho and Elad, 2003) if the signal is sparse compared to the nearest pair of dictionary elements
                              (                                    e              .              g              .                        ,                                                  ⁢                                                                              a                                                  0                            <                                                min                                      m                    ≠                    n                                                  ⁢                                                      1                    2                                    ⁡                                      [                                          1                      +                                              1                        /                                                  〈                                                                                    ϕ                              m                                                        ,                                                          ϕ                              n                                                                                〉                                                                                      ]                                                                                )                .                                        In practice, the presence of signal noise often leads to using a modified approach called Basis Pursuit De-Noising (BPDN) (Chen et al., 2001) that makes a tradeoff between reconstruction mean-squared error (MSE) and sparsity in an unconstrained optimization problem:
                                          min            a                    ⁢                      (                                                                                                  s                    -                                                                  ∑                                                  m                          =                          1                                                M                                            ⁢                                                                        a                          m                                                ⁢                                                  ϕ                          m                                                                                                                                      2                2                            +                              λ                ⁢                                                                          a                                                        1                                                      )                          ,                            (        3        )            where λ is a tradeoff parameter. BPDN provides the l1-sparsest approximation for a given reconstruction quality. There are many algorithms that can be used to solve the BPDN optimization problem, with interior point-type methods being the most common choice.
The second general approach employed by signal processing researchers uses iterative greedy algorithms to constructively build up a signal representation (Tropp, 2004). The canonical example of a greedy algorithm is known in the signal processing community as Matching Pursuit (MP) (Mallat and Zhang, 1993). The MP algorithm is initialized with a residual r0=s. At the kth iteration, MP finds the index of the single dictionary element best approximating the current residual signal, θk=arg maxmrk−1, φm. The coefficient dk=rk−1, φθk and index θk are recorded as part of the reconstruction, and the residual is updated, rk=rk−1−φθkdk. After K iterations, the signal approximation using MP is given by
      s    ^    =            ∑              k        =        1            K        ⁢                  ϕ                  θ          k                    ⁢                        d          k                .            Though they may not be optimal in general, greedy algorithms often efficiently find good sparse signal representations in practice.
Recent research has found compelling evidence that the properties of V1 population responses to natural stimuli may be the result of a sparse approximation. For example, it has been shown that V1 receptive fields are consistent with optimizing the coefficient sparsity when encoding natural images. Additionally, V1 recordings in response to natural scene stimuli show activity levels (corresponding to the coefficients {am}) becoming sparser as neighboring units are also stimulated. These populations are typically very overcomplete, allowing great flexibility in the representation of a stimulus. Using this flexibility to pursue sparse codes might offer many advantages to sensory neural systems, including enhancing the performance of subsequent processing stages, increasing the storage capacity in associative memories, and increasing the energy efficiency of the system.
However, existing sparse approximation algorithms do not have implementations that correspond both naturally and efficiently to parallel computational architectures such as those seen in neural populations or in analog hardware. For convex relaxation approaches, a network implementation of BPDN can be constructed, following the common practice of using dynamical systems to implement direct gradient descent optimization. Unfortunately, this implementation has two major drawbacks. First, it lacks a natural mathematical mechanism to make small coefficients identically zero. While the true BPDN solution would have many coefficients that are exactly zero, direct gradient methods to find an approximate solution in finite time produce coefficients that merely have small magnitudes. Ad hoc thresholding can be used on the results to produce zero-valued coefficients, but such methods lack theoretical justification and can be difficult to use without oracle knowledge of the best threshold value. Second, this implementation requires persistent (two-way) signaling between all units with overlapping receptive fields (e.g., even a node with a nearly zero value would have to continue sending inhibition signals to all similar nodes). In greedy algorithm approaches, spiking neural circuits can be constructed to implement MP. Unfortunately, this type of circuit implementation relies on a temporal code that requires tightly coupled and precise elements to both encode and decode.
Beyond implementation considerations, existing sparse approximation algorithms also do not consider the time-varying signals common in nature. A time-varying input signal s(t) is represented with a set of time-varying coefficients {am(t)}. While temporal coefficient changes are necessary to encode stimulus changes, the most useful encoding would use coefficient changes that reflect the character of the stimulus. In particular, sparse coefficients should have smooth temporal variations in response to smooth changes in the image. However, most sparse approximation schemes have a single goal: select the smallest number of coefficients to represent a fixed signal. This single-minded approach can produce coefficient sequences for time-varying stimuli that are erratic, with drastic changes not only in the values of the coefficients but also in the selection of which coefficients are used. These erratic temporal codes are inefficient because they introduce uncertainty about which coefficients are coding the most significant stimulus changes, thereby complicating the process of understanding the changing stimulus content.
There are several sparse approximation methods that do not fit into the two primary approaches of pure greedy algorithms or convex relaxation. Methods such as Sparse Bayesian Learning, FOCUSS, modifications of greedy algorithms that select multiple coefficients on each iteration and MP extensions that perform an orthogonalization at each step involve computations that would be very difficult to implement in a parallel, distributed architecture. For FOCUSS, there also exists a dynamical system implementation that uses parallel computation to implement a competition strategy among the nodes (strong nodes are encouraged to grow while weak nodes are penalized), however it does not lend itself to forming smooth time-varying representations because coefficients cannot be reactivated once they go to zero.
There are also several sparse approximation methods built on a parallel computational framework that are related to our LCAs. These algorithms typically start with many super-threshold coefficients and iteratively try to prune the representation through a thresholding procedure, rather than charging up from zero as in our LCAs. In addition, most of these algorithms are not explicitly connected to the optimization of a specific objective function.