Hyperspectral sensors are a new class of optical sensor that collect a spectrum from each point in a scene. They differ from multi-spectral sensors in that the number of bands is much higher (twenty or more), and the spectral bands are contiguous. For remote sensing applications, they are typically deployed on either aircraft or satellites. The data product from a hyperspectral sensor is a three-dimensional array or “cube” of data with the width and length of the array corresponding to spatial dimensions and the spectrum of each point as the third dimension. Hyperspectral sensors have a wide range of remote sensing applications including: terrain classification, environmental monitoring, agricultural monitoring, geological exploration, and surveillance. They have also been used to create spectral images of biological material for the detection of disease and other applications. Known target detection algorithms have been derived from several models of hyperspectral imagery.
The Gaussian mixture model has served as a basis for detecting known targets from hyperspectral and multispectral imagery. This approach models each datum as a realization of a random vector having one of several possible multivariate Gaussian distributions. If each observation, $y \in \mathbb{R}^n$, arises from one of $d$ normal classes, then the data have a normal or Gaussian mixture probability density function:

$$p(y) = \sum_{k=1}^{d} \omega_k \, N(\mu_k, \Gamma_k)(y), \qquad \omega_k \ge 0, \qquad \sum_{k=1}^{d} \omega_k = 1, \qquad \text{[Eqn. 1]}$$

where $\omega_k$ is the probability of class $k$ and

$$N(\mu_k, \Gamma_k)(y) = \frac{1}{(2\pi)^{n/2} \, |\Gamma_k|^{1/2}} \exp\!\left( -\frac{1}{2} (y - \mu_k)^T \Gamma_k^{-1} (y - \mu_k) \right)$$

is the normal probability density function having mean $\mu_k$ and covariance $\Gamma_k$. The parameters $\{(\omega_k, \mu_k, \Gamma_k) \mid 1 \le k \le d\}$ are typically estimated from the imagery using defined clusters, the expectation maximization algorithm, or related algorithms such as the stochastic expectation maximization algorithm. Known target detection algorithms are generally implemented using a bank or a linear combination of the likelihood ratio detection statistics for each class.
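As an illustration of the mixture density of Eqn. 1, the following is a minimal sketch that evaluates a Gaussian mixture at a single observation. The function names `gaussian_pdf` and `mixture_pdf` are hypothetical and do not come from any cited work; parameter estimation (clustering or expectation maximization) is not shown, and the parameters are assumed to be given.

```python
import numpy as np


def gaussian_pdf(y, mu, gamma):
    """Evaluate the multivariate normal density N(mu, gamma) at vector y."""
    n = y.shape[0]
    diff = y - mu
    # Mahalanobis term (y - mu)^T gamma^{-1} (y - mu), without forming the inverse.
    maha = diff @ np.linalg.solve(gamma, diff)
    # log|gamma| computed stably; gamma is assumed positive definite.
    _, logdet = np.linalg.slogdet(gamma)
    log_density = -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)
    return float(np.exp(log_density))


def mixture_pdf(y, weights, means, covs):
    """Evaluate p(y) = sum_k w_k N(mu_k, Gamma_k)(y) for given parameters."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return float(sum(w * gaussian_pdf(y, m, c)
                     for w, m, c in zip(weights, means, covs)))
```

For high-dimensional spectra the density can underflow, so a practical implementation would likely work with log densities throughout; the direct form is kept here only to mirror the equation.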
The covariance of the observations under the target-present hypothesis is usually assumed to equal the covariance of the observations under the background-only hypothesis. Thus the test for the presence of a target against background class $k$ is often formulated as the likelihood ratio for the hypotheses:

$$H_{0,k}: y \sim N(\mu_k, \Gamma_k) \qquad H_{1,k}: y \sim N(s, \Gamma_k),$$

where $s \in \mathbb{R}^n$ is the spectrum of the target. In this case, the log of the likelihood ratio is equivalent to the spectral matched filter for a target against a background modeled by class $k$, i.e.,

$$T_{MF}(y; k) = \frac{(s - \mu_k)^T \Gamma_k^{-1} (y - \mu_k)}{\sqrt{(s - \mu_k)^T \Gamma_k^{-1} (s - \mu_k)}}. \qquad \text{[Eqn. 2]}$$

Linear and convex models have also served as the basis for formulating known target detection algorithms. In this approach the data are modeled as

$$H_0: y = W\alpha_b + \eta \qquad H_1: y = S\alpha_t + W\alpha_b + \eta, \qquad \text{[Eqns. 3]}$$

where: $W$ is an $n \times P$ matrix such that the columns of $W$ span an interference subspace of dimension $P$; $S$ is an $n \times Q$ matrix such that the columns of $S$ span a signal subspace of dimension $Q$; $\eta$ is additive noise such that $\eta \sim N(0, \sigma^2 \Gamma)$. $W$, $S$, and $\Gamma$ are assumed known, and $\alpha_t \in \mathbb{R}^Q$ and $\alpha_b \in \mathbb{R}^P$ are assumed unknown. $\sigma^2$ may be known or unknown. Additionally, constraints may be placed on the coefficient vectors $\alpha_t$ and $\alpha_b$, e.g.,

$$\sum_{i=1}^{Q} \alpha_{ti} + \sum_{i=1}^{P} \alpha_{bi} = 1, \qquad \text{(c.1)}$$

$$\alpha_{ti} \ge 0, \qquad \alpha_{bi} \ge 0. \qquad \text{(c.2)}$$
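Returning to the spectral matched filter of Eqn. 2, the following is a minimal sketch of the statistic for one background class. The function name `matched_filter` is hypothetical. The sketch assumes the class covariance is symmetric positive definite, and it normalizes by the square root of the target's squared Mahalanobis distance so that the statistic has unit variance under the background-only hypothesis; that normalization is an assumption of this sketch.

```python
import numpy as np


def matched_filter(y, s, mu_k, gamma_k):
    """Spectral matched filter of observation y for target spectrum s
    against a background class with mean mu_k and covariance gamma_k."""
    d = s - mu_k
    # w = gamma_k^{-1} (s - mu_k); solve avoids forming the explicit inverse.
    w = np.linalg.solve(gamma_k, d)
    # gamma_k is symmetric, so w @ (y - mu_k) = (s - mu_k)^T gamma_k^{-1} (y - mu_k).
    return float(w @ (y - mu_k) / np.sqrt(w @ d))
```

A bank of such filters, one per background class, can be formed by calling the function with each class's mean and covariance.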
General procedures have not been developed for simultaneously estimating $W$ and $\Gamma$. However, if either 1) $\alpha_b$ is locally constant or 2) the data may be segmented into regions such that $\alpha_b$ is essentially constant on each region, the term $W\alpha_b$ may be absorbed into the noise, which is then modeled by $\eta \sim N(\mu, \Gamma)$, where the parameters $\mu$ and $\Gamma$ are estimated locally or for each segment. With $W = 0$, $\Gamma$ may be estimated from background reference data, and if $\Gamma = I_{n \times n}$, a basis for $W$ may be estimated as the eigenvectors of a background data correlation matrix having eigenvalues greater than $\sigma^2$, a threshold determined from the eigenspectrum of the data correlation matrix. Eqns. 3 describe a convex model when the constraints (c.1) and (c.2) are imposed and a linear model when they are not.
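The eigenvector-based estimate of a basis for W described above can be sketched as follows. This is an illustration under stated assumptions: the function name `interference_basis` is hypothetical, the correlation matrix is taken to be the uncentered sample second-moment matrix of the background spectra, and the threshold is passed in by the caller because the text does not specify how it is derived from the eigenspectrum.

```python
import numpy as np


def interference_basis(background, threshold):
    """Estimate a basis for the interference subspace.

    background: array of shape (num_pixels, n), one background spectrum per row.
    threshold:  eigenvalue cutoff (hypothetical parameter chosen by the caller).
    Returns (basis, eigenvalues), with basis columns ordered by decreasing eigenvalue.
    """
    x = np.asarray(background, dtype=float)
    # Uncentered sample correlation matrix (n x n).
    corr = x.T @ x / x.shape[0]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    evals, evecs = np.linalg.eigh(corr)
    keep = evals > threshold
    # Reverse so the dominant eigenvector comes first.
    return evecs[:, keep][:, ::-1], evals[keep][::-1]
```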
The linear models have been used by several practitioners in the art to derive likelihood ratio and generalized likelihood ratio detection statistics. See, for example, Scharf et al. [L. L. Scharf and B. Friedlander, “Matched Subspace Detectors,” IEEE Transactions on Signal Processing, Vol. 42, No. 8, August 1994, pp. 2146–2157], Kraut et al. [S. Kraut, L. L. Scharf, L. T. McWhorter, “Adaptive Subspace Detectors,” IEEE Transactions on Signal Processing, Vol. 49, No. 1, January 2001, pp. 1–16], and Manolakis et al. [D. Manolakis, C. Siracusa, and G. Shaw, “Hyperspectral Subpixel Target Detection Using the Linear Mixing Model,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 39, No. 7, July 2001, pp. 1392–1409]. Likelihood ratio and generalized likelihood ratio (GLR) techniques have also been applied to the convex model. For example, Manolakis et al. showed that the GLR test when $\Gamma = I_{n \times n}$, $\sigma^2$ is unknown, and $W$ and $S$ are known is

$$T_{I,M}(y) = \left( \frac{\left\| P_{W^{\perp}}(y) \right\|}{\left\| P_{(W+S)^{\perp}}(y) \right\|} \right)^{n}, \qquad \text{[Eqn. 4]}$$

where $P_A$ is the orthogonal projection, with reference to the Euclidean inner product, onto the subspace $A$, and $A^{\perp} \subset \mathbb{R}^n$ is the subspace orthogonal to $A$.
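The statistic of Eqn. 4 is the ratio of two residual norms raised to the power n: the residual of y after projecting out the interference subspace, over the residual after projecting out the combined interference and signal subspaces. A minimal sketch follows. The function names are hypothetical, and the sketch assumes that W and S in the projections denote the subspaces spanned by the columns of the corresponding matrices, so that the residuals can be obtained from least-squares fits.

```python
import numpy as np


def residual_norm(a, y):
    """Norm of the projection of y onto the orthogonal complement of span(a)."""
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return float(np.linalg.norm(y - a @ coef))


def glr_statistic(y, w, s):
    """GLR-style statistic (||P_W_perp y|| / ||P_(W+S)_perp y||)^n.

    w: (n, P) interference basis, s: (n, Q) signal basis.
    The denominator is zero when y lies exactly in span([w, s]);
    that degenerate case is not handled in this sketch.
    """
    n = y.shape[0]
    num = residual_norm(w, y)
    den = residual_norm(np.hstack([w, s]), y)
    return (num / den) ** n
```

For spectra with hundreds of bands the n-th power can overflow, so comparing `n * (log(num) - log(den))` against a log threshold may be preferable in practice.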
Spectra from a class of material are often better modeled as random rather than as fixed vectors. This may be due to biochemical and biophysical variability of materials in a scene. For such data, neither the linear mixture model nor the normal mixture model is adequate, and better classification and detection results may accrue from using more accurate methods. Stocker et al. [A. D. Stocker and A. P. Schaum, “Application of stochastic mixing models to hyperspectral detection problems,” SPIE Proceedings 3071, Algorithms for Multispectral and Hyperspectral Imagery III, S. S. Shen and A. E. Iverson, eds., August 1997] propose a stochastic mixture model in which each fundamental class is identified with a normally distributed random variable, i.e.,

$$y_i = \sum_{k=1}^{d} a_{ik} \, \varepsilon_k \quad \text{such that} \quad \varepsilon_k \sim N(\mu_k, \Gamma_k); \qquad a_{ik} \ge 0; \qquad \sum_{k=1}^{d} a_{ik} = 1. \qquad \text{[Eqn. 5]}$$
They estimate the parameters of the model by quantizing the set of allowed abundance values and fitting a discrete normal mixture density to the data. More precisely, let $\Delta = 1/M$ denote the resolution of the quantization. Then the set of allowed coefficient sequences is

$$A = \left\{ (a_1, \ldots, a_d) \;\middle|\; \sum_{k=1}^{d} a_k = 1; \; a_k \in \{0, \Delta, \ldots, (M-1)\Delta, 1\} \right\}.$$
For each $\vec{a} = (a_1, \ldots, a_d) \in A$, define

$$\mu(\vec{a}) = \sum_{j=1}^{d} a_j \mu_j \quad \text{and} \quad \Gamma(\vec{a}) = \sum_{j=1}^{d} a_j^2 \, \Gamma_j. \qquad \text{[Eqn. 6]}$$

Then the observations are fit to the mixture model

$$p(y) = \sum_{\vec{a} \in A} \rho_{\vec{a}} \, N\big(\mu(\vec{a}), \Gamma(\vec{a})\big)(y). \qquad \text{[Eqn. 7]}$$
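The mean and covariance that Eqn. 6 assigns to one abundance vector can be computed directly. The following is a minimal sketch with a hypothetical function name; the class means and covariances are assumed to be given, and the fitting of the mixture weights is not shown.

```python
import numpy as np


def component_parameters(a, means, covs):
    """Mean and covariance of the mixture component for abundance vector a.

    a:     (d,) abundances, nonnegative and summing to 1.
    means: (d, n) class means mu_j.
    covs:  (d, n, n) class covariances Gamma_j.
    """
    a = np.asarray(a, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu = a @ means                             # sum_j a_j mu_j
    gamma = np.tensordot(a ** 2, covs, axes=1)  # sum_j a_j^2 Gamma_j
    return mu, gamma
```

The squared abundances in the covariance follow from treating the class contributions as independent random vectors scaled by the abundances.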
The fitting is accomplished using a variation of the stochastic expectation maximization algorithm such that Eqn. 6 is satisfied in a least squares sense. Stocker et al. demonstrate improved classification in comparison with clustering methods using three classes, and they demonstrate detection algorithms using this model. They note, however, that the method is impractical if the data comprise a large number of classes or if $\Delta$ is small, as the number of elements of $A$, which is given by

$$|A| = \frac{(M+1) \cdots (M+d-1)}{(d-1)!},$$

becomes very large. Furthermore, quantizing the allowed abundance values leads to modeling and estimation error.
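The growth of the quantized abundance set can be checked numerically. The count above equals the binomial coefficient C(M + d - 1, d - 1), the number of ways to write M as an ordered sum of d nonnegative integers. The sketch below, with hypothetical function names, computes the count and enumerates the set by brute force for small cases.

```python
from itertools import product
from math import comb


def count_abundance_vectors(m, d):
    """Number of abundance vectors with d entries on a grid of step 1/m."""
    return comb(m + d - 1, d - 1)


def enumerate_abundance_vectors(m, d):
    """Yield every (a_1, ..., a_d) with a_k in {0, 1/m, ..., 1} summing to 1."""
    # Work in integer multiples of 1/m so that the sum constraint is exact.
    for head in product(range(m + 1), repeat=d - 1):
        rest = m - sum(head)
        if rest >= 0:
            yield tuple(k / m for k in head + (rest,))
```

With M = 4 and d = 3 the set has 15 elements, while M = 10 and d = 10 already gives 92378 mixture components.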
Stocker et al. used this model to develop a known target detection statistic: the finite target matched filter (FTMF). Observations of the target, $t$, and background, $b$, are represented as samples from the normal random vectors $t \sim N(\mu_1, \Gamma_1)$ and $b \sim N(\mu_0, \Gamma_0)$, respectively. An observation that consists of a fraction $(1 - f)$ of background material and $f$ of target material is then modeled as

$$y \sim N\big((1 - f)\mu_0 + f\mu_1, \; (1 - f)^2 \Gamma_0 + f^2 \Gamma_1\big) = p(y \mid f).$$

Stocker et al. define the FTMF as the generalized likelihood ratio test:

$$T_{FTMF}(y) = \frac{\max_{f} \, p(y \mid f)}{p(y \mid f = 0)}; \qquad 0 \le f \le 1, \qquad \text{[Eqn. 8]}$$

and a detection algorithm is achieved by applying a threshold to the values of $T_{FTMF}$. A bank of FTMFs may be applied to Gaussian mixture data given by Eqns. 1 or 7.
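The FTMF ratio can be sketched by evaluating the log density over a grid of fill fractions f and subtracting the log density at f = 0. This is only an illustration: the grid search, the function names, and the `num_f` parameter are assumptions of this sketch, since the text does not describe how Stocker et al. carry out the maximization. The background is given mean `mu0` and covariance `gamma0`, and the target mean `mu1` and covariance `gamma1`.

```python
import numpy as np


def log_gaussian(y, mu, gamma):
    """Log of the multivariate normal density N(mu, gamma) at y."""
    n = y.shape[0]
    diff = y - mu
    _, logdet = np.linalg.slogdet(gamma)
    maha = diff @ np.linalg.solve(gamma, diff)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)


def log_ftmf(y, mu0, gamma0, mu1, gamma1, num_f=101):
    """Return (log T_FTMF, f_hat), maximizing over a uniform grid on [0, 1]."""
    fs = np.linspace(0.0, 1.0, num_f)
    logs = np.array([
        log_gaussian(y,
                     (1.0 - f) * mu0 + f * mu1,
                     (1.0 - f) ** 2 * gamma0 + f ** 2 * gamma1)
        for f in fs
    ])
    best = int(np.argmax(logs))
    # fs[0] == 0, so logs[0] is the background-only log density.
    return float(logs[best] - logs[0]), float(fs[best])
```

Because f = 0 is always on the grid, the returned log statistic is never negative; a detection is declared when it exceeds a chosen threshold.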
These unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.