1. Field of the Invention
The present invention relates generally to methods and systems of processing discrete representations of data sets. More specifically, the present invention relates to methods and systems for analyzing a data set into orthogonal components related by statistical correlations. Such methods and systems can be used in recovering data of interest from observed discrete data where the observed data may include both data of interest and other data; for example, in estimating a scalar signal from vector data.
2. Background Information
1. A Note on Computational Cost.
The computational cost of a calculation is commonly estimated by counting the number of “floating point operations,” or “flops.” This estimate of cost is called a “flop count.” For example, consider an N-by-K matrix M and a column vector v with K rows. To calculate the product Mv, the total number of scalar-scalar products is NK, and the total number of scalar-scalar sums is N(K−1). Thus, the total flop count is N(2K−1), or approximately 2NK. The flop count is only an estimate of actual computational cost because other operations are required in practice (such as moving data in memory), but the flop count is a commonly accepted indicator of cost. One reference defining flop counts is G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins University Press, 1996 (esp. pp. 18–19), with examples tabulated on pp. 254, 263, and 270.
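As a concrete check of this arithmetic, the tally can be written out directly; the function below is a purely illustrative helper, not part of the disclosed invention:

```python
def matvec_flops(N, K):
    """Flop count for the product Mv, where M is N-by-K and v has K rows.

    Each of the N output entries requires K scalar-scalar products and
    K - 1 scalar-scalar sums, giving N(2K - 1), approximately 2NK.
    """
    products = N * K
    sums = N * (K - 1)
    return products + sums

# For N = 1000 and K = 100, the count is 199,000, close to 2NK = 200,000.
```

Dropping the “−1” yields the 2NK approximation quoted above.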
A flop count is commonly estimated in turn by a simple formula for the “order of growth” that estimates how the count grows with the amount of data to be processed. In the example above, the flop count grows with order O(NK), where the notation “O( )” stands for the “order.” The expression O(NK) omits the factor of two because “O( )” describes a rule for growth or scaling with increasing amounts of data input. In this context, the factor of two is a negligible detail compared to the linear growth with N and K. One reference defining “order of growth” is Harold Abelson and Gerald Sussman, Structure and Interpretation of Computer Programs, MIT Press and McGraw Hill, pp. 39–40.
In this context, we hope to avoid computational methods that grow with N as O(N³) in favor of discovering methods that might grow as O(N²) or O(N).
2. An Exemplary Problem Domain.
The exemplary problem domain presented in this application is adaptive signal processing for a set of discrete data samples. This application discloses preferred embodiments of the present invention in the context of that domain. A general block diagram of interest is shown in FIG. 1. In that figure, a filter operates on a set of scalar inputs d0(k) and a set of vector inputs x0(k) to give a set of scalar outputs ε0(k). Together, the scalar inputs and vector inputs characterize the input data. The integer index k indicates that the data consists of discrete samples. The filter weights w0 are chosen to optimize some performance measure; for example, minimizing a cost function, or maximizing some other measure, like a signal-to-noise ratio. As one example, when the filter weights are chosen to minimize a quadratic cost function, the filter is known as a “Wiener” or “least squares” filter [See Simon Haykin, Adaptive Filter Theory, 3rd edition, esp. pp. 194, 483, Prentice-Hall]. Here we use the term Wiener filter.
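For the quadratic-cost case mentioned above, a bare least-squares sketch (with made-up dimensions and random data, purely to illustrate the Wiener/least-squares idea rather than FIG. 1 itself) might look like:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 4, 200  # hypothetical sizes for illustration

# Vector inputs x0(k) stacked as columns of X0, and scalar inputs d0(k).
X0 = rng.standard_normal((N, K))
d0 = rng.standard_normal(K)

# Least-squares (Wiener) weights minimize the block-averaged squared
# error <|d0(k) - w^H x0(k)|^2> over w.
w0, *_ = np.linalg.lstsq(X0.T, d0, rcond=None)

# Scalar outputs eps0(k) = d0(k) - w0^H x0(k).
eps0 = d0 - w0 @ X0
```

By the normal equations, the error sequence eps0 is orthogonal to every row of X0, which is the defining property of the least-squares solution.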
As a more specific example, FIG. 2 shows a Wiener filter processing a set of discrete data input samples x(k) to find the part of the data that best matches a steering vector s (also called a replica vector or a focusing vector). For example, in single-frequency adaptive beamforming, s can be a model of the spatial pattern a signal can trace over the antenna array (s is typically complex). For wireless communications, s can be the spreading code of an individual user in a Code Division Multiple Access (CDMA) system. For space-time adaptive processing (STAP) applications, s can be a pattern in time and space (being transformed to frequency and angle). To describe all of these applications with a common notation, we choose to normalize the steering vector to be dimensionless with unit norm:

∥s∥ = (s^H s)^(1/2) = 1.  (1)
In any of these applications, the blocking matrix B finds the part of x(k) orthogonal to s. Thus, d0(k) = s^H x(k) identifies the part of the data in the direction of s, and x0(k) = B x(k) is the rest of the data, orthogonal to s.
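One common choice of blocking matrix is the orthogonal projection B = I − s s^H; the MWF literature also uses rectangular blocking matrices, so this square form is offered only as a sketch of the decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # hypothetical size

# Unit-norm steering vector s, per Equation (1).
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
s /= np.linalg.norm(s)

# Blocking matrix as the projection that removes the component along s.
B = np.eye(N) - np.outer(s, s.conj())

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
d0 = s.conj() @ x  # part of the data in the direction of s
x0 = B @ x         # rest of the data, orthogonal to s

# x0 is orthogonal to s, and the two parts reassemble x: d0*s + x0 == x.
```

The reassembly property d0·s + x0 = x holds because ‖s‖ = 1, so s(s^H x) + (I − s s^H)x = x.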
Here we show how the filter in FIG. 2 applies to a block of data vectors shown in FIG. 3. Let x(k) be a column vector of observed data with N entries. The observed data may be complex. The index k means that x(k) is one data vector in a block of data vectors, where 1 ≤ k ≤ K. Increasing k typically corresponds to increasing time. Let the block of data vectors be the N-by-K matrix

X ≡ [x(1), x(2), . . . , x(k), . . . , x(K)].  (2)
In this example, the objective is to filter X to extract the part of the observed data that best matches a steering vector or replica vector s.
3. Approaches to Retrieving Data of Interest from Observed Data.
Non-adaptive Approaches
A simple approach to retrieving the data of interest is to multiply the input data vectors onto the steering vector s, so the result of the filtering would be the product s^H X. A generalization of this approach is to define a weight vector w that is some function of the steering vector s, and the result of the filtering would be the product w^H X. For example, w might be a copy of s with “tapering” applied to reduce sidelobes. This type of filtering is called “non-adaptive” because the weight vector w does not depend on the data X. When used in beamforming, this approach is called conventional beamforming. When used in some other applications (like CDMA), it is called matched filtering.
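A minimal numerical sketch of this non-adaptive filtering (random data, hypothetical sizes) is a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 32  # hypothetical sizes

# Unit-norm steering vector and a block of K data snapshots.
s = rng.standard_normal(N)
s /= np.linalg.norm(s)
X = rng.standard_normal((N, K))

# Non-adaptive (matched) filter: the weights are just s itself, so the
# filter does not depend on the statistics of X.
w = s
y = w.conj().T @ X  # one scalar output per snapshot, shape (K,)
```

A tapered variant would replace w = s with an elementwise product of s and a window function; the weights would still be independent of X, which is the defining feature of a non-adaptive filter.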
Data-Adaptive Approaches Generally
A more sophisticated approach to retrieving the data of interest is to calculate a weight vector w that depends on the steering vector s and on the data being processed, X. The result of the filtering would be the product w^H X, and this type of filtering is called “data-adaptive” or “adaptive” because the weight vector w depends on the data X. Many adaptive methods exist. One survey book is Simon Haykin, Adaptive Filter Theory, 3rd edition, Prentice-Hall, 1996; another is Bernard Widrow and Samuel Stearns, Adaptive Signal Processing, Prentice-Hall, 1985. This approach customizes the filtering for that set of data vectors.
To define an “optimal” weight vector w, an adaptive approach defines a “performance function” and then finds the weight vector that optimizes the performance function. Perhaps the most common performance functions are quadratic or energy-like cost functions to be minimized, although other performance functions are possible, and the invention described here is not restricted to any particular choice of performance function.
The Minimum Variance Distortionless Response (MVDR).
One example of a simple and popular adaptive approach to retrieving data of interest from a discrete observed data set X is to solve for the Minimum Variance Distortionless Response (MVDR). In this approach the data vectors x(k) are multiplied onto a weight vector w that minimizes a quadratic cost function, e.g., an energy-like cost function, subject to the constraint that the weight vector w should pass the data that matches a replica vector s. As used herein, preferred embodiments of the present invention use averages that are “block-averages” over finite sets of K samples (not ensemble averages), i.e.,
⟨ · ⟩_K ≡ (1/K) Σ_{k=1}^{K} ( · ).  (3)
The minimization is written with the constraint wHs=1 as
min_{w^H s = 1} ⟨ | w^H x(k) |² ⟩_K.  (4)

4. A Traditional Adaptive Filter Approach to Solving for the MVDR
A traditional approach to solving for the MVDR formulates the solution in terms of a data correlation or covariance matrix

R = ⟨ x(k) x^H(k) ⟩_K = X X^H / K.  (5)
Using R, the minimization (4) can be rewritten
min_{w^H s = 1} (w^H R w).  (6)
The traditional approach assumes that R has full rank (so R has an inverse, i.e., R⁻¹). The traditional solution of (4) or (6) is

w_Traditional = R⁻¹ s / (s^H R⁻¹ s).  (7)
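Assuming R is invertible, Equation (7) can be evaluated directly; the sketch below (random data, hypothetical sizes) also checks the distortionless constraint w^H s = 1:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 6, 100  # hypothetical sizes; K > N so R has full rank here

s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
s /= np.linalg.norm(s)
X = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

# Sample covariance, Equation (5): R = X X^H / K.
R = X @ X.conj().T / K

# Traditional MVDR weights, Equation (7): w = R^{-1} s / (s^H R^{-1} s).
# Solving the linear system avoids forming R^{-1} explicitly.
Rinv_s = np.linalg.solve(R, s)
w = Rinv_s / (s.conj() @ Rinv_s)
```

The constraint w^H s = 1 holds by construction, since the numerator and denominator of Equation (7) cancel when w is projected onto s.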
To calculate R, evaluating (5) for one block would require on the order of N² times K multiplication operations, or in shorthand, O(N²K). Calculating the matrices for the total number of data vectors T would cost O(N²T) operations. To calculate R⁻¹, the matrix inversion for one block would cost O(N³) operations, and for the total number of data vectors T, the inversions would cost O(N³T/K) operations. To multiply s onto R⁻¹ for each block would cost O(N²) operations, and for the total number of data vectors T, it would cost O(N²T/K) operations. Thus, the traditional approach involves costs on the order of O(N²) and O(N³), which in many practical applications is an onerous processing load for systems where N is large.
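These growth rates can be tallied in a rough cost model (orders of growth only; constant factors are dropped, and the function below is purely illustrative):

```python
def traditional_mvdr_order(N, K, T):
    """Order-of-growth operation count for the traditional MVDR solution
    applied to T data vectors processed in blocks of K.
    """
    blocks = T // K
    form_R = N * N * K * blocks  # Equation (5): O(N^2 K) per block
    invert_R = N ** 3 * blocks   # inverting R: O(N^3) per block
    apply_s = N * N * blocks     # multiplying s onto R^{-1}: O(N^2) per block
    return form_R + invert_R + apply_s

# Doubling N roughly quadruples the covariance cost and increases the
# inversion cost eightfold, so the O(N^3) term dominates for large N.
```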
5. The Multistage Wiener Filter (MWF) Solution for the MVDR
The Multistage Wiener Filter is a data-adaptive filter that can be used for signal detection, estimation, and classification. For example, when used with a steering vector s, an MWF can process a stream of one or more blocks of data vectors x(k) to estimate how much the data resembles a replica vector s. In more sophisticated applications, an MWF can satisfy multiple constraints (where a single steering vector s may be replaced by a matrix of steering vectors).
J. S. Goldstein, I. S. Reed, and L. L. Scharf present a Wiener filter structure using multistage decomposition in “A multistage representation of the Wiener filter based on orthogonal projections,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2943–2959, November 1998 [GOLDSTEIN], incorporated herein by reference in its entirety. [GOLDSTEIN] presents a generalized sidelobe canceller (GSC) MVDR-constrained Wiener filter and a method for implementing such a filter as an MWF. An example of such a filter with two stages is shown in FIG. 2.
The MWF described in [GOLDSTEIN] provides an MVDR solution with advantages over the traditional MVDR solution shown here in Equation 7, e.g., the MWF works in situations where R does not have an inverse. For the special case where the data are stationary and the MWF is calculated to full rank, it gives the same performance as traditional solutions for the MVDR based on a data covariance matrix.
More typically, real-world applications present non-stationary data; for example: wireless communications, where users are turning transceivers on and off; passive beamforming, when wave sources are moving with respect to the receiver; and active radar and sonar, when clutter statistics vary with range along the ground or ocean floor. In the face of such non-stationary data, the MWF may perform better than traditional solutions that are based on a data covariance matrix inversion. When the MWF is calculated with reduced rank, it may outperform the traditional MVDR solution when the sample support is low. The MWF is based on statistics that represent interference more directly and efficiently than a covariance matrix. MWF filters have been investigated in simulations for wireless communications and have been tested with real radar data. In each case, technical performance can equal or exceed that of previous methods.
However, as indicated in FIG. 2, the [GOLDSTEIN] MWF calls for calculation of the matrix product between a blocking matrix B and the data x(k) to find the data orthogonal to s. When calculated explicitly, the matrix multiplication B x(k) incurs a cost of O(N²K) for a block 1 ≤ k ≤ K. The subsequent adaptive stages involve more blocking matrices B_i to find the data orthogonal to the correlation direction vectors h_i. Each stage incurs a cost of O(N²K).
It would be desirable to implement orthogonal decomposition without the explicit use of blocking matrices.
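For the projection-type blocking matrix B = I − s s^H (used here only as an illustration), one way such an implicit computation might look is to subtract the projection onto s directly, computing X − s(s^H X) at O(NK) cost per block instead of O(N²K); the two computations give identical results. Whether this particular rearrangement corresponds to the method disclosed in this application is not asserted here; it only illustrates that the explicit product B x(k) is avoidable.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 64, 16  # hypothetical sizes

s = rng.standard_normal(N)
s /= np.linalg.norm(s)
X = rng.standard_normal((N, K))

# Explicit blocking matrix: forming and applying B = I - s s^H costs
# O(N^2 K) multiplications for the block.
B = np.eye(N) - np.outer(s, s)
X0_explicit = B @ X

# Implicit alternative: subtract the projection onto s column by column,
# X - s (s^H X), which costs only O(N K) for the block.
X0_implicit = X - np.outer(s, s @ X)
```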