Highly dimensional adaptive processors (devices employing adaptive processors with large numbers of adaptation weights or parameters) are of interest for a wide variety of applications. These applications include:
- Acoustic echo cancellers, where adaptive noise cancellers employing finite impulse response (FIR) filters with as many as 2,000 adaptively adjusted filter taps are used to remove echoes induced in long-haul telephony networks.
- Phased array and MIMO radar systems, where large arrays of antennas (10-1,000 elements per array) are used to electronically steer beams toward detected targets and nulls toward jammers and clutter sources, by combining signals received by the array and distributing signals transmitted from the array using large linear matrix operations.
- Digital predistortion (DPD) processors, where nonlinear adaptive processors with large numbers of parameters (e.g., Volterra-series approximations of nonlinear processes) are used to adaptively learn, and digitally invert, nonlinear effects added by high-power amplifiers.
- Smart Grid networks employing spread spectrum modulation formats with large spreading factors and adaptive despreading methods to separate large numbers of co-channel signals, and to detect and remove spoofers from the networks.
- Massive MIMO cellular networks employing base stations with very large antenna arrays.
To effect adaptive signal processing in these applications, practical means for adjusting large numbers of weights must be developed and implemented. Techniques developed in the past to accomplish this include nonblind techniques that exploit a known reference signal (e.g., a training or pilot signal inserted into a signal transmitted to the adaptive processor); “partially blind” techniques that exploit a known reference signal subject to unknown effects added by the communication channel, e.g., delay caused by clock timing offset and physical distance between the transmitter and receiver, and carrier offset caused by local oscillator (LO) offset and Doppler shift between the transmitter and receiver; and fully blind methods that exploit only the general structure of the transmitted signal. In many systems, a reference signal can only be made available on a sparse basis, e.g., at the beginning of signal reception, after which the processor must operate using fixed weights, without additional training, until the next reference signal reception interval.
These techniques can also be subdivided into methods with “order-M” (O(M)), or linear, complexity, where the number of real multiply-and-accumulate (RMAC) operations per input data sample needed to adapt the processor is on the order of the number of weights M being adjusted by the processor, and methods with higher-order (e.g., O(M^ν), where ν > 1) complexity, where the number of RMACs per data sample needed to adapt the processor rises much faster than the number of weights being adjusted. Typically, the most powerful and effective adaptive processing methods have high-order complexity. This presents significant challenges in applications where the number of adaptation weights M is very large.
Lastly, these techniques can be subdivided into sample-processing methods, where the processor weights are adapted every time a new input data sample is provided to the processor, and block-processing methods, where a block of input data is received and then used to adapt the processor. In some cases, the algorithm may circulate through the data block multiple times before moving on to the next processing block. Again, the more powerful and effective adaptive processing methods employ block processing, typically with a block size N that is (and in many cases must be) a large multiple of M. However, the cost of this processing is a reduced update rate; a slower response to changes in the channel effects seen by the adaptive processor; and (e.g., for multiple passes through the data block) a further increase in complexity.
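The difference between sample processing and block processing can be illustrated with the least-mean-squares (LMS) algorithm. The following is a minimal sketch, assuming real-valued data, a tapped-delay-line (FIR) processor, and a noise-free hypothetical system to identify. The block variant shown is block LMS, which averages the LMS gradient over N samples and adapts once per block; it is chosen only because it is simple, and it does not represent the more powerful block methods (e.g., least-squares) referred to above. The function names `lms_sample` and `lms_block` are hypothetical.

```python
import random


def lms_sample(xs, ds, num_weights, mu):
    """Sample processing: adapt the weights every time a new sample arrives (LMS)."""
    w = [0.0] * num_weights
    for n in range(num_weights - 1, len(xs)):
        # Tapped-delay-line regressor x[n], x[n-1], ..., x[n-M+1]
        x = [xs[n - k] for k in range(num_weights)]
        e = ds[n] - sum(wk * xk for wk, xk in zip(w, x))
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
    return w


def lms_block(xs, ds, num_weights, mu, block_size):
    """Block processing: average the LMS gradient over a block, then adapt once."""
    w = [0.0] * num_weights
    for start in range(num_weights - 1, len(xs) - block_size + 1, block_size):
        grad = [0.0] * num_weights
        for n in range(start, start + block_size):
            x = [xs[n - k] for k in range(num_weights)]
            e = ds[n] - sum(wk * xk for wk, xk in zip(w, x))
            grad = [g + e * xk for g, xk in zip(grad, x)]
        # One weight update per block of block_size samples
        w = [wk + mu * g / block_size for wk, g in zip(w, grad)]
    return w


random.seed(0)
w_true = [0.5, -0.3, 0.2]  # hypothetical unknown FIR system
xs = [random.gauss(0.0, 1.0) for _ in range(4000)]
ds = [sum(w_true[k] * xs[n - k] for k in range(3)) if n >= 2 else 0.0
      for n in range(len(xs))]
w_s = lms_sample(xs, ds, 3, 0.05)
w_b = lms_block(xs, ds, 3, 0.05, 8)
print(w_s, w_b)
```

Both variants converge to `w_true` here; the block variant makes only one update per 8 samples, which is the reduced update rate noted above.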
It should also be noted that the operations referred to above are the “adapt-path” operations used to train the adaptive processor, not the “data-path” operations used to implement the adaptive processor during and after training. Adapt-path operations tune the adaptive processor, while data-path operations apply that processor to the set of signals being processed, both during and after tuning. For most of the applications described above (the DPD application being a notable exception), the data-path operations have O(M) complexity, regardless of the complexity of the adapt path.
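To make the adapt-path/data-path split concrete, the sketch below instruments one step of a real-valued LMS filter, used here only as a simple O(M) example. The counting convention (each multiply, with or without an accompanying add, counts as one RMAC; the error subtraction is not counted) is an assumption for illustration, and `lms_step_with_counts` is a hypothetical name.

```python
def lms_step_with_counts(w, x, d, mu):
    """One real-valued LMS step that tallies data-path and adapt-path RMACs.

    Returns (new_w, y, data_path_rmacs, adapt_path_rmacs).
    """
    # Data path: apply the current weights to the input (M RMACs)
    y = 0.0
    data_rmacs = 0
    for wk, xk in zip(w, x):
        y += wk * xk
        data_rmacs += 1
    # Adapt path: scale the error once, then update every weight (M + 1 RMACs)
    e = d - y
    step = mu * e
    adapt_rmacs = 1
    new_w = []
    for wk, xk in zip(w, x):
        new_w.append(wk + step * xk)
        adapt_rmacs += 1
    return new_w, y, data_rmacs, adapt_rmacs


new_w, y, data_rmacs, adapt_rmacs = lms_step_with_counts(
    [0.0] * 4, [1.0, 2.0, 3.0, 4.0], d=1.0, mu=0.1)
print(data_rmacs, adapt_rmacs)  # prints: 4 5
```

For this method both paths scale linearly with M; for higher-order adapt-path methods only the adapt-path tally would grow faster, while the data-path tally stays at M.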
To address the adapt-path complexity issue in particular, the concept of a partial update (PU) method (PUM; in the plural, PUMs) that updates only a subset of M_1 weights during each adaptation block or sample (a sample is treated hereafter as a block of size N = 1) has been proposed for a number of applications. All PUMs developed to date can be interpreted as linearly constrained optimization techniques, in which the original method is modified by applying a hard linear constraint that forces M_0 = M − M_1 weights to remain at the same value between adaptation blocks. The subset of weights actually adapted during each data block, or during each of several passes through a data block, is changed at each adaptation event, so that every weight is updated over the course of multiple adaptation events.
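One well-known PUM of this type is sequential (round-robin) partial-update LMS. The following is a minimal sketch, assuming real-valued data, N = 1 (sample processing), and a noise-free hypothetical FIR system; `sequential_pu_lms` is a hypothetical name. Only M_1 weights are adapted per sample, the other M_0 = M − M_1 are held fixed (the hard linear constraint), and the adapted subset rotates so that every weight is updated once every ceil(M / M_1) samples. In this idealized noise-free setting the filter still converges; the misadjustment issues discussed below are not reproduced by this example.

```python
import random


def sequential_pu_lms(xs, ds, num_weights, num_updated, mu):
    """Sequential partial-update LMS: adapt num_updated (M_1) of num_weights (M) per sample."""
    w = [0.0] * num_weights
    num_subsets = -(-num_weights // num_updated)  # ceil(M / M_1)
    subset = 0
    for n in range(num_weights - 1, len(xs)):
        # Tapped-delay-line regressor x[n], x[n-1], ..., x[n-M+1]
        x = [xs[n - k] for k in range(num_weights)]
        # Data path: all M weights are still applied to form the error
        e = ds[n] - sum(wk * xk for wk, xk in zip(w, x))
        # Adapt path: update only the current subset; the rest stay fixed
        first = subset * num_updated
        for k in range(first, min(first + num_updated, num_weights)):
            w[k] += mu * e * x[k]
        subset = (subset + 1) % num_subsets
    return w


random.seed(1)
w_true = [0.4, -0.2, 0.1, 0.3, -0.1, 0.05]  # hypothetical unknown system
M = len(w_true)
xs = [random.gauss(0.0, 1.0) for _ in range(9000)]
ds = [sum(w_true[k] * xs[n - k] for k in range(M)) if n >= M - 1 else 0.0
      for n in range(len(xs))]
w_pu = sequential_pu_lms(xs, ds, M, 2, 0.05)
print([round(v, 4) for v in w_pu])
```

With M = 6 and M_1 = 2, the adapt path touches one third of the weights per sample, at the cost of each weight being updated only every third sample.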
This approach has substantive limitations in practice. First, the linear constraint, by its nature, can induce severe misadjustment from the optimal solution sought by the processor. This can manifest as a convergent or steady-state bias from the optimal solution, as a “jitter” or fluctuation about that steady-state solution, or as both. In some applications, e.g., phased array radar applications where the received radar waveform must be extracted from strong clutter and jamming, this can cause the system to fail entirely (studies of PUMs showing “convergence-in-mean” to optimal solutions are almost always conducted under assumptions of little or no noise and removable multipath distortion). Even if the signal of interest is received at a high signal-to-interference-and-noise ratio (SINR), this misadjustment can lead to well-known “hypersensitivity” issues that degrade system performance relative to the optimal solution.
Second, the linear optimization constraint can only be easily added to a small subset of optimization functions, e.g., O(M^2) “least-squares (LS)” or LS-like methods that can be formulated as a quadratic optimization problem, or O(M) “least-mean-squares (LMS)” or LMS-like methods that are either intended to approximate LS optimization algorithms (e.g., by replacing gradients with “stochastic gradient” approximations) or can themselves be formulated as linearly constrained quadratic optimization problems (e.g., the “normalized LMS (NLMS)” and “affine projection” algorithms). In many cases, adherence to the constraint significantly increases the complexity of the original method, and approximations, e.g., using Lagrange multipliers in which the multiplier itself is added to the algorithm, only increase the misadjustment of the algorithm.
In summary, the PUMs developed to date can only be used with a small number of methods having at most O(M^2) complexity, and cannot be used with any O(M^ν) methods where ν > 2. This is particularly unfortunate, because PUMs should have their strongest utility with these classes of methods. This is especially evident when the complexity of the data-path processing, which as noted above is typically O(M), is added to that of the adapt-path processing: for O(M) adapt-path methods, a PUM can at best reduce overall complexity by 50%. This is the background in which the present invention takes form.
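The 50% bound can be made explicit with a simple per-sample cost model. As an illustrative assumption (not a statement about any particular method), let the data path cost c_d M RMACs, the full O(M) adapt path cost c_a M RMACs, and a PUM that adapts M_1 of the M weights cost c_a M_1 RMACs in the adapt path:

```latex
C_{\mathrm{full}} = c_d M + c_a M, \qquad
C_{\mathrm{PU}} = c_d M + c_a M_1, \qquad 0 \le M_1 \le M,

\frac{C_{\mathrm{PU}}}{C_{\mathrm{full}}}
  = \frac{c_d M + c_a M_1}{(c_d + c_a)\, M}
  \;\ge\; \frac{c_d}{c_d + c_a}
  = \frac{1}{2} \quad \text{when } c_d = c_a .
```

For an LMS-type processor with comparable data-path and adapt-path costs, even the extreme case M_1 → 0 therefore leaves at least half of the original per-sample cost.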