Digital Signal Processing (DSP) is one of the most widely used digital technologies today. It is at the heart of audio and image compression innovations that have changed, and are rapidly changing, the world we live in. While the basic approach has been very successful, the inventors have found several problems that significantly limit the use of its advantages.
FIG. 1A illustrates a simplified block diagram of a DSP system applicable to cellular base station, medical imaging and instrumentation system applications as found in the prior art.
In FIG. 1A, sensors 1 and 2 provide samples 10 and 12 to the DSP Processor, which, after performing one or more DSP tasks, generates DSP results. The samples and results are usually in the form of words composed of bits, and are often treated as numbers by DSP processors. The invention is focused on numerical processing, and from here on, the discussion will assume that the samples and results are to be treated as numbers. The DSP Processor today is typically controlled by an internal processor clock. The sensors typically sample at a regular rate, which will be referred to herein as the sampling rate.
There is a large disparity today between processor clock rates and sensor sampling rates. Often sensors only generate between 20 million and 64 million samples per second, while the clock frequencies of processors are often between 300 and 1000 MHz. While DSP processors can run this fast, there are serious questions as to how to feed enough data into these engines to justify these clock speeds.
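By way of illustration only, the disparity can be expressed as the number of processor clock cycles available per incoming sample. The following is a minimal sketch that assumes a single sample stream feeding a single processor; the function name cycles_per_sample is hypothetical and is not taken from the prior art described above.

```python
def cycles_per_sample(clock_hz, samples_per_second):
    """Processor clock cycles that elapse between consecutive samples.

    Assumption for this sketch: one sensor stream feeds one processor.
    """
    return clock_hz / samples_per_second


# The ranges quoted in the text above.
narrowest_gap = cycles_per_sample(300e6, 64e6)   # slowest clock, fastest sensor
widest_gap = cycles_per_sample(1000e6, 20e6)     # fastest clock, slowest sensor
print(narrowest_gap, widest_gap)
```

Under these assumptions, the processor has between roughly 5 and 50 clock cycles for every sample a sensor delivers, which is the gap the data feed has to fill.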
DSP Processors can typically perform one or more numeric operations such as adds/subtracts, multiplies and shifts per instruction cycle. A shift of a word of bits moves the bits up or down, effecting multiplication or division, respectively, by powers of two.
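The relationship between shifts and powers of two may be sketched as follows. This sketch assumes integer words and uses Python integers, which have unlimited width; a fixed-width DSP word would additionally overflow or saturate, which is not modeled here. The function names are hypothetical.

```python
def shift_up(word, bits):
    """Shift an integer word up, multiplying it by 2**bits."""
    return word << bits


def shift_down(word, bits):
    """Shift an integer word down, dividing it by 2**bits.

    The bits shifted out are discarded, so the result is rounded
    toward negative infinity.
    """
    return word >> bits


print(shift_up(5, 3))     # 5 * 8
print(shift_down(40, 3))  # 40 / 8
print(shift_down(41, 3))  # 41 / 8 with the remainder discarded
```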
The time from the receipt of the last sample of a signal or message until the start (sometimes the end) of receiving the result is referred to as latency. Continuous processing means that samples enter the DSP processor continuously. In a clocked system, this means at least one sample enters the system during every clock cycle.
In many cellular base station, medical imaging and instrumentation system applications, there are excellent reasons to treat samples 10 and 12 as a single complex number. A complex number is composed of two numeric components, one called the real numeric component and the other the imaginary numeric component. The mathematical extension of numeric multiplication is called complex multiplication.
A complex number A1 will include a real component A1R 10 and an imaginary component A1I 12 and be denoted by A1=A1R+j A1I, where j refers to the square root of −1. A second complex number A2 is likewise denoted by A2=A2R+j A2I. Complex multiplication of A1 by A2 gives a complex number with a real component of A1R*A2R−A1I*A2I and an imaginary component of A1R*A2I+A1I*A2R.
FIG. 1B illustrates a prior art complex multiplier multiplying A1 by A2 including four multipliers Mult RR, Mult RI, Mult IR and Mult II and two adders.
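The four-multiplier, two-adder arrangement can be sketched in software as follows. This is an illustration of the arithmetic only; the assignment of each product to a particular multiplier name (Mult RR, Mult RI, Mult IR, Mult II) is an assumption made for this sketch, as is the function name complex_multiply.

```python
def complex_multiply(a1r, a1i, a2r, a2i):
    """Multiply A1 = a1r + j*a1i by A2 = a2r + j*a2i.

    Uses four real multiplications and two real add/subtract
    operations, returning (real, imaginary).
    """
    rr = a1r * a2r  # assumed to correspond to Mult RR
    ri = a1r * a2i  # assumed to correspond to Mult RI
    ir = a1i * a2r  # assumed to correspond to Mult IR
    ii = a1i * a2i  # assumed to correspond to Mult II
    real = rr - ii  # first adder (used as a subtractor)
    imag = ri + ir  # second adder
    return real, imag


print(complex_multiply(1, 2, 3, 4))  # (1 + 2j) * (3 + 4j)
```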
There are a number of common DSP tasks, which will be referred to throughout this patent application and to which the invention offers advantages. Many of these tasks are best seen as linear transformations from an input sample vector to a result vector. A vector is an ordered sequence of numbers, which may also be complex numbers.
A linear transformation acts upon an input sample vector by performing adds/subtracts, and multiplications on the numbers in the sample vector to generate the result vector. Examples of linear transformations include Fast Fourier Transforms (FFTs), Discrete Cosine Transforms (DCTs), Discrete Wavelet Transforms (DWT), Finite Impulse Response (FIR) filters and Infinite Impulse Response (IIR) filters.
Each of these linear transformations can be defined in terms of a matrix operating upon the sample vector to generate the result vector. FFTs and DCTs tend to be used on sample vectors containing a finite length sequence of samples. DWTs, FIRs, and IIRs operate on sample vectors of unlimited length. However, the matrices that define these linear transformations are finite in size, possessing a finite number of rows and a finite number of columns, with a numeric entry at each row and column. The numeric entries are the coefficients, the A2R's by which the samples A1R are multiplied, with the products then summed to form the result vector components.
FFTs are extremely important. Typically, Fast Fourier Transform implementations focus on complex sample vectors whose sequence length is a power of two, such as 16 to 4,096, generating result vectors of the same sequence length as the sample vector. Without some of the amazing properties of the FFT matrix, computing an FFT of 64 complex samples, also known as points, would require up to 64 complex multiplications, and then summing those complex products, to generate each of the 64 complex vector results, for a total of up to 4,096 complex multiplications.
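The cost of applying the transform matrix directly, without factoring, can be sketched as follows. The sketch assumes the common convention in which the matrix entry at a given row and column is exp(−2πj·row·column/N); the text above does not state a sign convention. The function names dft_matrix and apply_matrix are hypothetical.

```python
import cmath


def dft_matrix(n):
    """Build the n-by-n transform matrix (assumed sign convention)."""
    return [
        [cmath.exp(-2j * cmath.pi * row * col / n) for col in range(n)]
        for row in range(n)
    ]


def apply_matrix(matrix, samples):
    """Multiply a matrix by a sample vector, counting complex multiplies."""
    results = []
    multiplies = 0
    for row in matrix:
        total = 0j
        for coefficient, sample in zip(row, samples):
            total += coefficient * sample  # one complex multiply per entry
            multiplies += 1
        results.append(total)
    return results, multiplies


samples = [1 + 0j] * 64
results, multiplies = apply_matrix(dft_matrix(64), samples)
print(multiplies)  # 64 multiplies for each of 64 results
```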
Matrix arithmetic, like ordinary arithmetic, supports multiplicative inverses and, in the case of the FFT matrices, factoring. For a given 2^N point complex FFT matrix, the inventors are aware of four distinct factorizations whose products equal the FFT matrix. One of these is known as the Cooley-Tukey Factorization in honor of the two individuals credited with its discovery. While the other three methods of factoring the FFT are valid and important, the discussion from here on will focus exclusively on the Cooley-Tukey Factorization. This decision is not meant to imply any limitation to the scope of the claims, but is done only for the sake of keeping the discussion as simple as possible.
The Cooley-Tukey Factorization for an FFT matrix is a collection of Radix 2 matrices, often called steps, which are performed in a specific sequence, the first acting upon the sample vector, generating a first result vector. The second Radix 2 matrix acts upon the first result vector to generate a second result vector, and so on until the last factor's result vector is essentially the same as the result vector of the FFT matrix acting upon the sample vector. These Radix 2 steps involve no more than two complex multiplications of an input to calculate the effect of that complex input on the complex components of the result vector. As used herein, a Radix operation will refer to the actions necessary to modify the current complex values of a result vector for a given complex input, which for the sake of consistency will be called the complex input A1.
Two adjacent Radix 2 steps in that sequence can be merged to form a Radix 4 step. Three adjacent Radix 2 steps in that sequence can be merged to form a Radix 8 step, and so on. The radix operation of a Radix 4 step will modify four complex components of the result vector for each complex input A1 of that step's input vector. The radix operation of a Radix 8 step will modify eight complex components of the result vector for each complex input A1 of that step's input vector.
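A sequence of Radix 2 steps can be sketched as follows. This is one common textbook arrangement (an in-place, decimation-in-time form with an initial bit-reversal reordering and an assumed exp(−2πj/N) sign convention), offered as an illustration rather than as the specific factorization contemplated above. The function names are hypothetical, and merged Radix 4 or Radix 8 steps are not shown.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def fft_radix2(samples):
    """Apply log2(N) Radix 2 steps; returns (result vector, step count)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(samples)
    bits = n.bit_length() - 1
    # Reorder the inputs so each step can work on adjacent groups.
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:
            data[i], data[j] = data[j], data[i]
    steps = 0
    span = 2
    while span <= n:
        half = span // 2
        twiddle_step = cmath.exp(-2j * cmath.pi / span)
        for start in range(0, n, span):
            twiddle = 1 + 0j
            for k in range(half):
                upper = data[start + k]
                lower = data[start + k + half] * twiddle  # one complex multiply
                data[start + k] = upper + lower
                data[start + k + half] = upper - lower
                twiddle *= twiddle_step
        span *= 2
        steps += 1  # one Radix 2 step completed
    return data, steps


spectrum, step_count = fft_radix2([0, 1, 0, 0])
print(step_count, spectrum)
```

In this arrangement each Radix 2 step modifies two components of the result vector per pair of inputs, and a 2^N point transform takes N such steps.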
The last several hundred years have seen the emergence of the modern physical sciences and engineering as we know them today. That emergence has been fundamentally aided by the use and availability of a collection of non-linear functions and operations. The most common of these arithmetic tools of technology include division, square root, logarithms, exponentiation, sine and cosine.
These operations became the standard functions of the early scientific calculators, known as slide rules. Slide rules were in widespread use for several hundred years until the production of portable digital calculators, which replaced them as the tool of choice among scientists and engineers. These digital calculators also incorporated at least this basic list of functions.
The following disclosure will make use of some basic facts regarding logarithms and exponentiation and their application to simplify the calculation of division and square roots in particular. Denote the logarithm of A1R by Log A1R, and the logarithm of A2R by Log A2R. The logarithm of the product of A1R and A2R, Log A1RA2R, is the sum of Log A1R and Log A2R. Exponentiation of Log X results in X.
FIG. 1C illustrates multiplier Mult RR of FIG. 1B, containing two log calculators receiving A1R and A2R, generating Log A1R and Log A2R, which Add R receives to generate Log A1RA2R, which Exp Calc receives to generate A1RA2R, as found in the prior art.
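The log, add and exponentiate data path of FIG. 1C, together with the corresponding identities for division and square root, can be sketched as follows. The sketch assumes base-2 logarithms and strictly positive operands, neither of which is specified in the text above; handling of zero and negative values is omitted. All function names are hypothetical.

```python
import math


def _require_positive(*values):
    """This sketch only handles strictly positive operands."""
    if any(value <= 0 for value in values):
        raise ValueError("operands must be positive in this sketch")


def log_multiply(a1r, a2r):
    """Multiply by adding logarithms, then exponentiating."""
    _require_positive(a1r, a2r)
    log_sum = math.log2(a1r) + math.log2(a2r)  # two log calculators, one adder
    return 2.0 ** log_sum                      # exponent calculator


def log_divide(a1r, a2r):
    """Divide by subtracting logarithms, then exponentiating."""
    _require_positive(a1r, a2r)
    return 2.0 ** (math.log2(a1r) - math.log2(a2r))


def log_sqrt(a1r):
    """Take a square root by halving the logarithm, then exponentiating."""
    _require_positive(a1r)
    return 2.0 ** (math.log2(a1r) / 2.0)


print(log_multiply(3.0, 7.0), log_divide(21.0, 7.0), log_sqrt(81.0))
```

Because the logarithm and exponent calculations are carried out in finite precision, results obtained this way are approximations of the exact product, quotient or root.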
Summary of Some Basic Problems of DSP:
Today, DSP solutions have limited arithmetic operational flexibility. Typically, only the operations of addition, subtraction, multiplication and shifting can be done during every instruction cycle. Even a slide rule has some form of divide, square root, logarithm and exponentiation. But today's DSP solutions cannot deliver these operations at anywhere near the rate of adds, subtracts and multiplications, if they can deliver them at all in a real-time effective manner.
Today, DSP solutions face another set of problems, based upon the need for continuous processing of deep filters which may involve multi-dimensional FFTs, DCTs and DWTs.
There is a large disparity today between processor clock rates and sensor sampling rates. Often sensors only generate between 20 million and 64 million samples per second, while the clock frequencies of processors are often between 300 and 1000 MHz. While DSP processors can run this fast, there are serious questions as to how to feed enough data into these engines to justify these clock speeds.
Often systems require real-time processing of many sensors. Today this is done by buffering each sensor cluster and then bursting these sensor clusters through the DSP resources. There are two separate, consequent problems. First, the system now has to manage the scheduling, storage and communication resources required to buffer the data, set up its transmission to the DSP resources, and then act upon the results. Second, these activities lead, almost inevitably, to differing latency for data from differing, equally valued, sensors, creating further scheduling and resource problems in handling the results.