1. Origin of the Invention
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the contractor has elected to retain title.
2. Technical Field
The invention relates to charge coupled device (CCD) and charge injection device (CID) hardware applied to higher precision parallel arithmetic processing devices particularly adapted to perform large numbers of multiply-accumulate operations with massive parallelism and to methods for performing signal processing including Fourier transforms and convolution.
3. Background Art
Many algorithms required for scientific modeling make frequent use of a few well defined, often functionally simple, but computationally very intensive data processing operations. Those operations generally impose a heavy burden on the computational power of a conventional general-purpose computer, and run much more efficiently on special-purpose processors that are specifically tuned to a single intensive computational task. An important class of such demanding computations comprises vector and matrix operations such as multiplication of vectors and matrices, solving linear equations, matrix inversion, and eigenvalue and eigenvector search. Most of the computationally more complex vector and matrix operations can be reformulated in terms of basic matrix-vector and matrix-matrix multiplications. From a neural network perspective, the product of the synaptic matrix by the vector of neuron potentials is another good example.
An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been developed, as disclosed in U.S. Pat. No. 5,054,040. It employs arrays of Charge Coupled/Charge Injection Device (CCD/CID) cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. FIG. 1 shows a simplified schematic of the CCD/CID array 100. Each cell 110 in the array 100 connects to an input column line 120 and an output row line 130 by means of a column gate 140 and a row gate 150. The gates 140, 150 hold a charge packet 160 in the silicon substrate underneath them that represents an analog matrix element. The matrix charge packets 160 are initially stored under the column gates 140. In the basic matrix-vector multiplication (hereinafter referred to as "MVM") mode of operation, for binary input vectors, the matrix charge packets 160 are transferred from under the column gate 140 toward the row gates 150 only if the input bit of the column indicates a binary `one`. The charge transferred under the row gates 150 is summed capacitively on each output row line 130, yielding an analog output vector which is the product of the binary input vector with the analog charge matrix. By virtue of the CCD/CID device physics, the charge sensing at the row output lines 130 is of a non-destructive nature, and each matrix charge packet 160 is restored to its original state simply by pushing the charge back under its column gate 140.
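The binary-analog MVM described above can be modeled numerically as follows. This is a minimal sketch for illustration only, not the device implementation: the charge packets are represented as plain floating-point numbers, and the names `binary_analog_mvm`, `charge_matrix` and `input_bits` are hypothetical.

```python
def binary_analog_mvm(charge_matrix, input_bits):
    """Model one binary-analog MVM: each row output is the sum of the
    charge packets in that row whose column input bit is 1."""
    outputs = []
    for row in charge_matrix:
        # A packet is transferred under the row gate only if its column bit is 1;
        # the row line then sums all transferred packets.
        total = sum(q for q, bit in zip(row, input_bits) if bit == 1)
        outputs.append(total)
    return outputs


# Hypothetical 2x3 charge matrix (arbitrary units) and a 3-bit input vector.
charge_matrix = [[0.5, 0.25, 1.0],
                 [0.0, 0.75, 0.5]]
input_bits = [1, 0, 1]
row_sums = binary_analog_mvm(charge_matrix, input_bits)
```

The function leaves `charge_matrix` unchanged, which loosely mirrors the non-destructive sensing: the same stored matrix can be reused for the next input vector.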
FIG. 2 is an illustration of the binary-analog MVM computation cycle for a single row of the CCD/CID array. In FIG. 2A, the matrix charge packet 160 sits under the column gate 140. In FIG. 2B, the row line 130 is reset to a reference voltage. In FIG. 2C, if the column line 120 receives a logic one input bit, the charge packet 160 is transferred underneath the row gate 150. In FIG. 2D, the transferred charge packet 160 is sensed capacitively by a change in voltage on the output row line 130. In FIG. 2E the charge packet 160 is returned under the column gate 140 in preparation for the next cycle. A bit-serial digital-analog MVM can be obtained from a sequence of binary-analog MVM operations, by feeding in successive vector input bits sequentially, and adding the corresponding output contributions after scaling them with the appropriate powers of two. A simple parallel array of divide-by-two circuits at the output accomplishes this task. Further extensions of the basic MVM scheme of FIG. 1 support full digital outputs by parallel A/D conversion at the outputs, and four-quadrant operation by differential circuit techniques.
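The bit-serial accumulation can be sketched in the same numerical model. The sketch assumes that the input bits are fed least significant bit first and that the running output is halved after each binary-analog MVM; the text above states only that divide-by-two circuits apply the power-of-two scaling, so this ordering is an assumption. The names `bit_serial_mvm`, `input_words` and `num_bits` are hypothetical.

```python
def bit_serial_mvm(charge_matrix, input_words, num_bits):
    """Model a bit-serial digital-analog MVM built from binary-analog MVMs.

    Assumption: bits are fed LSB first and the accumulator is divided by two
    after every bit, so the result equals (matrix * input) / 2**num_bits.
    """
    acc = [0.0] * len(charge_matrix)
    for b in range(num_bits):
        # Bit plane b of every unsigned input word.
        bits = [(x >> b) & 1 for x in input_words]
        # One binary-analog MVM for this bit plane.
        partial = [sum(q for q, bit in zip(row, bits) if bit)
                   for row in charge_matrix]
        # Add the new contribution, then apply the divide-by-two stage.
        acc = [(a + p) / 2.0 for a, p in zip(acc, partial)]
    return acc


charge_matrix = [[0.5, 0.25],
                 [1.0, 0.0]]
input_words = [3, 2]          # 2-bit unsigned inputs
scaled_product = bit_serial_mvm(charge_matrix, input_words, num_bits=2)
```

With this ordering the most significant bit is halved once and the least significant bit is halved `num_bits` times, which gives each bit plane its power-of-two weight without any explicit multiplier.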
FIG. 3 illustrates how the matrix charge packets are loaded into the array. In FIG. 3A, appropriate voltages are applied to each gate 170 in each cell 110 of the CCD/CID array 100 so as to configure each cell 110 as a standard 4-phase CCD analog shift register to load all of the cells 110 sequentially. In FIG. 3B, the same gates 170 are used for row and column charge transfer operations as described above with reference to FIG. 2.
Signal Processing
Conventional signal processing algorithms are founded on fast techniques for performing various discrete transformations such as the discrete Fourier transform (DFT), discrete sine transform (DST), discrete cosine transform (DCT), discrete Hartley transform (DHT), and others. Consider the DFT. The DFT can be represented by a Matrix-Vector Multiplication (MVM) with a computational complexity of O(N^2). However, for both serial and parallel computation on conventional hardware, the Fast Fourier Transform (FFT) is always preferred.
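The representation of the DFT as an MVM can be illustrated with a short numerical sketch. It uses the standard DFT definition with kernel exp(-2*pi*i*j*k/N) and a plain O(N^2) matrix-vector product; the helper names `dft_matrix` and `matvec` are hypothetical and serve only this illustration.

```python
import cmath


def dft_matrix(n):
    """Build the N x N DFT matrix with entries exp(-2*pi*i*j*k/N)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product: N multiply-accumulates per output row."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# The DFT of a unit impulse is flat; the DFT of a constant is a single peak.
impulse_spectrum = matvec(dft_matrix(4), [1.0, 0.0, 0.0, 0.0])
constant_spectrum = matvec(dft_matrix(4), [1.0, 1.0, 1.0, 1.0])
```

Each of the N outputs needs N multiply-accumulate operations, which is the source of the O(N^2) count for a serial machine.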
For serial computation, the FFT achieves a computational complexity of O(N log N). For implementation on parallel and vector computer architectures, the FFT has also been considered the baseline algorithm. In particular, with O(N) processors, a time lower bound of O(log N) can be achieved in computing the FFT. Note, however, that this result is of theoretical rather than practical importance since, particularly for large N, implementing the algorithm to achieve the above time lower bound would require an architecture with an excessive number of processors and, more importantly, a very complex processor interconnection structure.
With conventional hardware technology, the time lower bound in computing an MVM is O(log N) by using O(N^2) processors. This result is more relevant to theory than to practice, since such an implementation of an MVM requires a very complex parallel architecture.
In contrast, a practical implementation of an MVM on the CCD/CID chip can be performed in O(1) steps. This indicates that efficient implementation of signal processing applications on CCD/CID chips requires a new algorithmic framework that differs significantly from the conventional framework of fast techniques. In particular, the DHT can be implemented more efficiently than the fast Hartley transform (FHT). In fact, while the DHT can be performed in O(1) steps with one CCD/CID chip, the implementation of the FHT requires O(log N) chips and takes O(log N) steps.
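To make the DHT concrete, the following sketch writes it as a single real-valued MVM, using the standard definition H[k] = sum over n of x[n]*cas(2*pi*n*k/N) with cas(t) = cos(t) + sin(t). It is a numerical model only; the names `dht_matrix` and `matvec` are hypothetical, and how the signed matrix entries would be mapped onto charge packets is not addressed here.

```python
import math


def dht_matrix(n):
    """Build the real N x N DHT matrix with entries cas(2*pi*j*k/N)."""
    def cas(t):
        return math.cos(t) + math.sin(t)
    return [[cas(2.0 * math.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product, standing in for one MVM operation."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


signal = [1.0, 2.0, 3.0, 4.0]
hartley = matvec(dht_matrix(4), signal)

# The DHT is its own inverse up to a factor of 1/N.
recovered = [h / 4.0 for h in matvec(dht_matrix(4), hartley)]
```

Because the DHT matrix is real, the same stored matrix serves for both the forward and the inverse transform, apart from the 1/N scaling.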
Accordingly, there is a need for a massively parallel charge domain computing device and process which employs the DHT to achieve massive parallelism in signal processing in order to fully exploit the advantages of the CCD/CID architecture. In particular, there is a need for a process to perform a Fourier transform in a single MVM operation or plural simultaneous MVM operations in parallel in a CCD/CID architecture, each operation being performed in O(1) steps. There is also a need for a process to perform a convolution in a single MVM operation in a CCD/CID architecture in O(1) steps.
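As general background on how a convolution can be cast as one MVM, the sketch below uses the textbook circulant-matrix formulation of circular convolution with a fixed kernel. This is an assumption chosen for illustration and is not necessarily the process of the invention; the names `circulant` and `matvec` are hypothetical.

```python
def circulant(kernel):
    """Build the N x N circulant matrix C with C[i][j] = kernel[(i - j) mod N]."""
    n = len(kernel)
    return [[kernel[(i - j) % n] for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product, standing in for one MVM operation."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


kernel = [1.0, 2.0, 0.0, 0.0]   # fixed kernel, stored once as the matrix
signal = [1.0, 0.0, 3.0, 0.0]   # input vector
convolved = matvec(circulant(kernel), signal)
```

In this formulation the kernel plays the role of the stored matrix and each new signal is an input vector, so a fixed kernel can be applied to many signals without reloading the matrix.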