1. Origin of the Invention
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the contractor has elected to retain title.
2. Technical Field
The invention relates to charge coupled device (CCD) and charge injection device (CID) hardware applied to higher precision parallel arithmetic processing devices particularly adapted to perform large numbers of multiply-accumulate operations with massive parallelism and to methods for solving partial differential equations therewith.
3. Background Art
"Grand Challenges" have been defined as fundamental problems in science or engineering, with broad economic and scientific impact, that could be advanced by applying high performance computing resources, while high speed digital computers with some level of parallelism are enabling the consideration of an ever growing number of practical applications, many very large-scale problems await the development of massively parallel hardware. To carry out the demanding computations involved in grand challenges, it is generally accepted that one needs to pursue a hybrid approach involving both novel algorithms development and revolutionary chip technology.
Many algorithms required for scientific modeling make frequent use of a few well defined, often functionally simple, but computationally very intensive data processing operations. Those operations generally impose a heavy burden on the computational power of a conventional general-purpose computer, and run much more efficiently on special-purpose processors that are specifically tuned to address a single intensive computation task. Typical examples among the important classes of demanding computations are vector and matrix operations such as multiplication of vectors and matrices, solving linear equations, matrix inversion, and eigenvalue and eigenvector search. Most of the computationally more complex vector and matrix operations can be reformulated in terms of basic matrix-vector and matrix-matrix multiplications. From a neural network perspective, the product of the synaptic matrix by the vector of neuron potentials is another good example.
An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been developed, as disclosed in U.S. Pat. No. 5,054,040. It employs arrays of Charge Coupled/Charge Injection Device (CCD/CID) cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. FIG. 1 shows a simplified schematic of the CCD/CID array 100. Each cell 110 in the array 100 connects to an input column line 120 and an output row line 130 by means of a column gate 140 and a row gate 150. The gates 140, 150 hold a charge packet 160 in the silicon substrate underneath them that represents an analog matrix element. The matrix charge packets 160 are initially stored under the column gates 140. In the basic matrix-vector multiplication (MVM) mode of operation, for binary input vectors, the matrix charge packets 160 are transferred from under the column gate 140 toward the row gates 150 only if the input bit of the column indicates a binary `one`. The charge transferred under the row gates 150 is summed capacitively on each output row line 130, yielding an analog output vector which is the product of the binary input vector with the analog charge matrix. By virtue of the CCD/CID device physics, the charge sensing at the row output lines 130 is of a non-destructive nature, and each matrix charge packet 160 is restored to its original state simply by pushing the charge back under its column gate 140.
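The binary-analog MVM just described can be modeled numerically. The following Python is a minimal sketch only: the function name `binary_analog_mvm`, the list-of-rows layout of the charge matrix and the example charge values are illustrative assumptions, and the model ignores charge-transfer inefficiency and noise.

```python
def binary_analog_mvm(charge_matrix, input_bits):
    """Model one binary-analog MVM cycle (idealized, hypothetical helper).

    charge_matrix[i][j] is the analog charge held by the cell on output
    row line i and input column line j. input_bits[j] is the binary input
    applied to column line j.
    """
    if any(bit not in (0, 1) for bit in input_bits):
        raise ValueError("input vector must be binary")
    outputs = []
    for row in charge_matrix:
        if len(row) != len(input_bits):
            raise ValueError("row length must match the input vector length")
        # Only packets whose column bit is 1 move under the row gate,
        # where they are summed on the shared output row line.
        outputs.append(sum(q for q, bit in zip(row, input_bits) if bit == 1))
    # The stored matrix is never modified, mirroring the non-destructive
    # sensing: each packet returns under its column gate after the cycle.
    return outputs


charges = [[1.0, 2.0, 0.5],
           [0.25, 0.0, 3.0]]
print(binary_analog_mvm(charges, [1, 0, 1]))  # [1.5, 3.25]
```

Because the function only reads `charges`, the same matrix can be reused for any number of input vectors, which is the software analogue of restoring each packet under its column gate.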
FIGS. 2A, 2B, 2C, 2D and 2E are an illustration of the binary-analog MVM computation cycle for a single row of the CCD/CID array. In FIG. 2A, the matrix charge packet 160 sits under the column gate 140. In FIG. 2B, the row line 130 is reset to a reference voltage. In FIG. 2C, if the column line 120 receives a logic one input bit, the charge packet 160 is transferred underneath the row gate 150. In FIG. 2D, the transferred charge packet 160 is sensed capacitively by a change in voltage on the output row line 130. In FIG. 2E the charge packet 160 is returned under the column gate 140 in preparation for the next cycle. A bit-serial digital-analog MVM can be obtained from a sequence of binary-analog MVM operations, by feeding in successive vector input bits sequentially, and adding the corresponding output contributions after scaling them with the appropriate powers of two. A simple parallel array of divide-by-two circuits at the output accomplishes this task. Further extensions of the basic MVM scheme of FIG. 1 support full digital outputs by parallel A/D conversion at the outputs, and four-quadrant operation by differential circuit techniques.
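The bit-serial scheme can be illustrated with a short numerical sketch. The Python below assumes unsigned input words fed least significant bit first, and an accumulator that is halved after every binary pass; the text above states only that the contributions are scaled by powers of two using divide-by-two circuits, so the bit order, the accumulate-then-halve arrangement and the name `bit_serial_mvm` are assumptions made for illustration.

```python
def bit_serial_mvm(charge_matrix, input_words, n_bits):
    """Model a bit-serial digital-analog MVM built from binary passes.

    Assumptions for illustration: unsigned inputs, least significant bit
    first, and an accumulator that is divided by two after every pass.
    """
    if any(not 0 <= w < (1 << n_bits) for w in input_words):
        raise ValueError("input word does not fit in n_bits")
    acc = [0.0] * len(charge_matrix)
    for k in range(n_bits):  # k = 0 is the least significant bit
        bits = [(w >> k) & 1 for w in input_words]
        # One binary-analog pass: sum the charges of the selected columns.
        partial = [sum(q for q, b in zip(row, bits) if b)
                   for row in charge_matrix]
        # Add the new contribution, then divide by two. Earlier (less
        # significant) contributions are halved more often, so bit k ends
        # up weighted by 2**(k - n_bits).
        acc = [(a + p) / 2.0 for a, p in zip(acc, partial)]
    return acc  # equals (matrix times input vector) / 2**n_bits


charges = [[1.0, 2.0, 0.5],
           [0.25, 0.0, 3.0]]
scaled = bit_serial_mvm(charges, [5, 3, 6], n_bits=3)
print([s * 2**3 for s in scaled])  # [14.0, 19.25]
```

Multiplying the result by 2**n_bits recovers the ordinary matrix-vector product; in this sketch the fixed scale factor is simply left to the caller.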
FIGS. 3A and 3B illustrate how the matrix charge packets are loaded into the array. In FIG. 3A, appropriate voltages are applied to each gate 170 in each cell 110 of the CCD/CID array 100 so as to configure each cell 110 as a standard 4-phase CCD analog shift register to load all of the cells 110 sequentially. In FIG. 3B, the same gates 170 are used for row and column charge transfer operations as described above with reference to FIG. 2.
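Sequential loading through a shift register can be pictured with the following sketch. It is a simplified model under stated assumptions: the cells are treated as a single chain with the injection point at index 0, one full 4-phase clock cycle is abstracted as a single one-cell shift, and the name `load_via_shift_register` is hypothetical. The actual path of the charge through the array is not specified here.

```python
def load_via_shift_register(values):
    """Serially load a chain of cells, one packet per clock cycle.

    Assumption for illustration: a single chain of cells, injection at
    index 0, and each clock cycle moves every packet one cell along.
    """
    chain = [0.0] * len(values)
    # The packet destined for the far end of the chain must enter first.
    for packet in reversed(values):
        chain = [packet] + chain[:-1]  # shift one cell, inject at the input
    return chain


print(load_via_shift_register([0.1, 0.7, 0.3]))  # [0.1, 0.7, 0.3]
```

Loading N cells this way takes N clock cycles, which is why the electronic loading mode interrupts computation while the matrix is written.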
The particular choice for this unusual charge-domain technology resulted from several considerations, not limited to issues of speed and parallelism. In comparison to other, more common parallel high-speed technology environments (digital CMOS, etc.), the distinct virtues of the foregoing charge-domain technology for large-scale special-purpose MVM operations are the following:
○ Very High Density: The compactness of the CCD/CID cell allows the integration of up to 10⁵ cells on a 1 cm² die (in a standard 2 µm CMOS technology), providing single-chip 100 GigaOPS computation power.
○ Very Low Power Consumption: The charge stored in the matrix is conserved along the computation process because of the non-destructive nature of the CCD/CID operation. Hence, the entire power consumption is localized at the interface of the array, for clocking, I/O and matrix refresh purposes. This enables the processor to operate at power levels in the mW/TeraOPS range.
○ Scalability: The scalable architecture of the CCD/CID array allows the interfacing of many individual processors in parallel, combining together to form effective processing units of higher dimensionality, still operating at nominal speed.
○ I/O Flexibility: Although an analog representation is used inside the array to obtain fast parallel computation, the architecture of the processor provides the flexibility of a full digital interface, eliminating the bandwidth and noise problems typical for analog interfacing.
○ Programming Flexibility: The architecture allows for either optical (parallel, sustained) or electronic (semi-parallel, periodical) loading of the charge matrix. The latter method, described above with reference to FIG. 3, requires interrupts whose duration is usually much shorter than the computation-mode interval over which the stored charge matrix remains valid before a matrix refresh is needed.
Preliminary results on a 128 × 128 working prototype, implemented on a single 4 mm × 6 mm die in 2 µm CCD-CMOS technology, indicate a performance of approximately 10¹⁰ 8-bit multiply-accumulate operations per second. Processors with 1024 × 1024 cells will be realizable in the near future.
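As a back-of-envelope reading of the prototype figures, the short Python calculation below converts the quoted multiply-accumulate rate into an implied rate of complete matrix-vector products. It assumes that every cell contributes exactly one multiply-accumulate per full 8-bit MVM; that accounting is an assumption made for this estimate, not a figure stated above.

```python
rows, cols = 128, 128          # prototype array size
macs_per_second = 1e10         # quoted approximate performance

# Assumption: one multiply-accumulate per cell per complete MVM.
macs_per_mvm = rows * cols
mvms_per_second = macs_per_second / macs_per_mvm
seconds_per_mvm = 1.0 / mvms_per_second

print(macs_per_mvm)                      # 16384
print(round(mvms_per_second))            # 610352
print(f"{seconds_per_mvm * 1e6:.2f} us")  # 1.64 us
```

Under that assumption the prototype completes roughly six hundred thousand 128-element matrix-vector products per second, or about one every 1.6 microseconds.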