1. Field of the Invention
The invention is broadly directed to methods and apparatus for combining discrete integrated circuits. In particular, the invention pertains to free-space interconnection of discrete integrated circuits.
2. Discussion of Related Art
Free-space optical interconnection (FSOI) of integrated circuits allows efficient implementation of high-performance parallel computing and signal processing systems. Various architectures and system designs for application-specific FSOI systems have been discussed and demonstrated. However, for applications beyond mere switching and backplane interconnection, little consideration has been given to the impact of IC layout constraints on FSOI system design and performance. Layout constraints affecting the complex processor interconnections required for performing fast-Fourier-transform (FFT) calculations are critically important to fully realizing the potential advantages of FSOI for FFT circuits.
The FFT calculation is a critically important feature of signal processing, telecommunications, speech processing, and high-speed control and instrumentation operations. Various hardware implementations of the FFT calculation have been developed. These conventional processors combine multiple processor and memory chips, using multi-chip modules (MCMs), printed-circuit boards (PCBs), or backplane wiring to package and interconnect them.
The Discrete Fourier Transform (DFT) of an N-point sequence of values $x(n)$, where $k = 0, \ldots, N-1$, is:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}$$
Since this formula leads to a large number of computations with $N^2$ complex multiplications and $N(N-1)$ complex additions, short cuts are used. The FFT algorithm, developed from the DFT by Cooley and Tukey, is a well-known example of one such short cut. The speed advantage enjoyed by the FFT algorithm was gained by eliminating repetitive computations of an exponential term that is necessary in the DFT equation.
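The operation counts quoted above can be checked with a direct evaluation of the DFT sum. The following Python sketch is illustrative only: the function name `naive_dft` and the sign convention $W_N = e^{-j 2\pi / N}$ are assumptions, not taken from the original disclosure.

```python
import cmath


def naive_dft(x):
    """Evaluate the DFT sum directly.

    Returns (spectrum, complex_multiplications, complex_additions).
    Assumes the forward-transform convention W_N = exp(-j*2*pi/N).
    """
    n = len(x)
    mults = 0
    adds = 0
    spectrum = []
    for k in range(n):
        acc = 0j
        for i in range(n):
            # The exponential term is recomputed for every (i, k) pair;
            # this is the repetition that the FFT avoids.
            w = cmath.exp(-2j * cmath.pi * i * k / n)
            term = x[i] * w
            mults += 1
            if i == 0:
                acc = term
            else:
                acc += term
                adds += 1
        spectrum.append(acc)
    return spectrum, mults, adds
```

For an 8-point input the counters come out to 64 multiplications and 56 additions, matching $N^2$ and $N(N-1)$.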
The FFT algorithm comprises a series of multiplication and addition processes which are performed in distinct stages. These processes can be implemented using the butterfly processor (BP) device shown in FIG. 1. The butterfly-processor data flow diagram in FIG. 2 graphically shows the processor interactions that define the various stages of an FFT calculation. To evaluate N points, the FFT is calculated using $N/r$ butterfly processor devices per stage in $\log_r N$ stages, where $r$ is the radix number.
The radix-number determines how many inputs and outputs the BP device has. Radix-2 butterflies have $N/2$ BP devices per stage and $\log_2 N$ stages, and each BP device has two inputs and two outputs. For example, FIG. 2 shows an implementation of a 16-point FFT having 4 stages with eight BP devices per stage. The inputs to the FFT calculation are sample values taken at time-domain data points. Its outputs are values for discrete frequency-domain data points. The FFT equations used for a radix-2 BP device are:
$$A = A + W_N^k B$$
$$B = A - W_N^k B$$
A and B are complex numbers; on the right-hand sides they denote the input values of the BP device, and on the left-hand sides they denote its outputs. The expression $W_N^k$ is referred to as the "twiddle factor" and its value is computed as:
$$W_N^k = e^{-j 2\pi k / N} = \cos(2\pi k / N) - j \sin(2\pi k / N)$$
For actual hardware implementation of the FFT, the FFT equations are broken down into real and imaginary components that can be quite simply implemented using 4 multipliers and 6 adders, with subtractions performed by 2's-complement conversion using inverters and the carry-in bit on the adder:
$$A_R = A_R + B_R W_R - B_I W_I$$
$$A_I = A_I + B_R W_I + B_I W_R$$
$$B_R = A_R - B_R W_R + B_I W_I$$
$$B_I = A_I - B_R W_I - B_I W_R$$
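The radix-2 butterfly and its arrangement into stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the hardware design: the names `twiddle`, `butterfly_real`, `bit_reverse`, and `fft_radix2` are hypothetical, the twiddle factor is assumed to follow the convention $W_N^k = e^{-j 2\pi k / N}$, and a decimation-in-time ordering with bit-reversed inputs is assumed. The real and imaginary outputs are derived directly from the complex forms $A + W_N^k B$ and $A - W_N^k B$.

```python
import math


def twiddle(k, n):
    """Real and imaginary parts of W_N^k, assuming W_N = exp(-j*2*pi/N)."""
    angle = 2.0 * math.pi * k / n
    return math.cos(angle), -math.sin(angle)


def butterfly_real(a_r, a_i, b_r, b_i, w_r, w_i):
    """Radix-2 butterfly in real arithmetic: 4 multiplications, 6 add/subtracts."""
    # Four real multiplications form the complex product W*B.
    p1 = b_r * w_r
    p2 = b_i * w_i
    p3 = b_r * w_i
    p4 = b_i * w_r
    t_r = p1 - p2  # real part of W*B      (add/subtract 1)
    t_i = p3 + p4  # imaginary part of W*B (add/subtract 2)
    # Four more add/subtracts produce A + W*B and A - W*B.
    return a_r + t_r, a_i + t_i, a_r - t_r, a_i - t_i


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Radix-2 FFT; returns (spectrum, butterflies_per_stage)."""
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two, at least 2")
    # Assumed decimation-in-time ordering: inputs are bit-reversed.
    data = [complex(x[bit_reverse(i, stages)]) for i in range(n)]
    per_stage = []
    for s in range(1, stages + 1):
        span = 1 << s
        half = span >> 1
        count = 0
        for start in range(0, n, span):
            for j in range(half):
                w_r, w_i = twiddle(j, span)
                a = data[start + j]
                b = data[start + j + half]
                a_r, a_i, b_r, b_i = butterfly_real(
                    a.real, a.imag, b.real, b.imag, w_r, w_i)
                data[start + j] = complex(a_r, a_i)
                data[start + j + half] = complex(b_r, b_i)
                count += 1
        per_stage.append(count)
    return data, per_stage
```

For a 16-point input, `fft_radix2` reports four stages of eight butterflies each, which agrees with the 16-point example of FIG. 2.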
Theoretically, BP devices having a higher radix value are more computationally efficient, in that they have fewer multipliers relative to the number of points processed. For example, a radix-4 processor uses 12 multipliers to process 4 data points, while an analogous radix-2 implementation requires 16 multipliers to process the same 4 data points. However, the computational efficiency of the higher-radix processors is paid for in increased design complexity.
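The multiplier comparison can be reproduced with a short Python sketch. It rests on an assumption that is inferred from the figures in the text rather than stated there: each radix-r BP device performs r-1 complex twiddle multiplications, and each complex multiplication uses 4 real multipliers. The names `int_log` and `real_multipliers` are hypothetical.

```python
def int_log(n, radix):
    """Return the integer logarithm of n to the given radix (n must be an exact power)."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be an exact power of the radix")
        n //= radix
        stages += 1
    return stages


def real_multipliers(points, radix):
    """Real multipliers needed to process `points` data points in parallel.

    Assumption (not from the source): a radix-r BP device holds r-1 complex
    multipliers, each built from 4 real multipliers.
    """
    stages = int_log(points, radix)
    devices = (points // radix) * stages
    per_device = 4 * (radix - 1)
    return devices * per_device
```

Under that assumption, `real_multipliers(4, 4)` gives 12 and `real_multipliers(4, 2)` gives 16, in agreement with the comparison above.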
In particular, only one twiddle factor is required by each of the radix-2 BP devices and that twiddle factor is computed using a relatively simple bit-reversal operation. Higher-radix BP devices require multiple twiddle factors for each BP device that are computed using more complex algorithms. This added degree of complexity in computation and distribution is pivotal when multiple devices must be integrated on a single VLSI chip. The problem of correctly computing and distributing twiddle factors for these higher-radix BP devices is a practical barrier to their commercial implementation.
Also, the fastest architecture for computing an FFT result would be a fully parallel system with $(N/2) \times \log_2 N$ BP devices; for example, an 8192-point FFT would theoretically require 53,248 BP devices. However, since each BP device is about 1.8 mm × 2.5 mm in a 0.5 µm CMOS technology, only sixteen fit on one die. Therefore 3,328 die are required, and a single Multi-Chip Module (MCM) implementation for this many die is impractical. A multiple-MCM implementation would be possible, but the BP-device architecture requires that each BP device be connected to 4 other BP devices, for a total of 212,992 connections between die. Partitioning this massive circuit is a difficult geometric problem, and the timing losses resulting from the distances traversed by those connections are a serious drag on processor performance, compromising the speed advantage of a massively parallel design.
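The figures for the fully parallel design follow from simple arithmetic, which the following Python sketch reproduces. The function name `parallel_fft_budget` and its parameter names are hypothetical; the default values (sixteen BP devices per die, four connections per BP device) are the ones given in the text.

```python
def parallel_fft_budget(n_points, bp_per_die=16, links_per_bp=4):
    """Resource count for a fully parallel radix-2 FFT of n_points points."""
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two, at least 2")
    bp_devices = (n_points // 2) * stages   # (N/2) * log2(N)
    dies = -(-bp_devices // bp_per_die)     # ceiling division
    connections = bp_devices * links_per_bp
    return {
        "stages": stages,
        "bp_devices": bp_devices,
        "dies": dies,
        "connections": connections,
    }
```

Calling `parallel_fft_budget(8192)` yields 13 stages, 53,248 BP devices, 3,328 die, and 212,992 connections.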
FSOI devices enjoy the efficiency benefits of low-power surface-normal optical interconnect technology and the speed and cost benefits of high-volume VLSI fabrication processes. The result is fast, low-latency inter-chip interconnects that eliminate several layers of electronic fabrication and packaging. CMOS-SEED devices use multiple quantum-well (MQW) diode devices, which operate either as optical-output reflection modulators or as optical-input photodetectors. The quantum-well diodes are bonded to the CMOS device's surface and electrically interconnected in a grid located directly over the silicon circuits. The diodes in the grid run at the speed of the underlying silicon technology, in a small area, at low power, and in high-density input and output arrays, with added design flexibility provided by the diodes' dual operability as either input or output elements.
Hybrid CMOS-SEED technology is particularly advantageous for FSOI applications, and is rapidly becoming widely available and well-known. Hybrid CMOS-SEED devices, which integrate high-speed GaAs MQW photodetectors and modulators with high-volume commodity CMOS VLSI processes, are usable for high-speed FFT processing. However, implementing a conventional free-space optical butterfly configuration between the 16 BP devices on the respective die requires 13,312 optical connections between the die. For a high-speed FFT design, every optical connection would have to be at least a 32-bit parallel connection, simultaneously providing 425,984 beams of light while preserving the topology of the butterfly structure. The logistics of this theoretically possible parallel FFT processor design render it impractical for implementation in either type of CMOS-SEED technology.
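The optical connection and beam counts can be checked in the same way. In the Python sketch below, the function name `optical_beam_budget` is hypothetical, and the factor of four optical connections per die is an inference from the stated total of 13,312 connections across 3,328 die; it is not stated explicitly in the text.

```python
def optical_beam_budget(dies, links_per_die=4, bits_per_link=32):
    """Return (optical_connections, light_beams) for a parallel FSOI layout.

    links_per_die=4 is an assumed value inferred from 13,312 / 3,328.
    """
    links = dies * links_per_die
    beams = links * bits_per_link
    return links, beams
```

With 3,328 die, the sketch returns 13,312 optical connections and 425,984 beams for 32-bit parallel connections.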
Alternatively, a serial-connection design that pairs one BP device with memory sufficient to keep track of its data provides a more compact implementation of the FFT calculation. This is also a much less expensive processor design. However, 53,248 passes are required to complete the calculation, each pass being a complete radix-2 calculation by that one BP device. This is too slow for real-time applications.
Finally, multi-chip feedforward optical interconnect systems that provide so-called "smart-pixel" arrays by combining polarization with space-division multiplexing and polarization with pupil-division multiplexing to optically shuffle FFT data are known. However, conventional smart-pixel applications such as photonic switching use cascaded optical connections between chips that are strictly feedforward. Such feedforward optical systems lack the flexibility needed for implementing even feedback, much less the bi-directional communication required to link the BP devices to the memory needed to implement workable optical interconnection in multi-chip FFT systems.