Field programmable gate arrays (FPGAs) offer an attractive alternative to application specific integrated circuits (ASICs) for certain applications, especially when hardware flexibility is needed, e.g., in prototyping. An FPGA typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured.
FIG. 1 is a block diagram depicting a simplified example of an FPGA 10. FPGA 10 illustratively includes CLBs 17, I/O routing ring 16A, delay lock loop (DLL) blocks 19, multiply/divide/de-skew clock circuits 11, and programmable IOBs 16B. DLL blocks 19 and clock circuits 11 collectively provide well-known digital clock management (DCM) circuits for managing clock signals within FPGA 10. Those skilled in the art understand that FPGA 10 may include other types of logic blocks and circuits in addition to those described herein. For example, there may be programmable Multi-Gigabit Transceivers (MGTs) that are also programmable I/Os and located next to the programmable IOBs 16B. Also, there may be an embedded Application Specific Integrated Circuit (ASIC), such as an embedded processor in the Virtex II Pro™ Platform of Xilinx Corporation of San Jose, Calif.
FPGA technology has improved rapidly in recent years, increasing the circuit density on the FPGA die. This has allowed for FPGAs whose circuit complexity may be greater than that of complex ASICs. Thus, FPGAs are replacing ASICs in more and more cases. However, as the FPGA die gets larger, the probability of a defect on the die increases. Hence, as circuit designs get more complex, there is a need for a way to implement them other than by producing larger and more costly FPGA dice.
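The relationship between die size and defect probability can be illustrated with a minimal sketch. The sketch assumes the simple Poisson yield model, Y = exp(-A * D0), which is a common first-order approximation and is not taken from the present description; the function names, die areas, and defect density below are hypothetical values chosen only for illustration.

```python
import math


def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dice under the (assumed) Poisson yield model."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


def defect_probability(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Probability that a die contains at least one defect."""
    return 1.0 - poisson_yield(die_area_cm2, defect_density_per_cm2)


if __name__ == "__main__":
    d0 = 0.5  # hypothetical defect density, defects per cm^2
    for area in (1.0, 2.0, 4.0):  # hypothetical die areas in cm^2
        p = defect_probability(area, d0)
        print(f"area={area} cm^2 -> P(at least one defect)={p:.3f}")
```

Under this assumed model, the defect probability rises monotonically with die area, which is the trend that motivates splitting a large design across several smaller dice.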
One solution for a circuit design larger than the FPGA has been to use multiple FPGAs connected together on a printed circuit board (PCB). However, off-chip communication over the PCB greatly reduces the operating speed of the design as compared to putting the design on one large FPGA die. To reduce the chip-to-chip communication delay, a multi-chip module (MCM) architecture was disclosed in “Field programmable MCM Systems—Design of an Interconnection Frame,” by Ivo Dobbelaere, et al., Proceedings of the First International ACM/SIGDA Workshop on Field Programmable Gate Arrays, Berkeley, Calif., Feb. 16–18, 1992, pp. 52–56.
In the above article, the MCM system includes modified FPGAs connected together via a carrier die having a fixed, non-programmable wiring pattern. Each modified FPGA has its CLBs 17 (collectively called the core) surrounded by a programmable interconnection frame. The programmable interconnection frame supports chip-to-chip connections. At each of the four corners of the core, a special switch matrix circuit is provided. Within a switch matrix, all horizontal lines can be connected to all vertical lines. Connections are provided between the four switch matrices and between the core and each switch matrix.
The MCM system described above supports two basic connections. First, a signal from the core of die A is connected via a corner switch matrix to a pin on die A and then to a pin on die B via a fixed wire on the carrier die. The signal is switched through B's programmable interconnection frame to a corner switch matrix of die B, from which it is connected to B's core. Second, a signal from the core of die A is connected via a corner switch matrix to a pin on die A and then to a first pin on die B via a first fixed connection on the carrier die. The signal is switched to a second pin on die B via B's programmable interconnection frame. The signal then goes from the second pin on die B to a pin on die C via a second fixed connection on the carrier die. The signal from the pin on die C goes to a corner switch matrix of die C, from which it is connected to C's core.
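The two basic connections described above can be sketched as ordered lists of hops, which makes it easy to count the switching elements and fixed carrier wires each signal traverses. This is a minimal illustrative model only: the `Hop` type, the hop-kind labels, and the function names are hypothetical, and the hop sequences simply transcribe the steps listed above without adding elements the description does not mention.

```python
from dataclasses import dataclass

# Hop kinds that are programmable switching elements (they add switching delay).
SWITCHING_KINDS = {"switch_matrix", "interconnect_frame"}


@dataclass(frozen=True)
class Hop:
    die: str   # "A", "B", "C", or "carrier"
    kind: str  # "core", "switch_matrix", "pin", "fixed_wire", "interconnect_frame"


def direct_connection() -> list[Hop]:
    """First connection: core of die A to core of die B."""
    return [
        Hop("A", "core"), Hop("A", "switch_matrix"), Hop("A", "pin"),
        Hop("carrier", "fixed_wire"),
        Hop("B", "pin"), Hop("B", "interconnect_frame"),
        Hop("B", "switch_matrix"), Hop("B", "core"),
    ]


def feedthrough_connection() -> list[Hop]:
    """Second connection: core of die A to core of die C, passing through die B's frame."""
    return [
        Hop("A", "core"), Hop("A", "switch_matrix"), Hop("A", "pin"),
        Hop("carrier", "fixed_wire"),
        Hop("B", "pin"), Hop("B", "interconnect_frame"), Hop("B", "pin"),
        Hop("carrier", "fixed_wire"),
        Hop("C", "pin"), Hop("C", "switch_matrix"), Hop("C", "core"),
    ]


def count_kind(path: list[Hop], kinds: set[str]) -> int:
    """Number of hops in the path whose kind is in the given set."""
    return sum(1 for hop in path if hop.kind in kinds)


if __name__ == "__main__":
    for name, path in (("direct", direct_connection()),
                       ("feedthrough", feedthrough_connection())):
        switches = count_kind(path, SWITCHING_KINDS)
        wires = count_kind(path, {"fixed_wire"})
        print(f"{name}: {switches} switching elements, {wires} fixed carrier wires")
```

In this model, both connections pass through two corner switch matrices and one programmable interconnection frame, and the second connection additionally crosses a second fixed wire on the carrier die.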
The above MCM system has several disadvantages. First, core-to-core communications must go through at least two switch matrices and at least one programmable interconnection frame; hence, there is delay associated with these switching components in addition to the delay due to the wires/metal traces on the carrier die. Second, when many CLBs on die A need to communicate with many CLBs on die B, the switches at the four corners can become bottlenecks. Third, a signal from a CLB on die A needs to travel to the corner of die A, then to the corner of die B, and then to a CLB on die B, which is a significant distance. Lastly, due to the limited connections between FPGA dice, the amount of parallel communication that typically occurs when all communications are on a single die is greatly reduced.
Therefore there is a need for a better multi-chip module architecture which overcomes the above disadvantages of the prior art.