Hardware-based functional design verification systems, often referred to as emulators or simulation accelerators, are devices that allow functional verification of a user's logic design prior to fabrication of the design into an integrated circuit (IC). The logic design at this stage is often referred to as the design under test (DUT) or design under verification (DUV). Because it is very expensive and time consuming to fabricate a design into silicon, it is desirable to use an emulator to debug the logic and remove functional errors prior to fabrication. Design verification systems allow chip designers to test and debug their design before incurring the cost and time of fabrication. Once a user's design is functionally verified, it is then possible to use the emulator to design and test other features of the system. These emulators have thus become heavily relied upon in the IC design industry.
Design verification systems are available from various vendors, including Cadence Design Systems, Inc., San Jose, Calif., United States of America, among others. Design verification systems are of two basic types: hardware-driven systems that implement a logic design in programmable logic devices, and software-driven systems that simulate the design in one or more emulation processors.
One type of hardware-based design verification system uses a large number of interconnected field programmable gate arrays (FPGAs). FPGA-based design verification systems can be seen in U.S. Pat. Nos. 5,109,353, 5,036,473, 5,475,830 and 5,960,191, each of which is incorporated herein by reference.
Another type of hardware-based functional verification system utilizes large numbers of processor modules. Each processor module has one or more processor integrated circuits disposed therein. Each of the processor integrated circuits has a large number of emulation processors fabricated thereon. In such a processor-based system, the DUV is programmed therein so that its functionality appears to be executed in the emulation processors, which calculate the outputs of the design. Examples of processor-based verification systems can be found in U.S. Pat. Nos. 5,551,013, 6,035,117 and 6,051,030, each of which is incorporated herein by reference.
A user's logic design is typically in the form of a hardware description language (HDL) such as Verilog®. The initial design must be converted into a format that the emulation processors can read and execute. The host workstation performs a series of steps, together called compiling the design, to prepare the DUV to be loaded into and executed by the emulation processors.
An initial step in compiling the design converts the HDL design into a netlist description (netlist). The netlist is a description of the design's components and electrical interconnections. The netlist includes all circuit elements necessary for implementing the design, including combinational logic (i.e., gates), sequential logic (i.e., flip-flops and latches) and memory (e.g., SRAM, DRAM). The netlist is then converted into a series of statements that will be executed by the emulation processors, typically in the form of Boolean equations. The statements, also called steps, are loaded into the emulation processors, which step through the statements sequentially. The processors calculate the outputs and save them in the data storage arrays, after which the outputs can be transferred back to the host workstation or used in future processing steps.
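The sequential stepping described above can be illustrated with a minimal sketch. It is not the instruction format of any actual emulation processor: the name `run_steps`, the representation of a step as a Boolean function plus operand indices, and the use of a plain list as the data storage array are all assumptions made for illustration.

```python
import operator
from typing import Callable, List, Sequence, Tuple

# Hypothetical step format: (Boolean function, indices of its operands
# in the data storage array). Not taken from any real emulator.
Step = Tuple[Callable[..., int], Tuple[int, ...]]


def run_steps(inputs: Sequence[int], steps: Sequence[Step]) -> List[int]:
    """Execute the steps in order, appending each output to storage.

    The storage list starts with the design inputs; every later step may
    read any value that was stored before it.
    """
    storage = list(inputs)
    for func, operands in steps:
        result = func(*(storage[i] for i in operands)) & 1
        storage.append(result)
    return storage


# A half adder expressed as two Boolean statements over inputs a (index 0)
# and b (index 1): sum = a XOR b, carry = a AND b.
HALF_ADDER: List[Step] = [
    (operator.xor, (0, 1)),
    (operator.and_, (0, 1)),
]

if __name__ == "__main__":
    print(run_steps([1, 1], HALF_ADDER))  # [1, 1, 0, 1]
```

In this sketch the stored outputs remain available to later steps, which mirrors the statement that saved outputs can be used in future processing steps.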
As discussed, processor-based emulation systems utilize emulation modules, so named because each module contains a number of chips containing emulation processors and other components, such as memory and combinational or sequential logic. The emulation processors on each chip are preferably clustered. Clustering of processors adds efficiency and reduces the required chip area because the processors within a common cluster may share resources such as data and input stacks. Further, clustering of processors takes advantage of shared data storage arrays, allowing communication between all processors in the system in a single step cycle. A typical arrangement of 2,048 processors, for example, would comprise 256 clusters of eight processors. The clustering of emulation processors and sharing of data and input stacks is described further in U.S. Pat. No. 6,618,698, which is incorporated herein by reference.
Each processor cluster receives signals from a number of multiplexers that are used to interconnect the emulation processors. Typically, each multiplexer is dedicated to a single processor, with the output of the multiplexer connecting directly to the data storage structure dedicated to that processor. The goal of these multiplexers is to provide interconnection such that any given processor on any given cluster may receive the output of any other processor on any other cluster. The output of each processor is called the Node Bit Out, or NBO. Previous emulators have included one multiplexer per processor, and the input of that multiplexer receives every NBO in the entire design verification system. For example, a 2,048 processor system would have 2,048 multiplexers, each of which is a 2,048:1 multiplexer. In this manner, the multiplexer can select any of the 2,048 NBOs in the design verification system as input to the processor to which it is dedicated.
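As a rough illustration of this full interconnect, the sketch below models one N:1 multiplexer per processor, each able to pick any NBO in the system. The names `route_step` and `total_mux_inputs`, and the encoding of a multiplexer's select value as a list index, are assumptions for illustration only.

```python
from typing import List, Sequence


def route_step(nbos: Sequence[int], selects: Sequence[int]) -> List[int]:
    """Return the bit delivered to each processor in one step.

    nbos[i] is the Node Bit Out of processor i. selects[p] is the select
    value of the multiplexer dedicated to processor p, that is, the index
    of the NBO that processor p receives.
    """
    if len(nbos) != len(selects):
        raise ValueError("one multiplexer is assumed per processor")
    return [nbos[s] for s in selects]


def total_mux_inputs(n_processors: int) -> int:
    """Count multiplexer inputs when every mux sees every NBO."""
    return n_processors * n_processors


if __name__ == "__main__":
    nbos = [1, 0, 1, 1, 0, 0, 1, 0]
    # Each processor reads the NBO of the processor at the mirrored index.
    print(route_step(nbos, [7, 6, 5, 4, 3, 2, 1, 0]))  # [0, 1, 0, 0, 1, 1, 0, 1]
    print(total_mux_inputs(2048))  # 4194304
```

The second function makes the scaling concern concrete: with one full-width multiplexer per processor, the number of multiplexer inputs is the square of the processor count.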
As logic designs grow increasingly complex, the number of emulation processors required to efficiently verify those designs has increased dramatically. Consequently, the number of multiplexers required to handle those additional processors has increased. Not only has the number of multiplexers increased with the number of processors, but the width of those multiplexers has also increased in order to handle the additional NBOs. Thus, where a 256 processor design verification system required only 256 multiplexers, each a 256:1 multiplexer, a 2,048 processor design verification system requires 2,048 multiplexers, each a 2,048:1 multiplexer. The chip area required is measured herein as the equivalent number of 8:1 multiplexers. Thus, a single 8:1 multiplexer requires 1 unit of chip area, whereas eight 8:1 multiplexers require 8 units of chip area. This increase in number and width maps to an n log n increase in chip area, as depicted in the following table:
WIDTH OF MUX   1ST-LEVEL   2ND-LEVEL   3RD-LEVEL   4TH-LEVEL   NUMBER OF 8:1 MUXES   TOTAL MUX ARRAY AREA
8:1            1           -           -           -           1.00                  8.00
16:1           2           0.25        -           -           2.25                  36.00
32:1           4           0.5         -           -           4.50                  144.00
64:1           8           1           -           -           9.00                  576.00
128:1          16          2           0.25        -           18.25                 2,336.00
256:1          32          4           0.5         -           36.50                 9,344.00
512:1          64          8           1           -           73.00                 37,376.00
1,024:1        128         16          2           0.25        146.25                149,760.00
2,048:1        256         32          4           0.5         292.50                599,040.00
As seen, the total chip area required to implement the multiplexers alone in a 256 processor chip is 9,344 units. Increasing the number of processors to 2,048 requires a total area of 599,040 units. This table does not account for the additional interconnect required to connect each of the additional processors to each individual multiplexer. The increase in processors from 256 to 2,048 thus requires at least sixty-four times the area (599,040/9,344).
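The figures in the table can be reproduced with a short calculation. The sketch below assumes the rule that the table's rows appear to follow: each level needs one eighth as many 8:1 multiplexers as the inputs it receives, a partially used multiplexer counts as a fraction, and the tree ends at the first level needing one multiplexer or less. The function names `mux_units` and `array_area` are hypothetical.

```python
def mux_units(width: int) -> float:
    """Equivalent number of 8:1 multiplexers in one width:1 multiplexer."""
    total = 0.0
    level = width / 8  # 8:1 muxes needed at the first level
    while True:
        total += level
        if level <= 1:  # final level: a single (possibly partial) mux
            break
        level /= 8  # the next level selects among this level's outputs
    return total


def array_area(n_processors: int) -> float:
    """Area of n multiplexers, each n:1, in units of one 8:1 multiplexer."""
    return n_processors * mux_units(n_processors)


if __name__ == "__main__":
    for n in (8, 16, 32, 64, 128, 256, 512, 1024, 2048):
        print(f"{n}:1  {mux_units(n):.2f}  {array_area(n):,.2f}")
    print(array_area(2048) / array_area(256))  # about 64.1
```

Under this assumed rule, the computed areas for 256 and 2,048 processors are 9,344 and 599,040 units, matching the values cited above.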
Processor-based design verification systems now require so many processors that routing resources such as multiplexers and interconnect can dominate the area of the chip. Because interconnect is so abundant and space is at a premium, interconnect and multiplexers may be located on different layers of the IC. Implementing multiplexers and interconnect on different layers, though, reduces operating speeds and increases power usage of the entire system, making it an undesirable solution.
Thus, there exists a need for interconnecting increasing numbers of emulation processors within a processor-based design verification system without suffering a severe increase in area and power required by the system.