1. Field of the Invention
Embodiments of the present invention generally relate to a simulation acceleration technique or an emulation engine used in emulating a system composed of logic gates, and more particularly, to a method and apparatus for improving the efficiency of system emulation.
2. Description of the Related Art
Hardware emulators are programmable devices used in the verification of hardware designs. A common method of hardware design verification is to use a processor-based hardware emulator to emulate the design. These processor-based emulators sequentially evaluate combinatorial logic levels, starting at the inputs and proceeding to the outputs of a circuit. Each pass through the entire set of logic levels is known as a cycle; the evaluation of each individual logic level is known as an emulation step.
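The cycle/step terminology can be illustrated with a small software model. This is a minimal sketch, assuming the netlist has already been levelized into lists of gates. The `Gate` tuple layout and the `run_cycle` function are hypothetical names chosen for illustration. The model performs one emulation step per logic level and one cycle per pass over all levels. It does not describe the emulator's hardware.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical gate record: (output signal, Boolean function, input signals).
Gate = Tuple[str, Callable[..., int], Tuple[str, ...]]


def run_cycle(levels: List[List[Gate]], inputs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Evaluate one emulation cycle over a levelized netlist.

    Each logic level is evaluated in one emulation step, starting at the
    circuit inputs and proceeding toward the outputs. Returns all signal
    values and the number of steps taken.
    """
    values = dict(inputs)
    steps = 0
    for level in levels:
        # Gates in one level depend only on earlier levels, so hardware could
        # evaluate them in parallel; this software model simply loops.
        results = {out: fn(*(values[s] for s in ins)) for out, fn, ins in level}
        values.update(results)
        steps += 1
    return values, steps


# Two-level example: a half adder (level 1) feeding an OR gate (level 2).
levels = [
    [("s", operator.xor, ("a", "b")), ("c", operator.and_, ("a", "b"))],
    [("y", operator.or_, ("s", "c"))],
]
values, steps = run_cycle(levels, {"a": 1, "b": 0})
print(values["y"], steps)  # 1 2
```

Here a circuit two logic levels deep needs two emulation steps per cycle. A deeper circuit would need proportionally more steps per cycle.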
An exemplary hardware emulator is described in commonly assigned U.S. Pat. No. 6,618,698 titled “Clustered Processors In An Emulation Engine”, which is hereby incorporated by reference in its entirety. Hardware emulators allow engineers and hardware designers to test and verify the operation of an integrated circuit, an entire board of integrated circuits, or an entire system without having to first physically fabricate the hardware.
The complexity and number of logic gates present on an integrated circuit have increased significantly in the past several years. Moore's Law predicts that the number of transistors or gates present in an integrated circuit will double approximately every two years. Hardware emulators must become more efficient to keep pace with this increasing complexity.
A hardware emulator comprises multiple processors. The processors are arranged into groups known as clusters, and the clusters collectively form the emulation engine. Each processor is capable of emulating one or more logic gates, mimicking the function of those gates in an integrated circuit. The processors compute results in parallel, in the same way that the logic gates of an integrated circuit compute many results in parallel.
The output of a processor, or an input to the emulator, is stored in a memory known as a data array so that it can be used by the same processor, another processor, or some other device. Processors are generally grouped together into clusters that share a data array. Each processor produces one output bit per instruction during each emulation step, and each read port of the data array supplies a single bit to one processor input. A typical processor has, for example, four inputs. A data array coupled to N processors (where N is an integer and each processor has four inputs) must therefore write N bits per step (a 1-bit output from each of the N processors) and provide 4×N 1-bit read ports to supply four input bits to each of the N processors. Because each conventional memory device has only a single read/write port, a plurality of such memory devices are arranged in parallel: during the write cycle the same N-bit word is written to every device, and during the read cycle each device provides a single bit to one processor input. The data stored in each of the memory devices in the cluster is identical, and the emulator selectively accesses the data that is to be coupled to each processor. Thus, any one processor in the cluster can access any of the data available to any other processor in the cluster.

Unfortunately, as the number of instructions processed per emulation cycle increases, the amount of processor output increases, and the size of the data array must increase correspondingly. Each additional instruction within an emulation cycle requires an additional N-bit word to be stored in each memory device. Thus, in an emulator using N four-input processors, and therefore 4×N memory devices, each additional instruction within an emulation cycle requires 4×N×N additional bits of storage in the data array. The cost of adding data array memory is a significant factor in the overall cost of a hardware emulator.
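The storage arithmetic for a replicated data array can be checked with a small software model. This is a minimal sketch under stated assumptions. Python lists stand in for the single-port memory devices. `ReplicatedDataArray` and `bits_per_extra_instruction` are hypothetical names, not part of any emulator described here. The model keeps one copy per processor input, writes each step's N-bit word into every copy, and counts the bits held across all copies.

```python
from typing import List

INPUTS_PER_PROCESSOR = 4  # four-input processors, per the example in the text


class ReplicatedDataArray:
    """Hypothetical model of a cluster data array built from single-port memories.

    Each processor input has its own memory copy (one read port per copy), so a
    cluster of n processors needs 4 * n copies, all holding identical contents.
    """

    def __init__(self, n_processors: int) -> None:
        self.n = n_processors
        # copies[d][step] is the N-bit word stored in memory device d for that step.
        self.copies: List[List[List[int]]] = [
            [] for _ in range(INPUTS_PER_PROCESSOR * n_processors)
        ]

    def write_step(self, outputs: List[int]) -> None:
        """Write one emulation step: N output bits, one per processor, to every copy."""
        if len(outputs) != self.n:
            raise ValueError("expected one output bit per processor")
        for copy in self.copies:
            copy.append(list(outputs))

    def read(self, processor: int, input_index: int, step: int, bit: int) -> int:
        """Read one bit through the copy dedicated to this processor input.

        Because every copy is identical, any processor can read any bit written
        by any processor in the cluster.
        """
        device = self.copies[processor * INPUTS_PER_PROCESSOR + input_index]
        return device[step][bit]

    def total_bits(self) -> int:
        """Storage in use across all replicated copies."""
        return sum(len(word) for device in self.copies for word in device)


def bits_per_extra_instruction(n_processors: int) -> int:
    """One more instruction per cycle adds one N-bit word to each of the 4N copies."""
    return INPUTS_PER_PROCESSOR * n_processors * n_processors


arr = ReplicatedDataArray(n_processors=8)
arr.write_step([1, 0, 1, 1, 0, 0, 1, 0])
arr.write_step([0] * 8)
print(arr.read(processor=7, input_index=3, step=0, bit=6))  # 1
print(arr.total_bits())                                      # 512
print(bits_per_extra_instruction(8))                         # 256
```

Because the word width and the number of copies both grow with N, the storage cost of each additional instruction grows with the square of the cluster size. In this model, doubling N from 8 to 16 raises it from 256 to 1024 bits.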
Therefore, there is a need in the art for a method and apparatus that increases the number of instructions a processor can execute per emulation cycle without the significant cost increase attributable to adding data array memory.