The usefulness of software driven emulators has increased enormously with growth in the complexity of integrated circuits. Basically, an emulation engine operates to mimic the logical design of a set of one or more integrated circuit chips. The emulation of these chips in terms of their logical design is highly desirable for several reasons. The utilization of emulation engines has also grown up with and around the corresponding utilization of design automation tools for the construction and design of integrated circuit chip devices. In particular, as part of the input for the design automation process, logic descriptions of the desired circuit chip functions are provided. The existence of such software tools for processing these descriptions in the design process is well suited to the utilization of emulation engines which are electrically configured to duplicate the same logic function that is provided by a design automation tool.
Utilization of emulation devices permits testing and verification via electrical circuits of logic designs before these designs are committed to a so-called “silicon foundry” for manufacture. The input to such a foundry is the functional logic description required for the chip, and its output is initially a set of photolithographic masks which are then used in the manufacture of the desired electrical circuit chip device. Verifying that logic designs are correct in the early stage of chip manufacturing eliminates the need for costly and time-consuming second passes through a silicon foundry.
Another advantage of emulation systems is that they provide a device that makes possible the early validation of software meant to operate the emulated chips. Thus, software can be designed, evaluated and tested well before the time when actual circuit chips become available. Additionally, emulation systems can also operate as simulator-accelerator devices thus providing a high-speed simulation platform.
Emulation engines of the type contemplated by this invention contain an interconnected array of emulation processors (EP). Each emulation processor (hereinafter, also sometimes simply referred to as “processor”) can be programmed to evaluate a logic function (for example, AND, OR, XOR, NOT, NOR, NAND, etc.). The program-driven processors, operating together as an interconnected unit, emulate the entire desired logic design. However, as integrated circuit designs grow in size, more emulation processors are required to accomplish the emulation task. An aim, therefore, is to increase the capacity of emulation engines, by increasing the number of emulation processors in each of their modules, in order to meet the increasingly difficult task of emulating more and more complex circuits and logic functions.
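By way of illustration only, the programmable evaluation of a logic function can be sketched as a truth-table lookup. The table encoding and the names `GATES`, `evaluate` and `half_adder` below are assumptions made for this sketch and are not taken from the referenced patents.

```python
# Hypothetical truth tables for 2-input gates, indexed by (a << 1) | b.
GATES = {
    "AND":  (0, 0, 0, 1),
    "OR":   (0, 1, 1, 1),
    "XOR":  (0, 1, 1, 0),
    "NOR":  (1, 0, 0, 0),
    "NAND": (1, 1, 1, 0),
}


def evaluate(gate, a, b):
    """Evaluate one programmed gate on two single-bit operands."""
    return GATES[gate][(a << 1) | b]


def half_adder(a, b):
    """Two evaluations that together emulate a half adder: (sum, carry)."""
    return evaluate("XOR", a, b), evaluate("AND", a, b)
```

In this sketch, "programming" a processor amounts to choosing which table it consults, which is why one evaluator can stand in for any of the listed gate types.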
For purposes of better understanding the structure and operation of emulation devices generally, and this invention particularly, U.S. Pat. No. 5,551,013 and patent application Ser. No. 09/373,125 filed Aug. 12, 1999, both of which are assigned to the assignee of this application, are hereby incorporated herein by reference.
U.S. Pat. No. 5,551,013 shows an emulation chip, called a module here, having multiple (e.g. 64) processors. All processors within the module are identical. The sequencer and the interconnection network occur only once in a module. The control stores hold a program created by an emulation compiler for a specified processor. The stacks hold data and inputs previously generated and are addressed by fields in a corresponding control word to locate the bits for input to the logic element. During each step of the sequencer an emulation processor emulates a logic function according to the emulation program. A data flow control interprets the current control word to route and latch data within the processor. The node-bit-out signal from a specified processor is presented to the interconnection network where it is distributed to each of the multiplexors (one for each processor) of the module. The node address field in the control word allows a specified processor to select for its node-bit-in signal the node-bit-out signal from any of the processors within its module. The node bit is stored in the input stack on every step. During any operation the node-bit-out signal of a specified processor may be accessed by none, one, or all of the processors within the module.
Data routing within each processor's data flow and through the interconnection network occurs independently of and overlaps the execution of the logic emulation function in each processor. Each control store stores control words executed sequentially under control of the sequencer and program steps in the associated module. Each revolution of the sequencer causes the step value to advance from zero to a predetermined maximum value and corresponds to one target clock cycle for the emulated design. A control word in the control store is simultaneously selected during each step of the sequencer. A logic function operation is defined by each control word.
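The stepping of the sequencer through the control stores, and the selection of a node-bit-in signal by the node address field, can be sketched roughly as follows. This is a simplified model under stated assumptions: the field names (`table`, `src_a`, `src_b`, `node_addr`), the four-entry stack depth, and the omission of the separate data stack are all hypothetical choices made for brevity, not details of U.S. Pat. No. 5,551,013.

```python
from dataclasses import dataclass, field


@dataclass
class ControlWord:
    table: tuple    # 4-entry truth table, indexed by (a << 1) | b
    src_a: int      # input-stack position of operand a (0 = most recent)
    src_b: int      # input-stack position of operand b
    node_addr: int  # processor whose node-bit-out is latched as node-bit-in


@dataclass
class Processor:
    control_store: list  # one ControlWord per sequencer step
    input_stack: list = field(default_factory=lambda: [0, 0, 0, 0])

    def evaluate(self, step):
        """Evaluate the control word selected by the current step."""
        cw = self.control_store[step]
        a = self.input_stack[cw.src_a]
        b = self.input_stack[cw.src_b]
        return cw.table[(a << 1) | b]


def run_cycle(processors, steps):
    """One sequencer revolution (one emulated target clock cycle)."""
    outs = []
    for step in range(steps):
        # Every processor evaluates the control word for this step.
        outs = [p.evaluate(step) for p in processors]
        # Interconnection network: each processor's multiplexor selects
        # one node-bit-out and pushes it onto that processor's input stack.
        for p in processors:
            node_in = outs[p.control_store[step].node_addr]
            p.input_stack.insert(0, node_in)
            p.input_stack.pop()  # keep the stack at a fixed depth
    return outs  # node-bit-out values from the final step
```

Because all node-bit-out values for a step are computed before any stack is updated, a given output may be latched by none, one, or all of the processors in the same step, as described above.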
Each of these emulation processors has an execution unit for processing multiple types of logic gate functions. Each emulation processor switches from one specified logic gate function to a next logic gate function in a switched-emulation sequence of different gate functions. The switched-emulation sequence of each of the processors thus can emulate a subset of gates in a hardware arrangement in which gates are of any type that the emulation processors functionally represent for a sequence of clock cycles. The processors are coupled by a like number of multiplexors having outputs respectively connected to the emulation processors of a module and having inputs respectively connected to each of the other emulation processors. The bus connected to the multiplexors enables an output from any emulation processor to be transferred to an input of any other of the emulation processors. In accordance with the teachings of the pending application, the basic design of U.S. Pat. No. 5,551,013 is improved by interconnecting processors into clusters. With interconnected clusters, the evaluation phases can be cascaded and all processors in a cluster perform the setup and storing of results in parallel. This setup includes routing of the data through multiple evaluation units for the evaluation phase. For most efficient operation, the input stack and data stack of each processor must be stored in shared memory within each cluster. Then, all processors perform the storage phase, again in parallel. The net result is multiple cascaded evaluations performed in a single emulation step. Every processor in a cluster can access the input and data stacks of every other processor in the cluster, and less space is needed on each module chip for the functions that support the processor operation, particularly the memory functions.
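The three phases of a clustered step (parallel setup, cascaded evaluation, parallel storage) can be illustrated with the following sketch. It assumes a flat list as the shared cluster memory and a hypothetical `PREV` marker meaning "take this operand from the preceding evaluation unit"; neither is drawn from the pending application, and the first unit in a program must not use `PREV`.

```python
PREV = -1  # hypothetical marker: operand is the previous unit's result


def cluster_step(shared, program):
    """Run one emulation step for a cluster.

    shared:  list of bits visible to every processor in the cluster
    program: list of (table, src_a, src_b), one entry per evaluation unit,
             where each src is an index into `shared` or PREV
    """
    # Setup phase: all operands held in shared memory are fetched together.
    operands = [
        [None if src == PREV else shared[src] for src in (src_a, src_b)]
        for _, src_a, src_b in program
    ]

    # Evaluation phase: results ripple from one unit to the next.
    results = []
    for (table, _, _), ops in zip(program, operands):
        a, b = (results[-1] if op is None else op for op in ops)
        results.append(table[(a << 1) | b])

    # Storage phase: every unit's result is written back together.
    return shared + results
```

For example, with `shared = [1, 0, 1]` a program of XOR on positions 0 and 1, then AND of the previous result with position 2, then XOR of the previous result with position 2, completes three dependent evaluations within a single call, which is the sense in which the evaluations are cascaded.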
As will be appreciated by those skilled in the art, emulators of the type described above have evolved to perform not only in a traditional emulation mode but also in a simulation-accelerate mode. In this simulation-accelerate mode, there is a requirement to upload and download large quantities of data from the system SDRAMs for the data capture function of the simulation-accelerate operating mode.
In the prior art emulators, such as the ET 3.5 and ET 3.7 emulators, writing data to and reading data from SDRAMs requires a handshake protocol. A word is transferred from or to the SDRAM only in response to a “done” signal from the memory signaling that the previous transfer operation has been completed. The “done” signal is required in these prior art systems to account for the case where the previous transfer operation was delayed by a memory refresh operation. Such prior art protocols slow the bulk transfer of data to and from an SDRAM memory.
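The cost of such a handshake can be seen in a small cycle-level model. The sketch below is illustrative only: the class `Sdram`, its timing parameters (two cycles per access, a three-cycle refresh every eight cycles), and the function `write_with_handshake` are assumptions chosen for the example and do not describe the ET 3.5 or ET 3.7 hardware.

```python
class Sdram:
    """Toy memory that raises 'done' when a pending write completes."""

    def __init__(self, access_cycles=2, refresh_every=8, refresh_cycles=3):
        self.cells = []
        self.access_cycles = access_cycles
        self.refresh_every = refresh_every
        self.refresh_cycles = refresh_cycles
        self.clock = 0
        self.refresh_left = 0
        self.pending = None
        self.busy_left = 0

    def start_write(self, word):
        self.pending = word
        self.busy_left = self.access_cycles

    def tick(self):
        """Advance one clock; return True when the pending write is done."""
        self.clock += 1
        if self.clock % self.refresh_every == 0:
            self.refresh_left = self.refresh_cycles  # refresh preempts access
        if self.refresh_left:
            self.refresh_left -= 1
            return False
        if self.pending is None:
            return False
        self.busy_left -= 1
        if self.busy_left == 0:
            self.cells.append(self.pending)
            self.pending = None
            return True
        return False


def write_with_handshake(mem, words):
    """Send each word only after 'done' was seen for the previous one."""
    for word in words:
        mem.start_write(word)
        while not mem.tick():
            pass  # sender stalls until the memory signals 'done'
    return mem.clock  # total cycles consumed
```

In this model the sender cannot issue the next word until it has observed “done”, so every transfer pays the full access latency plus any refresh delay that happens to fall within it.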