Modern high performance microprocessors have an ever-increasing number of circuit elements and an ever-rising clock frequency. Also, as the number of circuits that can be used in a central processing unit (CPU) has increased, the number of parallel operations has risen. Examples of efforts to create more parallel operations include increased pipeline depth and an increase in the number of functional units in super-scalar and very-long-instruction-word architectures. As CPU performance continues to increase, the result has been a larger number of circuits switching at faster rates. Thus, from a design perspective, important considerations such as the time needed to complete a simulation and the time needed to debug a CPU design are taken into account.
As a result, high performance, massively parallel processing environments are used to perform CPU design simulation. FIG. 1 shows a block diagram of a typical computer system (100) used to control and monitor execution of a CPU design simulation. A host computer (114), with associated data store (105), controls the simulation of the CPU design that executes on simulation hardware (116).
The host computer (114) includes such hardware and software mechanisms as are needed to manage simulation, e.g., loading execution processor code onto a processor array, transferring test interface files, transferring design symbol files, etc. The data store (105) may contain several kinds of data including hardware definition source code files, clock file data, test interface files, programmable input files, circuit “object” files, design symbol information (or design database), etc. A general purpose computer (112) with a human interface (110), such as a GUI or a command line interface, and the host computer (114) together support common functions of the simulation environment. A simulation control program (118) executes on the host computer (114) and interacts with the simulation hardware (116) via a test interface (120). The test interface (120) facilitates data transfer between the host computer (114) and the simulation hardware (116).
The simulation control program (118) controls and monitors simulations executing on the simulation hardware (116). The simulation control program (118) also allows a user to interact with the simulation (and the simulation hardware (116)) between complete simulation cycles. The simulation control program (118) supports important functions of the simulation environment (100), including interactive display and modification of the simulation state, setting of execution breakpoints based on simulation times and states, use of test vector files and trace files, use of hardware definition language (HDL) modules that execute on the host computer (114) and are called from the simulation hardware (116), checkpointing and restoration of running simulations, generation of value change dump (VCD) files compatible with waveform analysis tools, and tracing of the origin of bad signal states using backtracking techniques.
The test interface (120) supports visibility into simulations running on the simulation hardware (116) by applying user-defined input stimuli to selected nodes in the running simulation. The output resulting from the user-defined stimuli may be recorded as a trace of specific signals in the simulation and viewed using post-simulation analysis programs. Alternatively, the output resulting from the user-defined stimuli may be compared to expected output values for signals in the design.
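The comparison of simulated outputs against expected values can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the described system: `check_vectors`, the `(inputs, expected)` vector layout, and the stand-in `and_gate` design are assumptions chosen only to show the idea.

```python
def check_vectors(step, vectors):
    """Apply each vector's inputs to `step` and collect output mismatches.

    step:    callable taking {signal: value} and returning {signal: value}
             (a stand-in for one simulation cycle of the design).
    vectors: list of (inputs, expected) pairs, one pair per cycle
             (assumed layout, not a real test interface file format).
    """
    mismatches = []
    for cycle, (inputs, expected) in enumerate(vectors):
        outputs = step(inputs)
        for signal, want in expected.items():
            got = outputs.get(signal)
            if got != want:
                # Record where and how the design deviated from expectation.
                mismatches.append((cycle, signal, want, got))
    return mismatches


def and_gate(inputs):
    """Hypothetical one-gate design used only for illustration."""
    return {"y": inputs["a"] & inputs["b"]}


vectors = [
    ({"a": 1, "b": 1}, {"y": 1}),
    ({"a": 1, "b": 0}, {"y": 1}),  # deliberately wrong expectation
]
report = check_vectors(and_gate, vectors)  # [(1, "y", 1, 0)]
```

Each mismatch tuple carries the cycle number, signal name, expected value, and observed value, which is roughly the information a post-simulation analysis program would need to locate a failure.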
The computer system (100) for a simulation environment, as shown in FIG. 1, may be event-driven or cycle-based. Event-driven simulations propagate a change in state from one set of circuit elements to another. Event-driven simulators record relative timing information of the change in state so that timing and functional correctness may be verified. Cycle-based simulations also simulate a change in state from one set of circuit elements to another. Cycle-based simulators, however, evaluate the state of the system once, at the end of each clock cycle. A simulation cycle begins with one or more simultaneous clock edges and completes when all dependent events have completed evaluation. Cycle-based simulators abstract away the timing details for all transactions that do not occur on a cycle boundary. While specific intra-cycle timing information is not available, simulation speed is improved.
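The difference between the two approaches can be sketched on a toy circuit. The two-gate netlist, the gate delays, and the function names `event_driven` and `cycle_based` below are assumptions made for illustration; they do not describe any particular simulator.

```python
import heapq

# Hypothetical circuit: n1 = a AND b (delay 2), out = n1 OR c (delay 3).
# Each entry: output net -> (function, input nets, propagation delay).
GATES = {
    "n1": (lambda v: v["a"] & v["b"], ("a", "b"), 2),
    "out": (lambda v: v["n1"] | v["c"], ("n1", "c"), 3),
}


def event_driven(initial, stimuli):
    """Propagate timestamped value changes; return (time, net, value) records."""
    values = dict(initial)
    queue = list(stimuli)  # entries are (time, net, value)
    heapq.heapify(queue)
    log = []
    while queue:
        time, net, value = heapq.heappop(queue)
        if values[net] == value:
            continue  # no change in state, so nothing to propagate
        values[net] = value
        log.append((time, net, value))
        # Schedule every gate that reads the changed net.
        for out, (fn, inputs, delay) in GATES.items():
            if net in inputs:
                heapq.heappush(queue, (time + delay, out, fn(values)))
    return log


def cycle_based(initial, inputs_per_cycle):
    """Evaluate all gates once per cycle; intra-cycle timing is not modeled."""
    values = dict(initial)
    results = []
    for cycle, inputs in enumerate(inputs_per_cycle):
        values.update(inputs)
        for out in ("n1", "out"):  # fixed dependency (topological) order
            values[out] = GATES[out][0](values)
        results.append((cycle, values["out"]))
    return results


start = {"a": 0, "b": 0, "c": 0, "n1": 0, "out": 0}
timed = event_driven(start, [(0, "a", 1), (0, "b", 1)])
# timed == [(0, "a", 1), (0, "b", 1), (2, "n1", 1), (5, "out", 1)]
per_cycle = cycle_based(start, [{"a": 1, "b": 1}, {"a": 0}])
# per_cycle == [(0, 1), (1, 0)]
```

The event-driven log records when each net changed (times 2 and 5 here), whereas the cycle-based result only reports the settled output per cycle, which reflects the trade of intra-cycle timing detail for less work per cycle.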
The simulation of a CPU design may execute on the simulation hardware (116), which is specifically designed for cycle-based computation (such as PHASER™), or on any appropriate computer, such as a SPARC™ workstation produced by Sun Microsystems, Inc. PHASER™ is specialized hardware developed by Sun Microsystems, Inc. for performing cycle-based computations in a massively parallel, cycle-based computing system. The system uses an array of execution processors arranged to perform cycle-based computations. One example of cycle-based computation is simulation of a cycle-based design written in a computer readable language, such as an HDL (e.g., Verilog) or a high-level language (e.g., Occam, Modula, C).
Prior to executing on the simulation hardware (116), the cycle-based computation is verified for accuracy and then compiled. During compilation, the compiler decomposes a verified cycle-based computation into execution processor code that may be executed in parallel on a processor array of the simulation hardware (116) by one or more execution processors. The compiler also produces routing tables and other information, such as routing processor code, control code and a design symbol file. The design symbol file records the physical locations where the values of nets and registers have been stored, so that the test interface (120) and routines called by the simulation control program (118) may access values of nets and registers. Input files, e.g., test interface files, provide functionality for items such as trace vectors that typically contain test input data and expected outputs for debugging.
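The role of the design symbol file can be sketched as a lookup from a net or register name to the place where its value is stored. The file layout assumed here (one `name processor address` entry per line, `#` for comments), the `Location` fields, and the dictionary standing in for execution processor memory are all hypothetical; the actual format is not specified above.

```python
from typing import NamedTuple


class Location(NamedTuple):
    processor: int  # index of the execution processor (assumed field)
    address: int    # word address in that processor's memory (assumed field)


def parse_symbols(text):
    """Parse hypothetical 'name processor address' lines into a lookup table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, proc, addr = line.split()
        # int(addr, 0) accepts both decimal and 0x-prefixed hexadecimal.
        table[name] = Location(int(proc), int(addr, 0))
    return table


def read_value(table, memories, name):
    """Fetch the stored value of a net or register through the symbol table."""
    loc = table[name]
    return memories[loc.processor][loc.address]


symbols = parse_symbols("# name proc addr\ntop.alu.carry 0 0x10\ntop.pc 1 4\n")
memories = {0: {16: 1}, 1: {4: 64}}  # stand-in for processor array memory
carry = read_value(symbols, memories, "top.alu.carry")  # 1
```

A routine called by a control program would use the same kind of table in the other direction as well, writing a value to the recorded location to modify simulation state.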
The computer systems described above are for purposes of example only. One or more embodiments of the invention may be implemented on any type of computer system or programming or processing environment.
Debugging design problems directly on the simulation hardware (116) is generally avoided because of the required input/output (I/O) communication between the simulation hardware (116) and the simulation control program (118) (executed on the host computer (114)). Given the rapid execution speed of, and the high contention for access to, the simulation hardware (116) versus the comparatively slow execution speed of the host computer (114), simulation performance is impacted when debugging is performed on the simulation hardware (116).
Thus, a common approach for debugging a large and complex CPU circuit design is to use the simulation control program (118) to debug the circuit design following the actual simulation of the design. During the simulation of the circuit design on the high performance simulation hardware (116), files are written that contain value-changes for every node in the design (or a portion of the design) of the circuit. The files are generally large, but their size may be reduced if the value-changes are maintained only at the change of a cycle of the CPU, e.g., in a cycle-based simulation. Additionally, during a cycle-based simulation, the host computer (114) only interacts with the simulation hardware (116) between cycles to minimize the effect of the I/O communication on performance. As a trade-off, because the timing details for all transactions that do not occur on a cycle boundary are abstracted away in the cycle-based simulation, specific intra-cycle timing information, i.e., certain clock edge values, is not available (e.g., for TVI (test vector interface), value change dump information is recorded without clock edge values in order to save storage and time). To fully and effectively debug a circuit design, such clock edge values need to be re-created. Moreover, most output file formats of TVI, such as Verilog's VCD file format or the SST format used by Signalscan (a product of Design Acceleration, Inc.), require clock edge values.
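One way to picture the re-creation of clock edge values is a post-processing step that takes per-cycle value changes and emits VCD-style text with explicit clock transitions. This is only a sketch under stated assumptions: the function name `cycles_to_vcd`, the module name `top`, the 1 ns timescale, the 50% duty cycle, and the choice to place recorded changes on the rising edge are all illustrative, and only a small subset of the VCD format is produced (single-bit signals, no `$dumpvars` section, which some viewers may expect).

```python
def cycles_to_vcd(signals, changes_per_cycle, period=10, clock="clk"):
    """Emit minimal VCD-style text with re-created clock edges.

    signals:           names of 1-bit signals recorded by the simulation
    changes_per_cycle: list of {name: 0 or 1}, one dict per cycle, holding
                       only the signals that changed in that cycle
    period:            assumed clock period in timescale units
    """
    # VCD identifier codes are printable ASCII characters starting at '!'.
    ids = {name: chr(33 + i) for i, name in enumerate([clock] + list(signals))}
    lines = ["$timescale 1ns $end", "$scope module top $end"]
    for name, ident in ids.items():
        lines.append(f"$var wire 1 {ident} {name} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]
    for cycle, changes in enumerate(changes_per_cycle):
        start = cycle * period
        lines.append(f"#{start}")
        lines.append(f"1{ids[clock]}")  # re-created rising edge
        for name, value in changes.items():
            lines.append(f"{value}{ids[name]}")  # change recorded at the boundary
        lines.append(f"#{start + period // 2}")
        lines.append(f"0{ids[clock]}")  # re-created falling edge
    return "\n".join(lines) + "\n"


vcd_text = cycles_to_vcd(["q"], [{"q": 1}, {}, {"q": 0}], period=10)
```

For the three cycles in the usage line, the output contains a rising clock edge at times 0, 10, and 20 and a falling edge at times 5, 15, and 25, even though the recorded data held no clock values at all.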