The present invention generally relates to simulating the operation of integrated circuits, and more specifically, to using Field Programmable Gate Arrays to simulate the operation of integrated circuits.
As systems on chip and chips with multiple processor cores become common, simulating these complex chips is becoming an expensive challenge. One technique adopted for simulating these complex systems is the Field Programmable Gate Array (FPGA) based hardware accelerator. These hardware accelerators work on the principle of dividing the chip design (the device under test, or DUT) into small blocks. These blocks are then implemented on various FPGAs, which are interconnected in the same fashion as the corresponding blocks of the original DUT design. The chip or DUT simulations can then be run on this specialized FPGA hardware instead of on a conventional simulator. Conventional simulators are written entirely in software and run on a general purpose computer. Hardware simulators can typically give a speed advantage of several orders of magnitude over conventional simulators.
Accelerating the simulation may be desirable for a number of reasons. The number of simulations to be performed to validate a large digital chip is very large. To complete those simulations in a reasonable time using software, a large number of computers have to be employed, with the corresponding associated cost. An accelerated simulator reduces this number. Furthermore, it is often necessary to simulate a circuit for a very long time before getting to the point of interest. This long simulation is a sequential process that may take several days for a software implementation, and cannot be sped up by just using more computers.
One of several design challenges which arise in building hardware simulation accelerators is cycle accuracy. The FPGA based hardware accelerator should exactly match the behavior of the DUT on a cycle by cycle basis. That is, if the same DUT were also simulated on a software simulator and built into a single chip or multiple chips, then at any given DUT clock cycle all three systems (the hardware accelerator, the software simulator and the DUT chip) should be in the same state. This becomes a significant challenge in the design of hardware accelerators, as the DUT design may contain different kinds of memory, such as register arrays, SRAMs, and embedded or external DRAMs. All of these DUT memory types have to be mapped into the FPGA on-chip memory or external memory connected to the FPGA.
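By way of a non-limiting illustration, the cycle accuracy requirement may be pictured as a lockstep comparison between the state sequence recorded by a reference model and the state sequence recorded by the accelerator. The following minimal sketch assumes that each system can export its state once per DUT clock cycle as a comparable value; the function name `first_divergence` and the list-based trace representation are hypothetical and are chosen only for this example.

```python
def first_divergence(reference_trace, accelerator_trace):
    """Return the first DUT cycle at which the two traces disagree.

    Each trace is assumed to hold one state snapshot per DUT clock cycle.
    Returns None when the traces are identical in content and length.
    """
    for cycle, (ref_state, acc_state) in enumerate(
        zip(reference_trace, accelerator_trace)
    ):
        if ref_state != acc_state:
            return cycle

    # One trace ended early: the first missing cycle counts as a divergence.
    if len(reference_trace) != len(accelerator_trace):
        return min(len(reference_trace), len(accelerator_trace))

    return None
```

In this sketch, a return value of None corresponds to the accelerator being cycle accurate with respect to the reference over the compared interval, while any other value identifies the cycle at which debugging would begin.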
Another design challenge in building hardware simulation accelerators is cycle reproducibility, which is defined as follows: multiple executions starting from the same initial condition shall yield identical traces for all DUT state. Every time that the simulation is performed with exactly the same stimulus, exactly the same results should be obtained by the simulator. In some instances, for example, the system might be running the simulation at different levels of optimization. At the highest level of optimization, the simulation runs very fast, and is used to check that nothing is wrong. If something is wrong, though, and the optimized simulation flags it, it is desirable to reproduce this simulation at a lower level of optimization that leaves a good trace for circuit debugging. The two simulations should behave exactly the same, or it would not be feasible to debug the circuit in this manner. Even though this cycle reproducibility property is usually easy to ensure in software implementations of the simulator, it becomes a significant issue when the software technique is replaced with a hardware accelerator technique. In one or more cases, this aspect is one of the more severe limitations on how much it is possible to speed up the simulation of a digital circuit.
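As a non-limiting illustration of the cycle reproducibility property, the following minimal sketch runs a toy design twice from the same initial condition and the same stimulus, once in a fast mode that keeps only a running digest of the state and once in a detailed mode that also records a full trace. The toy 16-bit shift register, the function names `step` and `simulate`, and the use of a SHA-256 digest as a compact state signature are all assumptions made for this example and are not taken from any particular accelerator.

```python
import hashlib


def step(state, stimulus_bit):
    """Advance a toy 16-bit DUT by one clock cycle.

    The DUT is a linear feedback shift register whose feedback bit is
    XORed with the external stimulus bit (an illustrative design only).
    """
    feedback = (
        state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5) ^ stimulus_bit
    ) & 1
    return ((state >> 1) | (feedback << 15)) & 0xFFFF


def simulate(initial_state, stimulus, keep_trace):
    """Run the toy DUT and return (digest, trace).

    keep_trace=False models an optimized run that keeps only a digest.
    keep_trace=True models a slower run that also records every state.
    """
    state = initial_state
    digest = hashlib.sha256()
    trace = [] if keep_trace else None
    for bit in stimulus:
        state = step(state, bit)
        digest.update(state.to_bytes(2, "big"))
        if keep_trace:
            trace.append(state)
    return digest.hexdigest(), trace


# Same initial condition and same stimulus for both runs.
stimulus = [1, 0, 1, 1, 0, 0, 1, 0] * 4
fast_digest, _ = simulate(0xACE1, stimulus, keep_trace=False)
slow_digest, full_trace = simulate(0xACE1, stimulus, keep_trace=True)
reproducible = fast_digest == slow_digest
```

Because the toy model is purely deterministic software, the two digests match. In a hardware accelerator the same comparison would only hold if clocking, reset, and memory contents are controlled so that nothing outside the stated initial condition and stimulus influences the DUT state.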
Cycle reproducibility is critical for enabling efficient debug of the simulation, and this requirement constrains how clocking and reset of the entire acceleration system are implemented. The requirement for cycle reproducibility also adds a significant challenge to how the DUT memory is mapped onto the accelerator platform. Since the memory of the DUT constitutes a large portion of the system state, all addressable content of such memory needs to be properly initialized and maintained to match that of the software simulation and final chip implementation.
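The memory initialization requirement may likewise be illustrated with a minimal sketch. Here the arbitrary power-up content of a physical memory is modeled, purely as an assumption for this example, by a pseudo-random fill that differs from run to run. Writing every addressable location from a known image before the simulation starts removes that run-to-run difference. The names `power_up_memory`, `initialize`, and `read_trace`, and the 8-bit word width, are hypothetical.

```python
import random


def power_up_memory(depth, seed):
    """Model a memory whose content after power-up is arbitrary.

    The seed stands in for whatever physical effects make the power-up
    content differ between runs (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [rng.getrandbits(8) for _ in range(depth)]


def initialize(memory, image, fill=0):
    """Write every addressable location, not only those named in image.

    Locations absent from the image receive a fixed fill value so that
    no power-up content survives into the simulation.
    """
    for address in range(len(memory)):
        memory[address] = image.get(address, fill)
    return memory


def read_trace(memory, addresses):
    """Return the values the DUT would observe when reading addresses."""
    return [memory[address] for address in addresses]


# Two runs whose memories power up with different arbitrary content.
image = {0: 0x12, 1: 0x34, 7: 0xFF}
run_a = initialize(power_up_memory(16, seed=1), image)
run_b = initialize(power_up_memory(16, seed=2), image)
same_view = read_trace(run_a, range(16)) == read_trace(run_b, range(16))
```

After full initialization both runs observe the same memory content at every address, including addresses that the image does not mention, which is the condition the reproducibility requirement places on the mapped DUT memory.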