Modern integrated circuit designs have become extremely complex. As a result, various techniques have been developed to verify that circuit designs will operate as desired before they are implemented in an expensive manufacturing process. For example, logic simulation is a tool used for verifying the logical correctness of a hardware design. Designing hardware today involves writing a program in a hardware description language. A simulation may be performed by running that program. If the program runs correctly, then one can be reasonably assured that the logic of the design is correct, at least for the cases tested in the simulation.
Software-based simulation, however, may be too slow for large complex designs such as SoC (System on Chip) designs. Although design reuse, intellectual property, and high-performance tools all can help to shorten SoC design time, they do not diminish the system verification bottleneck, which consumes 60-70% of the design cycle. Hardware emulation provides an effective way to increase verification productivity, speed up time-to-market, and deliver greater confidence in final products. In hardware emulation, a portion of a circuit design or the entire circuit design is emulated with an emulation circuit or “emulator.”
Two categories of emulators have been developed. The first category is programmable logic or FPGA (field programmable gate array)-based. In an FPGA-based architecture, each chip has a network of prewired blocks of look-up tables and coupled flip-flops. A look-up table can be programmed to be a Boolean function, and each of the look-up tables can be programmed to connect or bypass the associated flip-flop(s). Look-up tables with connected flip-flops act as finite-state machines, while look-up tables with bypassed flip-flops operate as combinational logic. The look-up tables can be programmed to mimic any combinational logic of a predetermined number of inputs and outputs. To emulate a circuit design, the circuit design is first compiled and mapped to an array of interconnected FPGA chips. The compiler usually needs to partition the circuit design into pieces (sub-circuits) such that each fits into an FPGA chip. The sub-circuits are then synthesized into the look-up tables (that is, generating the contents in the look-up tables such that the look-up tables together produce the function of the sub-circuits). Subsequently, place and route is performed on the FPGA chips in a way that preserves the connectivity in the original circuit design. The programmable logic chips employed by an emulator may be commercial FPGA chips or custom-designed emulation chips containing programmable logic blocks.
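The look-up table behavior described above can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions: the class name LutCell, its fields, and the input ordering are hypothetical and do not describe any particular FPGA or emulator product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LutCell:
    """Hypothetical model of one look-up table with an optional flip-flop."""
    truth_table: Sequence[int]  # 2**k entries, one output bit per input combination
    registered: bool = False    # True: flip-flop connected; False: flip-flop bypassed
    state: int = 0              # current flip-flop contents

    def _lookup(self, inputs: Sequence[int]) -> int:
        # Assumption: the first input is the most significant bit of the table index.
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]

    def output(self, inputs: Sequence[int]) -> int:
        """Value on the cell output before the next clock edge."""
        return self.state if self.registered else self._lookup(inputs)

    def clock(self, inputs: Sequence[int]) -> None:
        """Clock edge: capture the table result if the flip-flop is connected."""
        if self.registered:
            self.state = self._lookup(inputs)


# Bypassed flip-flop: the cell is pure combinational logic (here, a 2-input XOR).
xor_cell = LutCell(truth_table=[0, 1, 1, 0])

# Connected flip-flop: feeding the output back through an inverter table
# gives a two-state machine that toggles on every clock edge.
toggle = LutCell(truth_table=[1, 0], registered=True)
```

With the flip-flop bypassed, xor_cell.output([1, 0]) returns 1. With the flip-flop connected, calling toggle.clock([toggle.output([0])]) repeatedly alternates the stored state between 1 and 0, which is the finite-state-machine case.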
The second category of emulators is processor-based: an array of Boolean processors able to share data with one another is employed to map a circuit design, and Boolean operations are scheduled and performed accordingly. As with FPGA-based emulators, the circuit design must first be partitioned into sub-circuits so that the code for each sub-circuit fits in the instruction memory of a processor. Whether FPGA-based or processor-based, an emulator generally performs circuit verification in parallel, since the entire circuit design executes simultaneously as it will in a real device. By contrast, a simulator performs circuit verification by executing the hardware description code serially. The different styles of execution can lead to orders-of-magnitude differences in execution time.
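The scheduling idea behind a processor-based emulator can be sketched in a few lines of Python. This is an illustrative model only: the function run_cycle, the instruction format, and the rule that results become visible at the next schedule step are assumptions made for the example, not a description of any actual emulator.

```python
import operator
from typing import Callable, Dict, List, Tuple

# One instruction: (destination signal, Boolean function, operand signal names).
Instruction = Tuple[str, Callable[..., int], Tuple[str, ...]]


def run_cycle(programs: List[List[Instruction]],
              signals: Dict[str, int]) -> Dict[str, int]:
    """Run one emulated cycle on a hypothetical array of Boolean processors.

    programs[p][s] is the instruction that processor p executes at schedule
    step s. All processors execute step s together, and their results are
    shared with every processor before step s + 1 begins.
    """
    values = dict(signals)
    steps = max((len(program) for program in programs), default=0)
    for s in range(steps):
        results = {}
        for program in programs:
            if s < len(program):
                dest, fn, operands = program[s]
                results[dest] = fn(*(values[name] for name in operands)) & 1
        values.update(results)  # data exchange between processors
    return values


# A full adder partitioned into two sub-circuits, one per processor.
full_adder = [
    [("t", operator.xor, ("a", "b")),        # processor 0, step 0
     ("sum", operator.xor, ("t", "cin"))],   # processor 0, step 1
    [("g", operator.and_, ("a", "b")),       # processor 1, step 0
     ("p", operator.and_, ("cin", "t")),     # processor 1, step 1 (uses t from processor 0)
     ("cout", operator.or_, ("g", "p"))],    # processor 1, step 2
]
```

For example, run_cycle(full_adder, {"a": 1, "b": 0, "cin": 1}) yields sum equal to 0 and cout equal to 1. The compiler's job in this picture is to order the instructions so that every operand has been produced, locally or by another processor, before the step that consumes it.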
An emulator typically has an interface to a workstation server (workstation). The workstation provides the capability to load the DUV (design under verification, also referred to as DUT, design under test) model, controls the execution over time, and serves as a debugging interface into the DUV model on the emulator. The DUV model may also be referred to as a circuit emulation model.
Memories are an important part of modern electronic designs. Traditionally, for memories contained in the DUT inside the emulator, the challenges include how to map large memories into the available physical memories on the emulator, how to download the memory contents before a test, and how to upload the contents after a test run. Large memories also cause other overheads, such as long compile times and sub-optimal clock speeds. Moreover, large memories tend to be implemented on an emulator at a physical distance from the design logic, causing communication delays between them.
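The mapping, download, and upload steps mentioned above can be made concrete with a small Python sketch. The class BankedMemory, its method names, and the simple contiguous address split are hypothetical choices for illustration; a real emulator compiler may map memories quite differently.

```python
from typing import List, Optional, Sequence, Tuple


class BankedMemory:
    """Hypothetical mapping of one large DUT memory onto fixed-size physical banks."""

    def __init__(self, depth: int, bank_depth: int) -> None:
        self.depth = depth
        self.bank_depth = bank_depth
        n_banks = -(-depth // bank_depth)  # ceiling division
        self.banks: List[List[int]] = [[0] * bank_depth for _ in range(n_banks)]

    def locate(self, addr: int) -> Tuple[int, int]:
        """Translate a logical word address into (bank index, offset in bank)."""
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} outside memory of depth {self.depth}")
        return divmod(addr, self.bank_depth)

    def write(self, addr: int, word: int) -> None:
        bank, offset = self.locate(addr)
        self.banks[bank][offset] = word

    def read(self, addr: int) -> int:
        bank, offset = self.locate(addr)
        return self.banks[bank][offset]

    def download(self, contents: Sequence[int], base: int = 0) -> None:
        """Load contents into the memory before a test run."""
        for i, word in enumerate(contents):
            self.write(base + i, word)

    def upload(self, base: int = 0, count: Optional[int] = None) -> List[int]:
        """Read contents back after a test run."""
        if count is None:
            count = self.depth - base
        return [self.read(base + i) for i in range(count)]
```

As a usage example, BankedMemory(depth=10, bank_depth=4) needs three banks, and locate(9) returns (2, 1). Even in this toy model, every download and upload touches each word individually, which hints at why transferring the contents of very large memories is costly.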
In transaction-based environments, in addition to DUT memories, there may be memory-based buffers containing streams of data that are either stimulus to the DUT from a driver transactor or outputs captured from the DUT to be transported to the virtual testbench for checking. These environments may have additional requirements to peek or poke memory words (or a range of words) as part of the verification methodology. These operations are traditionally implemented by DPI (Direct Programming Interface)-based accesses via the transaction-based interface, which is optimized for small packets and fast speed. Here, the memory contents upload and download operations can also be performed via the transaction-based interface if the overall data size is relatively small (less than 16 Mbytes).
There is a new trend of verification systems in which the virtual testbench running on the workstation is a more elaborate model of the real system. For example, a fast CPU (central processing unit) model may run on the workstation while a GPU (graphics processing unit) model runs on the emulator. In such an environment, a pertinent question is how to model the system memory, since it is subject to involved and frequent accesses from both sides. The above-mentioned DPI-based access technique has been adopted. This kind of custom modeling, however, has been found to be cumbersome and requires expert users to set it up. Also, manually built setups are typically not fully optimized for performance.
The overheads due to large memories on the emulator side and the need for frequent accesses from the software model side have prompted efforts to search for better memory implementation techniques.