Modern integrated circuit designs have become extremely complex. As a result, various techniques have been developed to verify that circuit designs will operate as desired before they are implemented in an expensive manufacturing process. For example, logic simulation is a tool used for verifying the logical correctness of a hardware design. Designing hardware today involves writing a program in a hardware description language. A simulation may be performed by running that program. If the program runs correctly, then one can be reasonably assured that the logic of the design is correct, at least for the cases tested in the simulation.
Software-based simulation, however, may be too slow for large complex designs such as SoC (System on Chip) designs. The speed of execution of a simulator drops significantly as the design size increases due to cache misses and memory swapping. Hardware emulation provides an effective way to increase verification productivity. It is based on an actual silicon implementation and performs circuit verification generally in parallel as the circuit design will execute in a real device. By contrast, a simulator performs circuit verification by executing the hardware description code serially. The different styles of execution can lead to orders of magnitude differences in execution time.
Two categories of emulators have been developed. The first category is programmable logic or FPGA (field programmable gate array)-based. In an FPGA-based architecture, each chip (either a commercial FPGA chip or a custom FPGA chip) has a network of prewired blocks of look-up tables and coupled flip-flops. A look-up table can be programmed to be a Boolean function, and each of the look-up tables can be programmed to connect or bypass the associated flip-flop(s). Look-up tables with connected flip-flops act as finite-state machines, while look-up tables with bypassed flip-flops operate as combinational logic. The look-up tables can be programmed to mimic any combinational logic of a predetermined number of inputs and outputs. To emulate a circuit design, the circuit design is first compiled and mapped to an array of interconnected FPGA chips. The compiler usually needs to partition the circuit design into pieces (sub-circuits) such that each fits into an FPGA chip. The sub-circuits are then synthesized into the look-up tables (that is, generating the contents in the look-up tables such that the look-up tables together produce the function of the sub-circuits). Subsequently, place and route is performed on the FPGA chips in a way that preserves the connectivity in the original circuit design. The programmable logic chips employed by an emulator may be commercial FPGA chips or custom-designed emulation chips containing programmable logic blocks.
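The look-up table behavior described above can be illustrated with a minimal Python sketch. The class name `LutCell` and its methods are hypothetical and exist only for this illustration; they do not correspond to any vendor's FPGA tooling. The sketch assumes a single-output look-up table whose truth table is filled in from a Boolean function, with a flag that selects whether the associated flip-flop is connected or bypassed.

```python
from itertools import product


class LutCell:
    """Hypothetical model of one look-up table with an optional flip-flop."""

    def __init__(self, num_inputs, truth_table, use_flip_flop=False):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**num_inputs entries")
        self.num_inputs = num_inputs
        self.truth_table = list(truth_table)
        self.use_flip_flop = use_flip_flop
        self.state = 0  # flip-flop contents, used only when connected

    @classmethod
    def from_function(cls, num_inputs, func, use_flip_flop=False):
        # "Programming" the table: tabulate func over every input combination.
        table = [int(bool(func(*bits)))
                 for bits in product((0, 1), repeat=num_inputs)]
        return cls(num_inputs, table, use_flip_flop)

    def _lookup(self, inputs):
        # The first input is the most significant bit of the table index,
        # matching the order in which product() enumerated the rows.
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]

    def evaluate(self, *inputs):
        """Output seen downstream: stored state if registered, else the table value."""
        if self.use_flip_flop:
            return self.state
        return self._lookup(inputs)

    def clock(self, *inputs):
        """Clock edge: capture the table output into the flip-flop, if connected."""
        if self.use_flip_flop:
            self.state = self._lookup(inputs)
        return self.state


# Flip-flop bypassed: the cell is pure combinational logic (here, XOR).
xor_cell = LutCell.from_function(2, lambda a, b: a ^ b)

# Flip-flop connected, with the stored bit fed back as an input: a one-bit
# finite-state machine that toggles whenever the enable input is 1.
toggle_cell = LutCell.from_function(2, lambda en, q: en ^ q, use_flip_flop=True)
toggle_cell.clock(1, toggle_cell.state)
```

Feeding the stored bit back into the table, as in `toggle_cell`, is what turns the same hardware block from combinational logic into a state machine.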
The second category of emulators is processor-based: an array of Boolean processors able to share data with one another is employed to map a circuit design, and Boolean operations are scheduled and performed accordingly. As with the FPGA-based approach, the circuit design must first be partitioned into sub-circuits so that the code for each sub-circuit fits into the instruction memory of a processor.
An emulator may operate in various modes. In an in-circuit emulation mode, the emulator is connected with a user's target system to form a prototype of the system the user is designing. The emulator typically replaces the circuit being designed for the target system, allowing system-level and software testing prior to silicon availability. Although an emulator may run up to six orders of magnitude faster than a simulator, it is not fast enough to run at the same speed as the physical target system (a few megahertz versus hundreds of megahertz). Speed rate adapters are therefore often introduced between the target system and the emulator. A rate adapter behaves like a buffer. It caches the signal activity from the design-under-test (DUT) at emulation speed and sends it at real-time speed to the target system. Conversely, it captures the signal activity from the target system at full speed, caches it, and then sends it back to the DUT at emulation speed. Even when a rate adapter is available, the constant evolution of speed and complexity of individual I/O protocols has made timely rate adapter development difficult.
In an acceleration mode, the physical target system is replaced by a virtual target system modeled in a high-level language such as SystemVerilog, SystemC, or C++. The acceleration mode leverages the existing simulation testbench and removes the need for external rate adapters. The testbench creates test vectors and checks the corresponding responses of the circuit model. In addition to the elimination of rate adapters, the acceleration mode has advantages such as no hardware dependencies, the ability to use the emulator remotely, and the ability to run verification of corner cases.
The acceleration mode can be cycle-based or transaction-based. The cycle-based acceleration mode employs a signal-level or bit-level interface connecting the testbench processed by the host workstation to the design model on the emulator. Each and every transition on each and every interface signal must be transferred between the testbench and the design model at the slow speed of the testbench simulated in the workstation. As a result, the speed of the emulator is wasted waiting to carry out these signal transfers.
The transaction-based acceleration mode reduces the traffic between the workstation and the emulator by replacing bit-by-bit exchanges with transaction exchanges. Data exchange through the transaction-level interface is infrequent and information-rich, while high-frequency pin activity is confined to the emulator and runs at full emulator clock rates. Unlike the interface for loading the circuit model, the transaction-level interface may be designed for small packets of data and fast streaming speed.
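The difference between the two acceleration modes can be made concrete with a rough timing model. The following is only a back-of-the-envelope sketch: the function name and all numeric values (clock rate, exchange latency, cycles per transaction) are illustrative assumptions, not measurements. It assumes the emulator stalls for a fixed latency on every workstation exchange.

```python
def run_time_seconds(total_cycles, cycles_per_exchange,
                     exchange_latency_s, emulator_hz):
    """Estimate wall-clock time when the emulator stalls on each host exchange.

    cycles_per_exchange is 1 for a cycle-based interface and the number of
    emulator cycles covered by one transaction for a transaction-based one.
    """
    exchanges = -(-total_cycles // cycles_per_exchange)  # ceiling division
    emulation_time = total_cycles / emulator_hz
    waiting_time = exchanges * exchange_latency_s
    return emulation_time + waiting_time


# Assumed figures for illustration only: 1 MHz emulator clock, 10 microseconds
# per workstation exchange, one million design cycles to run.
cycle_based = run_time_seconds(1_000_000, 1, 10e-6, 1e6)
transaction_based = run_time_seconds(1_000_000, 1_000, 10e-6, 1e6)
```

Under these assumed numbers the cycle-based run spends roughly ten times longer waiting on the workstation than emulating, while the transaction-based run is dominated by the emulation time itself.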
Transactors are used to facilitate the communication by mapping high-level commands (transactions) from the testbench into the signal-level, protocol-specific sequences (bit-by-bit operations) required by the design model on the emulator. A transactor typically consists of a front-end proxy interface, a back-end RTL (register-transfer level) bus-functional model, and a physical communications channel. The front-end interface is typically a behavioral model that runs on the workstation and interfaces to the testbench through Direct Programming Interface (DPI) calls. It is often written in C/C++, SystemC, or SystemVerilog. This front-end interface sends and receives high-level commands at the transactional level across a physical high-performance communication channel such as PCI Express, using the Standard Co-Emulation Modeling Interface (SCE-MI) standard or some variation of it. The back-end RTL bus-functional model runs on the emulator, interfaces with the communication channel to send and receive transactions, and converts transactions to bit-level signals for the design model. Because the back-end RTL bus-functional model is mapped inside the emulator, it can execute at the same speed as an in-circuit emulation system.
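The three parts of a transactor can be sketched as follows. This is a simplified illustration, not an implementation of SCE-MI or of any real bus protocol: Python stands in for both the workstation-side proxy and the emulator-side RTL model, and the class names, the `(valid, address, data)` pin tuple, and the one-word-per-cycle write protocol are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WriteTransaction:
    """One high-level command: write a burst of words starting at an address."""
    address: int
    data: List[int]


class Channel:
    """Stands in for the physical link and counts messages that cross it."""

    def __init__(self):
        self._queue = deque()
        self.messages_sent = 0

    def send(self, message):
        self._queue.append(message)
        self.messages_sent += 1

    def receive(self):
        return self._queue.popleft() if self._queue else None


class FrontEndProxy:
    """Workstation side: the testbench calls write() with a whole burst."""

    def __init__(self, channel):
        self.channel = channel

    def write(self, address, data):
        self.channel.send(WriteTransaction(address, list(data)))


class BackEndBusModel:
    """Emulator side: expands one transaction into per-cycle pin values."""

    def __init__(self, channel):
        self.channel = channel

    def drive(self) -> List[Tuple[int, int, int]]:
        txn = self.channel.receive()
        if txn is None:
            return []
        # One (valid, address, data) tuple per clock cycle of the burst,
        # followed by an idle cycle with valid deasserted.
        cycles = [(1, txn.address + i, word) for i, word in enumerate(txn.data)]
        cycles.append((0, 0, 0))
        return cycles


channel = Channel()
proxy = FrontEndProxy(channel)
bus_model = BackEndBusModel(channel)

proxy.write(0x10, [0xA, 0xB, 0xC])
pin_cycles = bus_model.drive()
```

In this sketch a single message crosses the channel, while the per-cycle pin activity it expands into never leaves the back-end model.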
If the testbench is synthesizable (that is, it is an RTL testbench), it can be mapped onto an emulator, thereby removing all dependencies on the outside world. This embedded testbench acceleration mode can take full advantage of the intrinsic performance of the emulator and achieve the highest speed of execution. Unfortunately, only a few verification teams write synthesizable testbenches, which drastically limits this deployment method. It is also a very restrictive environment: every time the test needs to be changed, a recompile has to be performed, which could take many hours depending upon design size. A variation is to divide a testbench into a hardware part and a software part: synthesizable testbench components such as drivers and monitors are synthesized into real hardware and run inside the emulator together with the circuit model, while non-synthesizable testbench components such as generators, scoreboards, and coverage collectors remain in software and are simulated in the workstation.
Today, the majority of designs fall under the category of embedded SoC designs. These designs include at least one microprocessor (multiple cores are becoming popular) and massive amounts of embedded software loaded into on-board memories. The software ranges from operating systems to drivers, apps, diagnostics, and even special-purpose testing programs.
It should be appreciated by a person of ordinary skill in the art that it is possible to mix different modes, such as processing embedded software together with a virtual testbench driving the design-under-test via verification IP, or even in in-circuit emulation (ICE) mode.
Modern circuit designs often require a lengthy initialization period prior to performing normal functions. For example, a central processing unit (CPU) typically has an initial “boot sequence” that must be completed each time the CPU is turned on. Once the boot sequence is completed, the CPU can be used to process instructions or transactions. Accordingly, a verification test of the CPU cannot be performed until after the boot sequence is completed. At a typical clock speed of the CPU (e.g., a few GHz), the boot sequence may be completed within a minute. At a typical clock speed of an emulator (e.g., a few MHz), however, the same boot sequence may last for a few hours. The verification process can be sped up significantly if both the emulator and the testbench can be restored directly to a post-boot-sequence state without going through the lengthy initialization process.
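The slowdown follows directly from the ratio of the two clock rates, since the number of clock cycles in the boot sequence is the same in both cases. A small worked calculation, using assumed figures chosen only to fall within the ranges given above (a 3 GHz CPU that boots in 10 seconds, emulated at 3 MHz):

```python
def emulated_duration_hours(real_seconds, real_hz, emulator_hz):
    """Hours needed to replay the same number of clock cycles on the emulator."""
    cycles = real_seconds * real_hz
    return cycles / emulator_hz / 3600


# Assumed example values, not taken from any specific design.
boot_hours = emulated_duration_hours(real_seconds=10, real_hz=3e9, emulator_hz=3e6)
```

With a thousandfold clock ratio, ten seconds of real boot time becomes ten thousand seconds on the emulator, a little under three hours.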
The restoration can also help to speed up debugging. If a problem occurs in the middle of a long test, the state of the system may be restored to a point right before the problem occurs, and debugging can be performed without waiting for the design model to run from the very beginning of the test.
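The idea of restoring to a point shortly before a failure can be illustrated with a toy checkpointing sketch. Everything here is hypothetical: `CounterModel` is a trivial stand-in for the combined emulator and testbench state, and snapshots are taken with an in-memory deep copy at a fixed cycle interval, which is an assumption of this example rather than a description of how an emulator saves its state.

```python
import copy


class CounterModel:
    """Hypothetical stand-in for the design model plus testbench state."""

    def __init__(self):
        self.cycle = 0
        self.registers = {"acc": 0}

    def step(self):
        # Advance one clock cycle and update some state.
        self.cycle += 1
        self.registers["acc"] += self.cycle


class CheckpointStore:
    """Keeps deep-copied snapshots of the model, keyed by cycle number."""

    def __init__(self):
        self._snapshots = {}

    def save(self, model):
        self._snapshots[model.cycle] = copy.deepcopy(model)

    def restore_before(self, cycle):
        """Return a copy of the latest snapshot taken at or before `cycle`."""
        eligible = [c for c in self._snapshots if c <= cycle]
        if not eligible:
            raise LookupError("no snapshot at or before the requested cycle")
        return copy.deepcopy(self._snapshots[max(eligible)])


def run_with_checkpoints(model, total_cycles, interval, store):
    """Run the model, saving a snapshot at the start and every `interval` cycles."""
    store.save(model)
    for _ in range(total_cycles):
        model.step()
        if model.cycle % interval == 0:
            store.save(model)


store = CheckpointStore()
model = CounterModel()
run_with_checkpoints(model, total_cycles=1000, interval=100, store=store)

# Suppose a problem is observed at cycle 742: resume from the nearest earlier
# snapshot (cycle 700) and replay only the remaining 42 cycles.
debug_model = store.restore_before(742)
while debug_model.cycle < 742:
    debug_model.step()
```

The checkpoint interval trades snapshot storage against replay time: a shorter interval means more snapshots but fewer cycles to re-run before reaching the point of interest.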