Hardware Description Languages (HDLs) are predominantly used to describe integrated circuit designs. Various HDLs exist in the market today, such as Very High Speed Integrated Circuit HDL (VHDL), Verilog, and System Verilog. An HDL may be used to describe a design at various levels of abstraction. For instance, VHDL supports many possible levels or styles of design description. These styles differ primarily in how closely they relate to the underlying hardware. Some levels focus more on the behavior and dataflow of a design, while other levels focus more on the structural and timing aspects of the design.
For example, integrated circuit designs may be described at the dataflow level of abstraction, often called the register transfer level (RTL). In this intermediate level of abstraction, a design is described in terms of how data moves through the design. At the heart of most digital systems today are registers, and an RTL model describes how information is passed between registers in the design. This movement is synchronized at specific points in time, which are indicated by changes in the value of a special design signal commonly known as a clock. Typically, in an RTL model, the combinational logic portions of the design are described at a relatively high level, while the timing and operation of registers in the design are described more specifically. RTL is therefore an intermediate level that allows the drudgery of combinational logic to be simplified (and automatically generated by logic synthesis tools) while the more important parts of the circuit, the registers, are more completely specified. Once the design is specified in an RTL model, RTL synthesis tools translate, or synthesize, this model into a model at a still lower level of abstraction, i.e., a gate-level structural model. Synthesis refers to the process of transforming a design model from a higher level of abstraction to a lower level. These transformations typically try to improve upon a set of objective metrics (e.g., area, speed, power dissipation) of a design.
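By way of illustration only, the separation between registers and combinational logic described above can be sketched in Python. The accumulator design, its 4-bit width, and all names below are hypothetical and chosen solely for this sketch; a real RTL model would be written in an HDL.

```python
from dataclasses import dataclass

WIDTH = 4                      # register width in bits (arbitrary for this sketch)
MASK = (1 << WIDTH) - 1


@dataclass(frozen=True)
class Registers:
    """State held in registers; it only changes on a clock edge."""
    acc: int = 0


def next_state(regs: Registers, data_in: int, enable: bool) -> Registers:
    """Combinational logic: derives the next register values from the
    current register values and the inputs. It holds no state of its own."""
    if enable:
        return Registers(acc=(regs.acc + data_in) & MASK)
    return regs


def clock_edge(regs: Registers, data_in: int, enable: bool) -> Registers:
    """One rising clock edge: the registers capture the combinational result."""
    return next_state(regs, data_in, enable)


def run(inputs):
    """Apply one (data_in, enable) pair per clock cycle and record acc."""
    regs = Registers()
    trace = []
    for data_in, enable in inputs:
        regs = clock_edge(regs, data_in, enable)
        trace.append(regs.acc)
    return trace
```

In this sketch, `next_state` plays the role of the combinational logic that a synthesis tool would generate, while `clock_edge` marks the only point at which register contents may change.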
Once a design has been described, to increase the likelihood of first-pass success, the design is typically verified for proper functionality prior to physical fabrication as an integrated circuit chip. While being tested, an HDL model of a design is called a Design Under Test (DUT). This DUT (which is an RTL design model) is simulated using a testbench. The testbench generates a set of input test vectors, or stimuli, and applies the stimuli to the DUT. The testbench also reads a set of output test vectors from the DUT in response to the stimuli. The testbench collects the responses made by the DUT and compares them against a specification of correct results.
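The testbench flow just described (generate stimuli, apply them to the DUT, read the responses, and check them against a specification) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the 4-bit adder DUT, the reference model, and the function names are all hypothetical.

```python
import random


def dut_adder(a: int, b: int) -> int:
    """Hypothetical DUT model: a 4-bit adder producing a 5-bit sum."""
    return (a & 0xF) + (b & 0xF)


def reference_adder(a: int, b: int) -> int:
    """Hypothetical specification of correct results."""
    return a + b


def run_testbench(dut, num_vectors: int = 100, seed: int = 0):
    """Generate stimuli, apply them to the DUT, and check each response.

    Returns a list of (a, b, got, expected) tuples for every mismatch."""
    rng = random.Random(seed)                 # fixed seed keeps runs repeatable
    failures = []
    for _ in range(num_vectors):
        a, b = rng.randrange(16), rng.randrange(16)   # input test vector
        got = dut(a, b)                               # response read from the DUT
        expected = reference_adder(a, b)              # specified correct result
        if got != expected:
            failures.append((a, b, got, expected))
    return failures
```

Calling `run_testbench(dut_adder)` returns an empty list, whereas a DUT that drops the carry bit, such as `lambda a, b: (a + b) & 0xF`, is reported as failing.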
A testbench in its traditional form is described at a behavioral level and defines the environment for the DUT in its target system. Behavioral HDL, which is currently the highest level of abstraction supported in HDL, describes a design in terms of what it does (or how it behaves) rather than in terms of its structural components and the interconnections between them. A behavioral model specifies a relationship between signals within the design as well as inputs to and outputs from the design. When creating a behavioral model of a design, one describes the operation of the design over time. The usage of time is a critical distinction between behavioral descriptions of circuits and lower-level descriptions such as a dataflow level of abstraction.
In a behavioral description, time may be expressed precisely as absolute delays between related events (such as the propagation delays within gates and on wires), or time may be expressed implicitly by defining the sequential ordering of events. Synthesis tools currently attempt to transform behavioral HDL models into lower-level HDL models. However, synthesis tools presently do not attempt to maintain the identical behavior in actual circuitry as defined in the behavioral model. In other words, the exact time sequencing of the design elements is not preserved in synthesis. Therefore, such synthesis tools cannot be used for synthesizing behavioral testbenches.
Design verification may be performed using a variety of methods. For example, software-based simulators are the most commonly used verification tools. Software simulators have an advantage in that they can accept HDL at any level of abstraction, such as a behavioral level of abstraction, thus providing a way to simulate both a DUT (in RTL) and its testbench (in behavioral description). However, simulators have a disadvantage in that, for large designs, they typically can achieve a speed of not more than a few tens to hundreds of clock cycles per second (cps).
To increase the overall simulation speed, co-simulation approaches have been used, in which the behavioral testbench runs on a software simulator and the RTL DUT is mapped onto, and executed on, a reconfigurable hardware platform. The reconfigurable hardware platform may be implemented as, e.g., a plurality of reconfigurable hardware elements, such as a set of general-purpose processors and/or Field Programmable Gate Arrays (FPGAs).
To execute the DUT on the reconfigurable hardware platform (also referred to herein as an emulator), the RTL model of the DUT is first translated into a structural model using an RTL synthesis tool. This structural model, known as a netlist, describes a circuit in terms of interconnection of gate level components. The emulator may implement the RTL model of the DUT on, for example, a collection of reconfigurable hardware elements such as an array of field-programmable gate arrays (FPGAs) or the like.
The structural level models a system as a collection of logic gates and their interconnections to perform a desired function. The structural level is a representation that is closer to the physical realization of a system. Thereafter, the emulator runs the structural level description of the DUT at the actual binary gate level and is therefore considerably faster than a simulator being used for the same purpose. However, the testbenches in a co-simulation approach are still written in behavioral HDL and are run on a software platform, also known as a simulator. The emulator and the simulator must frequently communicate with each other in order to maintain synchronization with each other. Such frequent communication taxes the resources of the emulator and simulator, thus reducing the potential speed at which the system may operate. Because of this limitation, co-simulation speeds are typically only three to ten times pure software simulation speeds. Co-simulation has other disadvantages as well; for example, it requires memories to be re-modeled in terms of the memories available in the emulator.
Newer techniques have been developed that allow a testbench to be described using a high-level algorithmic language (HAL) such as C, C++, or SystemC. The industry as a whole is beginning to adopt the usage of such HALs to describe the testbench at a higher level of abstraction and to take advantage of the algorithmic properties of HALs. Using HALs, a relatively new transaction-based verification methodology has also been adopted to improve performance and verification coverage. In this recently developed methodology, a testbench is re-structured into a timed portion (also known as a transactor) and an un-timed portion.
The timed portion, or transactor, of the testbench is responsible for direct signal level interaction with the DUT. A transactor either decomposes an untimed transaction into a set of clocked events or composes a set of clocked events into a message. When receiving messages, transactors freeze DUT clocks for a sufficient time to allow messages to be fully decomposed before providing clocked data to a DUT. Transactors also freeze DUT clocks when sending a message, and they allow message composition operations to complete before new clocked data is received from the DUT. The un-timed portion, on the other hand, is purely algorithmic, and interacts with the timed portion using abstract transactions. The un-timed portion does not utilize the concept of a clock.
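The division of labor described above can be illustrated with a minimal Python sketch. It assumes a hypothetical byte-wide DUT (`XorWithPrevious`) and a hypothetical `Transactor` class; the comments mark where the DUT clock would be frozen, but no real clock is modeled.

```python
class XorWithPrevious:
    """Hypothetical clocked DUT: outputs the input XORed with the previous input."""

    def __init__(self):
        self.prev = 0                      # one register of state

    def step(self, data_in: int) -> int:
        """One DUT clock cycle."""
        out = data_in ^ self.prev
        self.prev = data_in
        return out


class Transactor:
    """Timed portion: converts between untimed messages and per-cycle data."""

    def __init__(self, dut_step):
        self.dut_step = dut_step           # callable that runs one DUT clock cycle
        self.dut_cycles = 0                # DUT clock cycles actually run

    def send(self, message: bytes) -> bytes:
        # DUT clock frozen: the whole message is decomposed into beats first.
        beats = list(message)
        # DUT clock running: one beat is applied per clock cycle.
        outputs = []
        for beat in beats:
            outputs.append(self.dut_step(beat))
            self.dut_cycles += 1
        # DUT clock frozen again: the outputs are composed into one message.
        return bytes(outputs)


def untimed_test(transactor: Transactor) -> bytes:
    """Un-timed portion: purely algorithmic, with no notion of a clock."""
    return transactor.send(bytes([1, 3, 7]))
```

Here the un-timed portion exchanges a single message with the transactor, while the transactor alone deals with the three clock cycles needed to apply it.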
By dividing the testbench into timed and un-timed portions, improvements may be realized in the overall functional coverage of the design verification process. Moreover, using the above methodology, it is easier to write a testbench and achieve better functional verification coverage. However, the entire testbench, i.e., both the timed and un-timed portions, remains as software to be executed on the workstation. Accordingly, whatever performance improvements are realized by using this methodology are still somewhat limited.
Several approaches have been taken to improve the interaction between the timed and un-timed portions of a testbench. These approaches are: signal-level connections, high-level abstract message passing, and function-call-based interaction. Using signal-level connections, interactions with the untimed HAL domain are triggered based on events on the signals on the boundary of the timed and untimed portions of the testbench. This approach is the most commonly used approach and is implemented using a programming language interface (PLI), typically provided by conventional simulators. High-level abstract message passing is based on a communication protocol defined by the well-known Standard Co-Emulation Modeling Interface (SCE-MI) standard. The use of this approach is described in more detail below. Function-call-based interaction is a relatively new approach, wherein data transfers are performed using function call arguments. System Verilog has adopted this approach, which is known as Direct Programming Interface (DPI).
Other attempts have been made to improve the performance of this new timed/un-timed methodology through co-simulation by using a hardware accelerator or emulator to run the DUT model while a HAL simulator runs the testbench on a workstation. However, this approach requires substantial communication overhead. In co-simulation, communication between the hardware-implemented DUT and the software-implemented testbench is event-based and at the signal level, and therefore occurs frequently. Unfortunately, due to this high communication overhead between the DUT and the testbench, co-simulation improves verification speed for most designs by, at most, a factor of three to ten.
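The effect of communication frequency can be made concrete with a rough Python model. This is only a sketch under simplifying assumptions: every hardware/software synchronization is assumed to cost a fixed time, the function names are hypothetical, and the numbers in the usage note are placeholders rather than measurements.

```python
def signal_level_syncs(num_cycles: int, syncs_per_cycle: int = 1) -> int:
    """Synchronizations needed when the two sides exchange signal values
    every DUT clock cycle (assumption: a fixed number per cycle)."""
    return num_cycles * syncs_per_cycle


def batched_syncs(num_cycles: int, cycles_per_message: int) -> int:
    """Synchronizations needed if several cycles of data travel in one message."""
    return -(-num_cycles // cycles_per_message)    # ceiling division


def total_time(num_cycles: int, syncs: int, t_cycle: float, t_sync: float) -> float:
    """Run time = time spent executing cycles + time spent synchronizing."""
    return num_cycles * t_cycle + syncs * t_sync
```

With placeholder costs of 1 time unit per cycle and 100 units per synchronization, 1000 cycles take `total_time(1000, signal_level_syncs(1000), 1, 100)` = 101000 units at the signal level, so the run time is dominated by communication rather than by DUT execution.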
Still other attempts have been made to improve verification performance. The use of SCE-MI has provided more than an order of magnitude improvement in performance by modeling transactors at the RTL level and synthesizing them to execute, not as software, but as hardware on an emulator. SCE-MI is, in a nutshell, an interface that bridges two different modeling environments, each of which supports a different level of modeling abstraction. More particularly, SCE-MI is a transaction-based methodology that can be used for both Verilog and VHDL designs, and that provides a low-level interface and mechanism for passing messages between the HDL domain (which is on a reconfigurable hardware platform, such as an accelerator or an emulator) and the HAL domain (which is on a sequential computation platform, such as a software-executing workstation). On the HAL domain, SCE-MI provides a set of functions that are callable to send or receive messages to or from the HDL domain. The HAL domain may create multi-cycle transaction packets and send the packets to the HDL domain as a single message, or receive multi-cycle output as a single message from the HDL domain, which is then decomposed on the HAL domain into multiple packets so that they may be processed. Likewise, the HDL domain may receive these transactions (for example, multi-cycle stimulus packets) and decompose them into multiple cycle level signals that are then applied to the DUT. On the HDL domain, there is a pre-defined set of input and output hardware macros that the designer may use to send or receive any messages to or from the corresponding HAL domain.
Using SCE-MI, the testbench, including algorithms for stimulus generation and DUT output processing, may be written in the HAL domain. The stimuli generated by the testbench may be communicated to the HDL domain at the transaction level, whereby the HDL side receives these transactions at a high level, decomposes them into cycle level signals, and applies the cycle level signals to the DUT. Similarly, the HDL domain collects the DUT outputs, creates transaction packets containing the outputs, and sends the transaction packets to the HAL domain, which then decomposes the transaction packets into output data. Due to the packetizing and decomposition that must occur, communication between the HDL domain and the HAL domain uses a much faster clock than the design clock used by the DUT. The transactor runs on this faster clock and can control the design clock during message decomposition and packetizing. The user/designer instantiates a clock macro in the transactor through which the design clock of the DUT is generated and controlled. By using the transactor to control the design clock, the transactor is able to determine and control when the DUT clock should be stopped and when it should be allowed to run. In this methodology, the faster clock is not controllable and is commonly known as the uncontrolled clock (uclock).
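The relationship between the uncontrolled clock and the controlled design clock can be sketched as a small state machine in Python. This is not SCE-MI code: the `ClockControlTransactor` class, its three phases, and the assumption that each phase takes one uclock tick per cycle of message data are all hypothetical simplifications.

```python
from enum import Enum, auto


class Phase(Enum):
    DECOMPOSE = auto()     # incoming message is being split into cycle-level data
    RUN = auto()           # cycle-level data is being applied to the DUT
    PACKETIZE = auto()     # DUT outputs are being packed into a message


NEXT_PHASE = {
    Phase.DECOMPOSE: Phase.RUN,
    Phase.RUN: Phase.PACKETIZE,
    Phase.PACKETIZE: Phase.DECOMPOSE,
}


class ClockControlTransactor:
    """Runs on every uclock tick and decides whether the design clock may tick."""

    def __init__(self, message_len: int):
        self.message_len = message_len     # cycles of data per message (assumed fixed)
        self.phase = Phase.DECOMPOSE
        self.count = 0

    def on_uclock(self) -> bool:
        """Advance one uclock tick; return True if the design clock is enabled."""
        enable = self.phase is Phase.RUN   # design clock runs only in the RUN phase
        self.count += 1
        if self.count == self.message_len:
            self.count = 0
            self.phase = NEXT_PHASE[self.phase]
        return enable


def design_clock_trace(transactor: ClockControlTransactor, uclock_ticks: int):
    """Record the design-clock enable for each of the given uclock ticks."""
    return [transactor.on_uclock() for _ in range(uclock_ticks)]
```

For a message length of two, the design clock is enabled on only two of every six uclock ticks in this sketch, which illustrates why the uclock must run faster than the design clock it controls.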
However, SCE-MI as described above has some problems. First, SCE-MI is difficult to use and design with. SCE-MI is a very complex modeling paradigm with complicated communication protocols, placing significant burdens on the designer. The APIs and structural macros of SCE-MI are very low level and therefore difficult to use. Also, to use SCE-MI properly, the designer must understand the concept and appropriate usage of uncontrolled clocks. This places yet another burden on the designer. Additionally, SCE-MI is inherently non-deterministic, and so verification results may be non-repeatable in certain situations. This is a major limitation, since verification and debugging can be very difficult if verification runs are not repeatable. Finally, SCE-MI requires that the testbench transactors be written only at the RTL level so that they can be synthesized to run on the reconfigurable hardware platform. Thus, with SCE-MI, the complete HDL side must be written at the RTL level.
System Verilog provides yet another modeling interface for communication between the HDL and HAL models, called the Direct Programming Interface (DPI). DPI allows imported and exported tasks or functions to be called from both the HDL domain and the HAL domain. Thus, functions or tasks can be written in HDL yet called from the HAL domain (e.g., using the C language). Such functions and tasks are known as exported functions and tasks. Likewise, functions can be written on the HAL domain and called from the HDL domain as tasks or functions. Such functions and tasks are known as imported functions and tasks. In general, DPI works well, is easy to use, and does not require that the designer learn any new language or methodology. Still, DPI is an extension of System Verilog, and hence the entire system (both the DUT and the testbench) is limited to running on a System Verilog software simulator. Accordingly, little if any performance improvement is realized by using DPI.
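The two calling directions can be illustrated by analogy in Python. The sketch below is not DPI syntax and does not use any System Verilog facility; the `HdlDomain` class, the `write_register` task, and the `next_value` function are hypothetical stand-ins for an exported task and an imported function, with data passed only through call arguments and return values.

```python
class HdlDomain:
    """Stand-in for the HDL side: holds one 8-bit register."""

    def __init__(self, imported):
        self.imported = imported           # functions implemented on the HAL side
        self.register = 0

    def write_register(self, value: int) -> None:
        """'Exported' task: written on the HDL side, callable from the HAL side."""
        self.register = value & 0xFF

    def clock_cycle(self) -> None:
        """HDL-side behavior that calls an 'imported' HAL function each cycle."""
        self.register = self.imported["next_value"](self.register) & 0xFF


def next_value(current: int) -> int:
    """'Imported' function: written on the HAL side, callable from the HDL side."""
    return current + 3


def hal_main() -> int:
    """HAL-side program: calls the exported task, then lets two cycles run."""
    hdl = HdlDomain(imported={"next_value": next_value})
    hdl.write_register(250)                # HAL domain calling into the HDL domain
    hdl.clock_cycle()                      # HDL domain calling into the HAL domain
    hdl.clock_cycle()
    return hdl.register
```

In this sketch `hal_main()` returns 0, because the register holds 250, then 253, and then wraps from 256 to 0 at 8 bits.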