(1) Field of the Invention
The present invention relates to increasing verification performance and efficiency in the systematic verification of digital systems having multi-million gates or more, by using simulation and prototyping from Electronic System Level (ESL) through Register Transfer Level (RTL) down to Gate Level (GL).
(2) Description of the Related Art
In design verification, simulation consists of building a pair of computer-executable models, namely the DUV (Design Under Verification), or one or more design objects (to be defined later) inside the DUV, and the TB (testbench) that drives it, translating them into a sequence of machine instructions through a simulation compilation process, and executing that sequence on a computer. Simulation execution is therefore basically accomplished by the sequential execution of machine instructions, and there are many simulation methods (event-driven simulation, cycle-based simulation, compiled simulation, interpreted simulation, co-simulation, algorithmic-level simulation, instruction-level simulation, transaction-level simulation, RTL simulation, gate-level simulation, transistor-level simulation, circuit-level simulation, etc.). In other words, simulation covers a variety of processes in which the DUV and TB, which are executable SW models built by a modeling process at a proper abstraction level (many abstraction levels exist in IC design, such as circuit-level, transistor-level, gate-level, RTL, transaction-level, instruction-level (if the design object is a processor), algorithmic-level, etc.), are executed on a computer to realize their functional specification or functional characteristics in SW. The advantages of simulation are that the functional specification or functional characteristics of a design object can be evaluated virtually before the design object is actually implemented and fabricated, that its SW nature provides high flexibility, and that it offers high visibility and controllability of the DUV and TB, which is critical for debugging. Its shortcoming is low performance, which comes from the fact that simulation execution is the sequential execution of a machine instruction sequence.
If the design complexity is large, as in modern designs having 100 million or more gates, simulation becomes extremely slow (for example, simulating a 100-million-gate design for 100,000,000 cycles with an event-driven simulation running at 1 cycle/sec takes about 3.2 years). In the present invention, simulation is defined as any SW modeling and SW execution method for the DUV and TB at the proper abstraction level. More specifically, in the present invention, simulation is defined as the process of implementing the behavior of the DUV and TB at a specific abstraction level as a specific computer data structure with well-defined operations on it, so that it is computer-executable, and of performing a series of computations and operations on that data structure with input values on a computer. (Therefore, in the present invention, simulation can be carried out not only by any commercial simulator but also by internally built simulators. Any process that includes a series of computations or operations on the data structure with input values on a computer is also considered simulation if it meets the above definition.)
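The 3.2-year figure quoted above can be checked with a short calculation. The following is only an illustrative sketch; the function name and the 365-day year are assumptions made for the example and are not part of the invention.

```python
# Assumption: a year is taken as 365 days (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def simulation_years(cycles: int, cycles_per_second: float) -> float:
    """Wall-clock years needed to simulate `cycles` at the given speed."""
    return cycles / cycles_per_second / SECONDS_PER_YEAR


# 100,000,000 cycles at 1 cycle/sec, as in the example in the text.
years = simulation_years(cycles=100_000_000, cycles_per_second=1.0)
print(f"{years:.2f} years")
```

The result rounds to the 3.2 years stated in the text.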
In contrast, traditional prototyping consists of building a system on a PCB (Printed Circuit Board) using manufactured semiconductor chips (for example, sample chips) or FPGA (Field Programmable Gate Array) chips that implement the DUV, together with the other components necessary to construct the entire system (in simulation, these other components are modeled as the TB), and verifying the DUV in either an in-circuit or an in-system environment while the entire system runs at real or almost real operating speed. Because the DUV and TB are not modeled virtually in SW but are physically implemented for verification, prototyping has the advantage of verifying at extremely high speed. However, because visibility and controllability are very low in the prototyping environment, debugging is very difficult when the system operates incorrectly.
The design size of digital circuits and digital systems is growing to tens or hundreds of millions of gates, and their functionality is becoming very complex, as IC (Integrated Circuit) design and fabrication technology develops rapidly. In particular, system-level ICs, so-called SOCs (System On Chip), usually have one or more embedded processor cores (RISC or DSP cores; specific examples are the ARM11 core from ARM and the Teak DSP core from CEVA), and a large part of their functionality is realized in SW. Reducing design time is critical to the success of the related products, because growing competition in the market shortens the time to market. Therefore, there is growing interest in the industry in the ESL design methodology for designing chips. Chips designed using the ESL design methodology, which operates at a higher abstraction level than traditional RTL (Register Transfer Level), require development of the SW that drives them as well as the HW design. Therefore, in the recent development trend, a Virtual Platform (hereafter VP), which is a SW model of the real HW, is built as a system-level model (ESL model) for architecture exploration, SW development, HW/SW co-verification, and system verification (whereas a traditional prototype is a physical platform, hereafter PP). A VP can also be used as an executable specification, i.e. a golden reference model. Because a VP is built at a higher abstraction level, its development time is short. It can also be used to verify the TB before the DUV is available. A VP also plays a critical role in platform-based design (PBD), which is widely adopted in SOC designs, because a VP can be built from transaction-level on-chip bus models and other transaction-level component models (called TLM models), which can be simulated much faster (about 100 to 10,000 times faster than an RTL model).
Currently, there are many commercial tools for creating and executing VPs, such as MaxSim from ARM, ConvergenSC from CoWare, Incisive from Cadence, VisualElite from Summit Design, VSP from Vast Systems Technology, SystemStudio from Synopsys, Platform Express from Mentor Graphics, VTOC from TenisonEDA, VSP from Carbon Design Systems, VirtualPlatform from Virtutech, etc. A VP can thus provide many benefits in SOC designs. Because the most important property of a VP in SOC designs is an execution speed fast enough for software development, it is modeled not at RTL using Verilog or VHDL, but at a higher abstraction level such as transaction-level or algorithmic-level using SystemC or C/C++. The abstraction level, which is the most important concept in system-level design, is the level of representation detail of the corresponding design object (explained in detail later). Digital systems can be classified, from low to high level of abstraction, into layout-level, transistor-level, gate-level, RTL, transaction-level, algorithmic-level, etc. That is, gate-level is a lower abstraction than RTL, RTL is a lower abstraction than transaction-level, and transaction-level is a lower abstraction than algorithmic-level. Therefore, if the abstraction level of a specific design object A is transaction-level and the abstraction level of a design object B refined from A is RTL, design object A is defined to be at a higher level of abstraction than design object B. Also, if a design object X contains design objects A and C, and a design object Y contains design objects B, which is refined from A, and C, design object X is defined to be at a higher level of abstraction than design object Y. Moreover, at the same gate level or the same RTL, the accuracy of the delay model determines the level of abstraction.
That is, even among models at the same gate level, a net-list with a zero-delay model is at a higher abstraction than a net-list with a unit-delay model, and a net-list with a unit-delay model is at a higher abstraction than a net-list with a full timing model using SDF (Standard Delay Format). Recent SOC design can be thought of as a progressive refinement of an initial design object, which must eventually be implemented as a chip, from the initial abstraction level, e.g. transaction-level, to the final abstraction level, e.g. gate-level (refer to FIG. 14). The core of a design methodology using progressive refinement is to progressively refine the design blocks inside a design object MODEL_DUV(HIGH), modeled at a high level of abstraction, so that a refined design object MODEL_DUV(LOW), modeled at a low level of abstraction, is obtained automatically (for example, through logic synthesis or high-level synthesis), manually, or by both. As a detailed example, in the refinement from ESL to RTL, which obtains an implementable RTL model from an ESL model (currently carried out by hand, by high-level synthesis, or by both), the ESL model is MODEL_DUV(HIGH) and the implementable RTL model is MODEL_DUV(LOW); in the refinement from RTL to GL (Gate Level), which obtains a GL model, i.e. a gate-level netlist, from an implementable RTL model (currently carried out by logic synthesis), the RTL model is MODEL_DUV(HIGH) and the GL model is MODEL_DUV(LOW). The GL model becomes a timing-accurate GL model if the delay information in SDF, extracted from placement and routing, is back-annotated.
One point should be mentioned. It is not strictly necessary that all design objects in an ESL model be at system level. The same is true of an RTL model. In an ESL model, a few design objects may be at RTL, surrounded by abstraction wrappers that make the abstraction of the RTL objects the same as that of the other ESL objects. Likewise, in an RTL model, a few design objects may be at GL, surrounded by abstraction wrappers that make the abstraction of the GL objects the same as that of the other RTL objects. For the same reason, in a GL model a few design objects, e.g. a memory block for which logic synthesis does not produce a gate-level net-list, can be at RTL. Therefore, in the present invention “a model at the specific level of abstraction” is a model at any level of abstraction (not only ESL, RTL, and GL, but also any mixed level of abstraction such as mixed ESL/RTL, mixed RTL/GL, mixed ESL/RTL/GL, etc.) that can exist in a refinement process from ESL to GL. Likewise, the “abstraction level” includes not only ESL, RTL, and GL, but also any such mixed level of abstraction. For example, if a DUV consists of four design objects A, B, C, and D, where A and B are at ESL, C is at RTL, and D is at GL, the DUV is a model at a mixed ESL/RTL/GL level of abstraction and can be called a model at the specific level of abstraction (it can also be called a model at the mixed ESL/RTL/GL level of abstraction). From now on, we will refer to a model as a model at mixed levels of abstraction whenever we must state explicitly that it is represented at mixed levels of abstraction. (An arbitrary design object, such as the DUV or TB, can be called a model, but unless otherwise stated, a model is defined as a design object including both the DUV (Design Under Verification) and the TB (Testbench).)
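The naming of mixed-level models described above can be illustrated with a small sketch. It reproduces the four-object example (A and B at ESL, C at RTL, D at GL); the `Level` enumeration and the `describe_model` function are hypothetical names introduced only for this illustration.

```python
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    """Abstraction levels, ordered from low (GL) to high (ESL)."""
    GL = 0
    RTL = 1
    ESL = 2


def describe_model(levels: Dict[str, Level]) -> str:
    """Name the abstraction level of a model from its design objects."""
    if not levels:
        raise ValueError("a model needs at least one design object")
    # Distinct levels present in the model, highest abstraction first.
    present = sorted(set(levels.values()), reverse=True)
    if len(present) == 1:
        return present[0].name
    return "mixed " + "/".join(level.name for level in present)


# The DUV from the example in the text.
duv = {"A": Level.ESL, "B": Level.ESL, "C": Level.RTL, "D": Level.GL}
print(describe_model(duv))  # mixed ESL/RTL/GL
```

A model whose design objects all share one level is simply reported at that level, for example `describe_model({"A": Level.RTL})` returns `RTL`.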
A transaction, which is the most important concept at ESL, represents information defined over multiple logically related signals or pins as a single unit, and design objects communicate transactions by function calls. By contrast, information on signals or pins at RTL is represented only by bits or bit vectors. A transaction can be defined cycle by cycle (we will call this a cycle-accurate transaction, or ca-transaction for short), over multiple cycles (we will call this a timed transaction, cycle-count transaction, or PV-T transaction, or timed-transaction for short), or without the concept of cycles (we will call this an untimed-transaction for short). A timed-transaction is represented as Transaction_name(start_time, end_time, other_attributes). In fact, there is no standard definition of a transaction, but it is most common to classify transactions into the untimed-transaction, timed-transaction, and ca-transaction explained above. Among transactions, the untimed-transaction is at the highest level of abstraction but is the least accurate in timing, and the ca-transaction is at the lowest level of abstraction but is the most accurate in timing. The timed-transaction lies in between.
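The three kinds of transaction can be sketched as a single record type. This is a minimal illustration only: the field names follow the Transaction_name(start_time, end_time, other_attributes) form given above, while the class names, the use of integer cycle counts for time, and the rule that a span of exactly one cycle denotes a ca-transaction are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class TransactionKind(Enum):
    UNTIMED = "untimed-transaction"
    TIMED = "timed-transaction"
    CYCLE_ACCURATE = "ca-transaction"


@dataclass(frozen=True)
class Transaction:
    name: str
    # Times are in cycles (assumption); None means "no concept of cycles".
    start_time: Optional[int] = None
    end_time: Optional[int] = None
    other_attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if (self.start_time is None) != (self.end_time is None):
            raise ValueError("start_time and end_time must be given together")
        if self.start_time is not None and self.end_time <= self.start_time:
            raise ValueError("end_time must be later than start_time")

    def kind(self) -> TransactionKind:
        if self.start_time is None:
            return TransactionKind.UNTIMED
        # Assumption: a transaction spanning one cycle is cycle-accurate.
        if self.end_time - self.start_time == 1:
            return TransactionKind.CYCLE_ACCURATE
        return TransactionKind.TIMED


# One logical unit instead of separate address and data bit vectors.
write = Transaction("bus_write", 10, 14, {"addr": 0x1000, "data": 0xFF})
print(write.kind().value)  # timed-transaction
```

The address and data values, which at RTL would appear as separate bit vectors on separate pins, travel here as one unit passed by a function call.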
The refinement process is incremental: the design objects at TL (Transaction Level) in the VP are progressively refined into design objects at RTL, which have at least signal-level cycle accuracy. At the end of the transformation, the design objects at TL have been translated into design objects at RTL, and the transaction-level VP has therefore been refined into the implementable RTL model. Likewise, the design objects at RTL (Register Transfer Level) in the RTL model are progressively refined into design objects at GL, which have at least signal-level timing accuracy. At the end of the transformation, the design objects at RTL have been translated into design objects at GL, and the RTL model has therefore been refined into a GL model. FIG. 14 shows an example of the refinement process explained above.
There are two objects to be designed in SOC designs: the first is the DUV (Design Under Verification) and the second is the TB (Testbench). The DUV is the design entity that is to be manufactured as a chip, and the TB is a SW model representing the environment in which the chip is mounted and operated. The TB is used for simulating the DUV. During simulation, the TB generally provides stimuli to the DUV and processes the outputs from the DUV. In general, the DUV and TB are hierarchical, so that there may be one or more lower-level modules inside them, each of which can be called a design block. A design block may contain one or more design modules, and a design module may contain one or more submodules. In the present invention, we will call any of the design blocks, design modules, submodules, DUV, or TB, any part of them, or any combination of them, a “design object” (for example, any module or part of a module in Verilog is a design object, any entity or part of an entity in VHDL is a design object, and any sc_module or part of an sc_module in SystemC is a design object). Therefore, a VP can be seen as a design object. So can a part of a VP, one or more design blocks in a VP, a part of a design block, some design modules in a design block, some submodules in a design module, a part of a submodule, etc. (In short, the entire DUV and TB, or any part of the DUV and TB, can be seen as a design object.)
In a design process using progressive refinement, simulation at a high level of abstraction runs fast, but simulation at a low level of abstraction is relatively slow. The simulation speed therefore decreases dramatically as the refinement proceeds to lower levels of abstraction. In contrast to conventional single simulation (in the present invention, single simulation includes not only using one simulator but also using more than one simulator, e.g. one Verilog simulator and one Vera simulator, when these simulators run on a single CPU), distributed parallel simulation uses two or more simulators to increase the simulation speed. Examples of simulators are HDL (Hardware Description Language) simulators (such as NC-Verilog/Verilog-XL and X-sim from Cadence, VCS from Synopsys, ModelSim from Mentor, Riviera/Active-HDL from Aldec, FinSim from Fintronic, etc.), HVL (Hardware Verification Language) simulators (such as the e simulator from Cadence, the Vera simulator from Synopsys, etc.), SDL (System Description Language) simulators (e.g. SystemC simulators such as the Incisive simulator from Cadence, etc.), and ISSs (Instruction-Set Simulators) (such as the ARM RealView Development Suite Instruction Set Simulator, etc.). Under another classification, there are event-driven simulators and cycle-based simulators. The simulators in the present invention include any of these. Therefore, when two or more simulators are used in the present invention, each can be any of the simulators mentioned above. Distributed parallel simulation (or parallel distributed simulation, or parallel simulation for short), which performs a simulation in a distributed processing environment, is the most general parallel simulation technique: the DUV and TB, i.e. a model at a specific level of abstraction, are partitioned into two or more design objects, and each design object is assigned to a simulator and executed on it (see FIG. 5). Distributed parallel simulation therefore requires a partitioning step that divides a simulation model into two or more design objects. In the present invention, we will call a design object that, as a result of the partitioning, is to be executed in a specific local simulation (to be defined later) a “local design object”.
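The partitioning step can be illustrated with a small sketch that groups design objects into local design objects and counts the connections that cross simulator boundaries, since each such connection requires communication between simulators. The model, the object names, and both functions are hypothetical and serve only as an example; no particular partitioning algorithm is implied.

```python
from typing import Dict, List, Tuple


def local_design_objects(assignment: Dict[str, int]) -> Dict[int, List[str]]:
    """Group design objects by the local simulator they are assigned to."""
    groups: Dict[int, List[str]] = {}
    for design_object, simulator in assignment.items():
        groups.setdefault(simulator, []).append(design_object)
    return groups


def cross_partition_connections(
    assignment: Dict[str, int],
    connections: List[Tuple[str, str]],
) -> int:
    """Count connections whose endpoints run on different local simulators."""
    return sum(1 for src, dst in connections if assignment[src] != assignment[dst])


# Hypothetical model: a testbench plus three DUV design blocks.
connections = [("TB", "cpu"), ("cpu", "bus"), ("bus", "mem"), ("cpu", "mem")]

# Two candidate partitions onto two local simulators (0 and 1).
partition_a = {"TB": 0, "cpu": 0, "bus": 1, "mem": 1}
partition_b = {"TB": 0, "cpu": 1, "bus": 0, "mem": 1}

print(local_design_objects(partition_a))
print(cross_partition_connections(partition_a, connections))  # 2
print(cross_partition_connections(partition_b, connections))  # 3
```

In this example partition_a leaves fewer connections between simulators than partition_b, so it would need less inter-simulator communication for the same model.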
Recently, distributed parallel simulation has become possible by connecting two or more computers with a high-speed computer network such as gigabit Ethernet and running a simulator on each computer, or by using a multiprocessor computer that has two or more CPU cores (in the present invention, a local simulation is the simulation executed by each of those simulators, each of which is called a local simulator, in the distributed parallel simulation). However, the performance of traditional distributed parallel simulation suffers severely from the communication and synchronization overhead among local simulators. Two basic methods are known for synchronization, one conservative (or pessimistic) and the other optimistic. Conservative synchronization guarantees the causality relation among simulation events, so there is no need for roll-back, but the speed of the distributed parallel simulation is dictated by the slowest local simulation and there are too many synchronizations. Optimistic synchronization temporarily allows violations of the causality relation and corrects them later by roll-back, so reducing the number of roll-backs is critical for simulation speed. However, because current distributed parallel simulation using optimistic synchronization makes no attempt to minimize roll-backs by maximizing the simulation periods during which a local simulation requires no synchronization with other local simulations, the simulation performance degrades significantly due to excessive roll-backs. Distributed parallel simulation using the conventional optimistic approach and using the conventional pessimistic approach is well described in many documents and papers, so a detailed explanation is omitted here.
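The roll-back behavior of conventional optimistic synchronization can be illustrated with a toy sketch. It is a simplified model written for this explanation, not the method of the present invention nor the implementation of any particular simulator: a single local simulation processes timestamped events as they arrive, saves its state after every event, and rolls back when an event arrives with a timestamp earlier than events already processed. All names, and the state-update rule, are assumptions.

```python
import bisect
from typing import List, Tuple

Event = Tuple[int, int]  # (timestamp, value)


def apply_event(state: int, value: int) -> int:
    """Order-dependent state update, so that causality violations matter."""
    return state * 2 + value


class OptimisticLocalSimulation:
    def __init__(self) -> None:
        self.events: List[Event] = []  # processed events, sorted by timestamp
        self.states: List[int] = [0]   # states[i] = state after i events
        self.rollbacks = 0

    @property
    def state(self) -> int:
        return self.states[-1]

    def receive(self, timestamp: int, value: int) -> None:
        times = [t for t, _ in self.events]
        i = bisect.bisect_right(times, timestamp)
        undone = self.events[i:]  # events processed too early
        if undone:
            # Causality violation: restore the saved state before them.
            self.rollbacks += 1
            del self.events[i:]
            del self.states[i + 1:]
        # Process the late event, then re-execute the undone ones.
        for event in [(timestamp, value)] + undone:
            self.events.append(event)
            self.states.append(apply_event(self.states[-1], event[1]))


def in_order_reference(events: List[Event]) -> int:
    """Final state when events are processed strictly in timestamp order."""
    state = 0
    for _, value in sorted(events):
        state = apply_event(state, value)
    return state


arrivals = [(10, 1), (30, 3), (20, 2)]  # the event at time 20 arrives late
sim = OptimisticLocalSimulation()
for t, v in arrivals:
    sim.receive(t, v)
print(sim.state, sim.rollbacks)  # 11 1
```

The final state matches the strictly ordered execution, at the cost of one roll-back and one re-executed event. A conservative scheme would instead delay the event at time 30 until it was certain that no earlier event could still arrive.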
It should also be noted that, to maximize simulation performance, it is desirable for a distributed parallel simulation to have the same number of processors as local simulations, but a distributed parallel simulation can still be performed as long as two or more processors are available, even if the number of local simulations is larger than the number of processors. In summary, the synchronization and communication methods of both the optimistic and the pessimistic approach greatly limit the performance of distributed parallel simulation using two or more simulators.
Moreover, during the progressive refinement process it is very important to maintain model consistency between a model at a high level of abstraction and a model at a low level of abstraction, because the former serves as a reference model for the latter. However, the current progressive refinement process has no efficient method for maintaining model consistency between two models existing at two different abstraction levels.
Moreover, because there is no systematic method for the debugging process, in which design errors are identified and removed, in a design process using progressive refinement, debugging consumes a large amount of time.