1. Technical Field
This invention generally relates to the field of developing and modeling logic designs, and more specifically relates to simulating hardware designs in a hardware simulation accelerator having encoded instructions.
2. Background Art
Designing logic circuits, whether they be a single functional unit such as an adder or a memory unit, a chip, or an entire computer system, is a process or flow in which an initial concept of a new product is transformed into a detailed blueprint. Understandably, detecting errors at early stages saves time and engineering resources, especially for complex logic designs. Computer-aided design (CAD) tools and electronic design automation (EDA) tools allow logic designers to create and model new designs and understand their complexity prior to production. Modeling and verifying logic designs with CAD, EDA, and simulation tools significantly accelerate the design process and reduce the time to market, thereby offering a competitive advantage to those developers having the fastest and most accurate EDA tools. In a typical logic design process, the logic is described in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) or in another hardware description language, such as Verilog. A netlist describing the interconnection topology, or connectivity, of the design is then input into a simulation environment to verify the logic of the design.
Among the verification tools are simulators and hardware simulation accelerators that model the function and performance of hardware in software, in hardware-accelerated software simulation systems, or in hardware emulation systems that use a combination of software and hardware to model a circuit or system design. Simulation is broadly defined as the creation of a model of the logic design which, if subjected to arbitrary stimuli, responds in a similar way to the manufactured and tested design. More specifically, the term simulation is typically used when such a model is implemented as a computer program. Simulation has long been a preferred method for verifying the logical correctness of complex electronic circuit designs. In contrast, the term emulation refers to the creation of a model using programmable logic or field-programmable gate array (FPGA) devices, including arrays of multiplexers, also called muxes. Simulation and emulation enable designers to detect design errors before the expensive manufacturing process is undertaken. One advantage of emulation over simulation is speed, but emulation may lack access to the internal nodes needed for detailed analysis. Simulation acceleration using a special-purpose hardware simulation accelerator combines the advantages of software simulation with the increased speed of emulation.
Hardware simulation accelerators were developed to provide the massive simulation horsepower needed to verify huge and complex logic circuits, such as large parallel and/or pipelined processors with multiple levels of memory caches and many processing registers. One robust hardware simulation accelerator, the Engineering Verification Engine (EVE), used a massive network of Boolean function processors that were each loaded with up to 8192 logic instructions. Typically, one run through the full sequence of instructions, executed in parallel on all logic processors, constituted one machine cycle, thereby implementing the cycle-based simulation paradigm. The theoretical speed of EVE, up to 2.2 billion gate evaluations per second, was many orders of magnitude faster than any software implementation. EVE's value to a given project was determined by the design's intended throughput in cycles per second (cps). A multiprocessor model with the full storage hierarchy and input-output (I/O) boards achieved between 250 cps and 1000 cps on EVE, compared with 0.5 cps for the software model run on a mainframe, a speedup of 500 to 2000 times.
In the late 1990s, IBM built the AWAN hardware simulation accelerator as a low-cost system with improved capacity and performance. AWAN had smaller and faster components and an interconnection strategy that improved significantly on that of EVE. Models of integrated circuits exceeding 31 million gates were simulated, with the simulation speed depending on the configuration, model size, model complexity, and the amount of host interaction. The raw model performance of the POWER4 chip running on AWAN exceeded 2500 cycles per second. Building on the basic EVE concepts, a hyper-acceleration and emulation machine called ET3 was developed in CMOS technology. ET3 used logic processors that evaluated three- and four-way input gates. ET3 had a larger number of processors and a lower depth of sequential three-way-gate instructions per processor: 256, versus 8k in EVE and 128k in AWAN. The resulting higher degree of parallelization yielded dramatically higher speeds of 50,000 cps to 1M cps, but at a much higher hardware price.
Successive generations of hardware simulation accelerators have since been developed, and these are now parallel computers with fields of application-specific integrated circuit (ASIC) chips. The flattened netlist of the design under test is mapped to these fields of chips. The segment of the netlist that is mapped to a given simulator chip is stored in a compiled format in the instruction memory. The instruction memory is often large and is located on the same chip as the processor. During simulation, rows of the instruction memory are read sequentially and piped to a logic evaluation unit, such as a processor. Based on the received instructions, the logic evaluation unit simulates the represented segment of the netlist. In most hardware simulation accelerators, each logic evaluation unit has a dedicated instruction memory that supplies the instruction stream to that unit. For modern accelerators such as the AWANNG, the instruction memory is located on the chip and often takes up half of the die's area. The capacity of a hardware simulation accelerator is therefore determined largely by the size of its instruction memory. For AWANNG, the architecture was designed to provide sufficient routing resources to achieve optimal gate utilization.
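The sequential read-and-evaluate scheme described above can be illustrated with a small Python sketch. It is a hypothetical model, not the instruction format of AWANNG or any other real accelerator: the `Instruction` record, its fields, and the choice of a 4-input gate programmed by a 16-bit truth table (one output bit for each of the 2^4 input combinations) are assumptions made for illustration. Routing is abstracted away by letting every gate read and write a shared signal array, and one complete pass over the instruction memory stands for one machine cycle of the cycle-based paradigm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Instruction:
    """One row of a hypothetical instruction memory (illustrative format only)."""
    function_table: int                # 16-bit truth table of a 4-input gate
    inputs: Tuple[int, int, int, int]  # signal indices feeding the gate's inputs
    output: int                        # signal index driven by the gate


def evaluate_gate(function_table: int, bits: Sequence[int]) -> int:
    """Return the output of a 4-input gate by looking it up in its truth table."""
    # The input bits form a 4-bit index; input 0 is the least significant bit.
    index = 0
    for position, bit in enumerate(bits):
        index |= (bit & 1) << position
    return (function_table >> index) & 1


def run_cycle(instruction_memory: List[Instruction], signals: List[int]) -> None:
    """Read the instruction rows in order and evaluate each one.

    One complete pass over the instruction memory models one machine cycle.
    Routing is abstracted away: gates read and write a shared signal array.
    """
    for instr in instruction_memory:
        input_bits = [signals[i] for i in instr.inputs]
        signals[instr.output] = evaluate_gate(instr.function_table, input_bits)


if __name__ == "__main__":
    # Half adder. Signals: 0 = a, 1 = b, 2 = constant-0 tie-off, 3 = sum, 4 = carry.
    XOR_01 = 0x6666  # in0 XOR in1 (in2 and in3 ignored)
    AND_01 = 0x8888  # in0 AND in1 (in2 and in3 ignored)
    program = [
        Instruction(XOR_01, (0, 1, 2, 2), 3),
        Instruction(AND_01, (0, 1, 2, 2), 4),
    ]
    signals = [1, 1, 0, 0, 0]
    run_cycle(program, signals)
    print(signals[3], signals[4])  # prints "0 1"
```

Because the instructions are consumed strictly in order, a gate placed earlier in the instruction memory than the gates that drive its inputs would see stale values; this is one reason a compiler must schedule the netlist segment carefully before loading it.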
With reference to FIG. 1, a typical prior art instruction 100 is shown. In instruction 100, the width of the instruction matches exactly the number of bits 102 necessary to program all the resources, such that there is a one-to-one mapping of groups of bits to the corresponding hardware resources programmed by these bits. By way of example, a group of bits 110 is required for gate evaluation of the logic at hardware resource G1; another field of bits 112 is required to program gate evaluation at hardware resource G2; a number of bits 118 is used to program gate evaluation at hardware resource Gn; and so on. Thus, each hardware resource has its own dedicated group of bits. Similarly, for routing, a fixed number of bits 120 is used for routing to/from hardware resource R1, and a fixed number of bits in field 126 is used to route instructions/logic to/from hardware resource Rm. As another example, a particular set of 16 instruction memory bits is always used to program a 16-bit function table. Thus, the total number of bits needed to program all the gate evaluation and routing resources determines the required total width of the instruction memory. In the prior art instruction of FIG. 1, each memory instruction programs all the gate evaluation and routing resources, even though a given hardware resource may not be needed to execute that instruction. In general, the overall capacity of a simulator is measured by the number of possible gate evaluations. When an architecture provides too few routing resources in the instructions, the instructions become shorter and more instructions fit in the memory in total. The overall capacity of the simulator, however, is reduced because routing becomes a bottleneck: many of the instructions cannot be used for gate evaluations because the gate inputs are not yet available. If too many routing resources are added to the architecture, the instructions are wider than necessary, and thus fewer instructions fit in the memory.
Given the typical case of one gate evaluation per instruction, this also reduces the overall capacity; in this case, gate evaluations become the bottleneck, and some memory bits that are available for routing remain unused. Balancing the number of bits in an instruction between gate evaluations and routing resources is difficult, because how much routing is needed depends on the characteristics of the logic design under test. Therefore, with a fixed designation of bits, there will always be instructions in which some bits assigned to gate evaluation and/or routing are unused. The inventors recognized that these unused bits in the memory instructions of a hardware simulation accelerator constitute wasted overall capacity.
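The capacity trade-off described above can be made concrete with a deliberately simplified model, sketched here in Python. Everything in it is a hypothetical assumption for illustration rather than a description of any real accelerator: a fixed-width instruction with one gate-evaluation field and a configurable number of routing fields, at most one gate evaluation per instruction, a design in which every gate evaluation consumes the same integer number of routing slots (`routes_per_gate`), and arbitrary field widths and memory size. Under these assumptions the instruction count is the memory size divided by the instruction width, and the usable gate evaluations are limited by whichever resource runs out first.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FixedFormat:
    """Hypothetical fixed-width instruction: one gate field plus N routing fields."""
    gate_bits: int       # bits reserved for one gate evaluation
    routing_bits: int    # bits per routing field
    routing_fields: int  # routing fields present in every instruction

    @property
    def width(self) -> int:
        # Every field is present in every instruction, whether used or not.
        return self.gate_bits + self.routing_fields * self.routing_bits


def capacity(fmt: FixedFormat, memory_bits: int, routes_per_gate: int) -> Tuple[int, int, int]:
    """Return (instructions, usable gate evaluations, unused instruction bits).

    Assumes at most one gate evaluation per instruction and that every gate
    evaluation consumes exactly `routes_per_gate` routing slots.
    """
    instructions = memory_bits // fmt.width
    routing_slots = instructions * fmt.routing_fields
    # Whichever resource runs out first limits the gate evaluations.
    gates = min(instructions, routing_slots // routes_per_gate)
    unused_gate_bits = (instructions - gates) * fmt.gate_bits
    unused_routing_bits = (routing_slots - gates * routes_per_gate) * fmt.routing_bits
    return instructions, gates, unused_gate_bits + unused_routing_bits


if __name__ == "__main__":
    for fields in (1, 2, 4):
        fmt = FixedFormat(gate_bits=24, routing_bits=12, routing_fields=fields)
        print(fields, fmt.width, capacity(fmt, memory_bits=120_000, routes_per_gate=2))
    # 1 36 (3333, 1666, 40020)
    # 2 48 (2500, 2500, 0)
    # 4 72 (1666, 1666, 39984)
```

With a demand of two routing slots per gate evaluation, one routing field per instruction leaves most gate fields idle (routing is the bottleneck), while four routing fields shrink the instruction count and leave half the routing fields idle (gate evaluation is the bottleneck). Only the format that happens to match the demand uses every instruction bit, and since real designs differ in routing demand, no single fixed split suits them all.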