Modern high-performance microprocessors have an ever-increasing number of circuit elements and an ever-rising clock frequency. Also, as the number of circuits that can be used in a CPU has increased, the number of parallel operations performed by the circuits has risen. Examples of efforts to create more parallel operations include increased pipeline depth and an increase in the number of functional units in super-scalar and very-long-instruction-word architectures. As CPU performance continues to increase, the result has been a larger number of circuits switching at faster rates. Thus, from a circuit design perspective, the time needed to complete a circuit simulation and the time needed to debug the CPU have become important considerations.
As each new CPU design uses more circuits and circuit elements, each often operating at increased frequencies, the time required to simulate the circuit design increases. Due to the increased time for simulation, the number of tests, and consequently the test coverage, may decrease. In general, the result has been a dramatic increase in the number of logic errors that escape detection before the CPU is manufactured.
Circuit simulation may occur at a "switch-level." Switch-level simulations typically include active circuit elements (e.g., transistors) and passive circuit elements (e.g., resistors, capacitors, and inductors). Circuit simulation also may occur at a "behavioral level." Behavioral level simulations typically use a hardware description language (HDL) that determines the functionality of a single circuit element or group of circuit elements.
A typical behavioral level simulation language is "Verilog," which is an Institute of Electrical and Electronics Engineers standard. Verilog HDL uses a high-level programming language to describe the relationship between the input and output of one or more circuit elements. Verilog HDL describes under what conditions the outputs should be modified and what effect the inputs have. Verilog HDL programs may also be used for logic simulation at the "register transfer level" (RTL). RTL is a level of abstraction at which a circuit design is described in terms of the flow of data between registers. The RTL programs written in Verilog go through a verification process. During this process, the Verilog design is parsed and checked for RTL style conformance by a style checker.
Using the Verilog HDL, for example, digital systems are described as a set of modules. Each module has a port interface, which defines the inputs and outputs for the module. The interface describes how the given module connects to other modules. Modules can represent elements of hardware ranging from simple gates to complete systems. Each module can be described as an interconnection of sub-modules, as a list of terminal elements, or as a mixture of both. Terminal elements within a module can be described behaviorally, using traditional procedural programming language constructs such as "if" statements and assignments, and/or structurally as Verilog primitives. Verilog primitives include, for example, truth tables, Boolean gates, logic equations, pass transistors (switches), etc.
Simulations written in an HDL may be event-driven or cycle-based. Event-driven simulators are designed to eliminate unnecessary gate simulations without introducing an unacceptable amount of additional testing. Event-driven simulators propagate a change in state from one set of circuit elements to another. Event-driven simulators may record relative timing information of the change in state so that timing and functional correctness may be verified. Event-driven simulators use event queues to order and schedule the events. An event-driven simulator processes and settles all the active events in a time step before moving to the next time step.
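The event-queue behavior described above can be illustrated with a minimal Python sketch. It is not taken from any particular simulator: the two-gate netlist `GATES`, the helper `fanout`, the function `simulate_events`, and the fixed `gate_delay` are all hypothetical names and assumptions introduced for illustration.

```python
import heapq
from itertools import count

# Hypothetical netlist: output signal -> (logic function, input signals).
GATES = {
    "n1": (lambda a, b: a & b, ("a", "b")),      # n1 = a AND b
    "out": (lambda n1, c: n1 | c, ("n1", "c")),  # out = n1 OR c
}

def fanout(signal):
    """Return the gates that read the given signal."""
    return [gate for gate, (_, inputs) in GATES.items() if signal in inputs]

def simulate_events(initial, stimuli, gate_delay=1):
    """Run an event-driven simulation.

    stimuli is a list of (time, signal, value) tuples. Returns the final
    signal values, a trace of state changes, and the number of gate
    evaluations performed.
    """
    values = dict(initial)
    seq = count()  # tie-breaker so the heap never compares signal names
    queue = [(t, next(seq), sig, val) for t, sig, val in stimuli]
    heapq.heapify(queue)
    trace, evaluations = [], 0
    while queue:
        now = queue[0][0]
        changed = []
        # Settle every event scheduled for the current time step.
        while queue and queue[0][0] == now:
            _, _, sig, val = heapq.heappop(queue)
            if values[sig] != val:
                values[sig] = val
                trace.append((now, sig, val))
                changed.append(sig)
        # Evaluate only the gates driven by signals that actually changed.
        for gate in sorted({g for s in changed for g in fanout(s)}):
            fn, inputs = GATES[gate]
            evaluations += 1
            new_value = fn(*(values[i] for i in inputs))
            heapq.heappush(queue, (now + gate_delay, next(seq), gate, new_value))
    return values, trace, evaluations
```

In this sketch a gate is evaluated only when one of its inputs changes, which is how unnecessary gate simulations are avoided, and the recorded trace keeps the relative timing of each change.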
Cycle-based simulators also simulate a change in state from one set of circuit elements to another; however, the state of an entire system is evaluated once each clock cycle. Cycle-based simulators are applicable to synchronous digital systems and may be used to verify the functional correctness of a digital design. Cycle-based simulators abstract away the timing details for all transactions that do not occur on a cycle boundary. Cycle-based simulators use algorithms that eliminate unnecessary calculations to achieve improved performance in verifying system functionality. Discrete component evaluations and re-evaluations are typically unnecessary upon the occurrence of every event.
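As a rough illustration of evaluating the state of the whole system once per clock cycle, the following Python sketch steps a small synchronous design. The 2-bit counter, the `enable` input, and the function names `next_state` and `run_cycles` are hypothetical and chosen only for this example.

```python
def next_state(state, inputs):
    """Compute the state after one clock edge for a hypothetical 2-bit counter."""
    counter = state["count"]
    if inputs["enable"]:
        counter = (counter + 1) % 4  # wrap around after 3
    return {"count": counter}

def run_cycles(state, inputs_per_cycle):
    """Evaluate the entire design exactly once per clock cycle.

    No intra-cycle timing is modeled: only the values visible at each
    cycle boundary are computed and recorded.
    """
    history = []
    for inputs in inputs_per_cycle:
        state = next_state(state, inputs)
        history.append(state["count"])
    return history
```

Because nothing is scheduled between cycle boundaries, the sketch needs no event queue, which reflects the trade-off described above: timing detail is given up in exchange for fewer calculations.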
Cycle-based simulators typically have enhanced performance. Depending on the particular options used, cycle-based simulators can offer five to ten times improvement in speed and one-fifth to one-third the memory utilization over conventional, event-driven simulators. Some cycle-based simulators also offer very fast compile times. For very large designs, the reduced memory requirements of cycle-based simulators allow a design team to simulate a design on almost every workstation on their network.
A typical simulation system (e.g., cycle-based simulator) is shown in FIG. 1. A simulation design source code (10), which includes, for example, Verilog files, clock files, etc., is an input into a simulation design compiler (12). The simulation design compiler (12) statically generates simulation design object code (14). A linker/loader (16) takes as input the simulation design object code (14) and a test vector object code (18), which is output from a stimulus compiler (20). Test vector source code (22) is input into the stimulus compiler (20).
The test vector object code (18) provides stimulus in the form of input signal values for the simulation, which is run on the simulator (24). For example, if a particular module included in the simulation design object code (14) includes an AND gate, the test vector object code (18) may provide stimulus in the form of a signal value equal to "1" to be sent to a pin of the AND gate at a particular time. The test vector object code (18) may also include expected outputs for the signal value stimuli.
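The AND gate example can be made concrete with a short Python sketch of a test vector that carries both timed stimuli and expected outputs. The `Stimulus` record, the pin names `a` and `b`, and the function `apply_test_vector` are assumptions made for illustration and do not reflect the actual format of the test vector object code (18).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stimulus:
    time: int   # simulation time at which the value is applied
    pin: str    # hypothetical pin name on the AND gate ("a" or "b")
    value: int  # signal value, 0 or 1

def and_gate(a, b):
    return a & b

def apply_test_vector(stimuli, expected):
    """Apply timed stimuli to the AND gate and compare against expected outputs.

    expected maps a time to the output value expected at that time.
    Returns a list of (time, expected, actual) tuples for each mismatch.
    """
    pins = {"a": 0, "b": 0}
    failures = []
    for time in sorted({s.time for s in stimuli}):
        for s in stimuli:
            if s.time == time:
                pins[s.pin] = s.value
        actual = and_gate(pins["a"], pins["b"])
        if time in expected and expected[time] != actual:
            failures.append((time, expected[time], actual))
    return failures
```

An empty result means every expected output matched; each entry in a non-empty result identifies the time of a mismatch together with the expected and observed values.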
The test vector object code (18) may include multiple test vectors. For example, a collective test vector may include a first test vector to test a first group of modules of the simulation design object code (14), and a second test vector to test a second group of modules of the simulation design object code (14).
Using the test vector object code (18) and the simulation design object code (14), the linker/loader (16) generates and loads an executable code (i.e., an executable program) into the memory of the simulator (24), where the simulation is performed. Depending on implementation, the simulator may use typical, "standard" computer architectures, such as those found in a workstation, or may use other, "non-standard" computer architectures, such as computer architectures developed specifically for simulation or specifically for verification of circuit design.
However, regardless of whether the simulator architecture is standard or non-standard, certain common issues are typically of concern to circuit testers and/or simulation designers. One issue is the size of the executable code in connection with cache performance.
Typically, a CPU is able to process information (e.g., executable code) faster than the information can be accessed and transferred from the main memory to the CPU. To reduce the amount of time the CPU remains idle, a fast but typically expensive type of memory is used as a cache. Access times for the cache are typically substantially faster than access times for the main memory.
When requested information is not found in the cache, a "cache miss" occurs. Conversely, if the information is found, there is a "cache hit." When a simulation is running on the simulator, simulation performance may degrade as cache misses increase. Simulation performance concerns often arise as the size of the executable code increases. For example, as less of the executable code can be stored in and accessed from the faster cache, more of the code is stored in and accessed from the slower main memory. Furthermore, longer completion times may be a concern with larger executable files.
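The relationship between executable size and cache misses can be sketched with a toy cache model in Python. The model is an idealized assumption, not a description of any real CPU: it uses a fully associative cache with least-recently-used replacement, and the executable is treated as a loop that fetches `code_lines` cache lines in order on every pass. The function name `miss_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict

def miss_rate(code_lines, cache_lines, passes=10):
    """Return the fraction of instruction fetches that miss in a toy LRU cache.

    code_lines:  size of the executable, in cache lines
    cache_lines: capacity of the cache, in cache lines
    passes:      number of times the whole executable is traversed
    """
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = accesses = 0
    for _ in range(passes):
        for line in range(code_lines):
            accesses += 1
            if line in cache:
                cache.move_to_end(line)        # cache hit: mark as most recent
            else:
                misses += 1                    # cache miss: fetch from main memory
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return misses / accesses
```

Under these assumptions, code that fits in the cache misses only on the first pass, while code even one line larger than the cache misses on every fetch. The abrupt cliff is an artifact of the strictly sequential access pattern and LRU replacement; real workloads degrade more gradually, but the direction of the effect matches the concern described above.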
In general, in one aspect, the invention relates to a method for dynamically customizing object code for simulation. The method comprises obtaining a statically generated object (SGO) and a first test vector, segmenting the SGO with a marker node to generate a segmented SGO comprising a plurality of SGO segments, generating a first simulation profile using the segmented SGO and the first test vector, locating a first unexercised segment of the plurality of SGO segments using the first simulation profile, and generating a first reduced SGO by removing the first unexercised segment from the segmented SGO.
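The steps of this method can be pictured with a simplified Python sketch. It is an illustration under stated assumptions rather than the implementation: the SGO is modeled as a list of named code segments, a marker node is modeled as an integer identifier attached to each segment, and a test vector is modeled as the sequence of segment names it causes to execute. All function names and data layouts here are hypothetical.

```python
def segment_sgo(sgo):
    """Attach a marker node (modeled as an integer id) to each SGO segment.

    sgo is a list of (name, ops) pairs, where ops stands in for object code.
    """
    return [{"marker": i, "name": name, "ops": ops}
            for i, (name, ops) in enumerate(sgo)]

def generate_profile(segmented, test_vector):
    """Count how many times each marker is reached while running the test vector."""
    hits = {seg["marker"]: 0 for seg in segmented}
    by_name = {seg["name"]: seg["marker"] for seg in segmented}
    for name in test_vector:
        hits[by_name[name]] += 1
    return hits

def locate_unexercised(profile):
    """Return the markers of segments that the test vector never reached."""
    return sorted(marker for marker, n in profile.items() if n == 0)

def reduce_sgo(segmented, profile):
    """Build a reduced SGO by removing every unexercised segment."""
    drop = set(locate_unexercised(profile))
    return [seg for seg in segmented if seg["marker"] not in drop]
```

A reduced SGO produced this way is specific to the test vector that generated the profile; a different test vector may exercise segments that were removed, so the reduction would need to be repeated for it.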
In general, in one aspect, the invention relates to a system for dynamically customizing a statically generated object (SGO) for a simulation. The system comprises a first test vector for stimulating the SGO, a marker node segmenting the SGO to generate a segmented SGO, a first set of directives generated from a first simulation profile using the first test vector and the segmented SGO, and a first reduced SGO generated by removing a first unexercised segment of the segmented SGO using the first set of directives.
In general, in one aspect, the invention relates to a system for dynamically customizing a statically generated object (SGO) for a simulation. The system comprises a first test vector for stimulating the SGO, a marker node segmenting the SGO to generate a segmented SGO, a first set of directives generated from a first simulation profile using the first test vector and the segmented SGO, a first reduced SGO generated by removing a first unexercised segment of the segmented SGO using the first set of directives, and a linker/loader configured to use the first set of directives to remove the first unexercised segment.
In general, in one aspect, the invention relates to a computer system for dynamically customizing object code for simulation. The computer system comprises a processor, a memory, and software instructions stored in the memory for enabling the computer system, under control of the processor, to perform obtaining a statically generated object (SGO) and a test vector, segmenting the SGO with a marker node to generate a segmented SGO comprising a plurality of SGO segments, generating a simulation profile using the segmented SGO and the test vector, locating an unexercised segment of the plurality of SGO segments using the simulation profile, and generating a reduced SGO by removing the unexercised segment from the segmented SGO.
In general, in one aspect, the invention relates to an apparatus for dynamically customizing object code for simulation. The apparatus comprises means for obtaining a statically generated object (SGO) and a test vector, means for segmenting the SGO with a marker node to generate a segmented SGO comprising a plurality of SGO segments, means for generating a simulation profile using the segmented SGO and the test vector, means for locating an unexercised segment of the plurality of SGO segments using the simulation profile, and means for generating a reduced SGO by removing the unexercised segment from the segmented SGO.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.