The present invention relates to computer software and hardware simulators, and more specifically, to a system and method to simulate an electronic system that includes one or more target processors executing software and interacting with hardware.
Computer simulation of digital hardware systems has become a common technique to reduce the cost and time required for the design of such hardware systems. Simulating digital hardware allows a designer to predict the functioning and performance of the hardware prior to fabricating the hardware.
More and more digital systems incorporate a processor, such as a microprocessor, a digital signal processor, or other special purpose computer processor. There has been increased effort to develop a simulation system that includes simulating the hardware and simulating the running of software on one or more processors that are included in the digital system. Having such a simulation system allows a designer to test the operation of software on the processor(s) before a physical processor is available. Thus, for example, a designer may be able to start designing a system incorporating a new microprocessor before the manufacturer actually releases physical samples of the microprocessor. In addition, a system designer designing an integrated circuit or a system on a printed circuit board that includes a processor can, for example, use the simulation system to test the integrated circuit or printed circuit board implementation, including operation of software on the processor part and any interactions between the processor and the other digital circuit elements of the integrated circuit or board, before the integrated circuit or board is fabricated. This can save both time and money.
A simulation system for simulating both the digital hardware that includes one or more target processors and the running of software on the processor(s) is called a co-simulation design system, a co-simulation system, or simply a design system herein, and the environment for operating such a co-simulation system is called a design environment. The processor is called a target processor and the computer system on which the environment operates is called the host computer system or simply the host. The host computer system includes one or more host processors. The hardware other than the target processor is called digital circuitry. The computer software program that is designed by a user to operate on the target processor is called the user program or the target code.
The target processor typically includes memory and one or more caches, for example a data cache (or D-cache) and an instruction cache (or I-cache). The target processor typically may also include a memory management unit (MMU) that converts virtual addresses into physical memory addresses and possibly physical I/O device addresses. The MMU may include a translation lookaside buffer (TLB) to improve address translation performance. A TLB is a hardware element that acts as a cache of recent translations and stores virtual memory page to physical memory page translations. Given a memory address (an instruction to fetch, or data to load or store), the target processor first looks in the TLB to determine if the mapping of virtual page to physical page is already known. If so (a "TLB Hit"), the translation can be done quickly. But if the mapping is not in the TLB (a "TLB Miss"), the correct translation needs to be determined.
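The hit and miss flow described above can be illustrated with a minimal Python sketch. The page size, the dictionary-based storage, and the names `TinyTLB`, `translate`, and `page_table` are assumptions made for illustration; they do not describe any particular target processor.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TinyTLB:
    """Toy TLB: caches virtual-page to physical-page translations."""

    def __init__(self, page_table):
        self.page_table = page_table  # full mapping, consulted only on a miss
        self.entries = {}             # recent translations (the TLB itself)
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.entries:     # TLB hit: the mapping is already known
            self.hits += 1
        else:                         # TLB miss: determine the translation
            self.misses += 1
            self.entries[vpage] = self.page_table[vpage]
        return self.entries[vpage] * PAGE_SIZE + offset


tlb = TinyTLB(page_table={0: 7, 1: 3})
first = tlb.translate(0x0010)   # miss; the translation is then cached
second = tlb.translate(0x0020)  # hit on the same virtual page
```

A real TLB has a bounded number of entries and a replacement policy; this sketch omits both to keep the hit and miss paths visible.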
The target processor may be a separate microprocessor with the digital circuitry being external to the microprocessor (e.g., on a printed circuit board or elsewhere in the system), or may be a processor embedded in an application specific integrated circuit (ASIC) or a custom integrated circuit (IC) such as a very large scale integrated (VLSI) device, with the digital circuitry including some components that are part of the ASIC or IC, and other components that are external to the ASIC or IC.
The host processor also includes memory, and the host memory is referred to as "host memory" herein. The physical address of the host memory is referred to as the "host address" herein. When the word "address" is used without specifying the host, then it refers to the target address.
A design environment capable of co-simulation requires the capability of accurately simulating the digital circuitry, including timing, and the capability of accurately simulating the running of the user program (i.e., the target code) on the target processor, including the accurate timing of operation of the user program and of any software/hardware interaction. The first requirement is available today in a range of simulation environments using hardware description languages (HDLs) such as Verilog and VHDL. It also is available as a set of constructed libraries and classes that allows the modeling of hardware in a higher-level language such as 'C' or 'C++'. The second requirement pertains to a processor simulator using an executable processor model that both accurately simulates the execution of a user program on the target processor, and can interact with the digital circuitry simulation environment. Such a processor simulator should provide timing information, particularly at times of software/hardware interaction, i.e., at the software/hardware interface. A processor model that includes such accurate timing information is called a "quantifiable" model herein.
One known way of providing such processor simulation is to simulate the actual hardware design of the processor, for example by specifying a processor model in a hardware description language (HDL). The main disadvantage of simulating the operation of the processor this way is the slow execution speed, typically in the range of 0.1-100 instructions per second.
Another known way of accurately simulating the execution of software on a processor for inclusion in a co-simulation environment is an instruction set simulator (ISS), wherein both the function and the sequencing of the microprocessor are mimicked in software. An instruction set simulator still executes relatively slowly, compared for example to how fast a program would be executing on the target processor. An ISS executes in the range of 1,000 to 50,000 instructions per second depending on the level of timing and operational detail provided by the model.
Real systems execute 50-1000 million instructions per second or more, so that the ISS or full hardware simulation techniques have a performance disparity of a factor of between about 10,000 and 200,000; 3 to 60 hours of simulation may be needed to model 1 second of real-time target processor performance.
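The relation between the slowdown factor and the simulation time quoted above can be checked with a short calculation. The function name `hours_to_simulate` is hypothetical; the two factors are the ones given in the text.

```python
SECONDS_PER_HOUR = 3600


def hours_to_simulate(target_seconds, slowdown_factor):
    """Host hours needed to simulate `target_seconds` of target time."""
    return target_seconds * slowdown_factor / SECONDS_PER_HOUR


low = hours_to_simulate(1, 10_000)    # slowdown at the low end of the range
high = hours_to_simulate(1, 200_000)  # slowdown at the high end of the range
```

With these inputs, `low` is a little under 3 hours and `high` a little under 56 hours, in line with the approximate "3 to 60 hours" figure.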
One solution to the slow speed of simulating a processor is to use a hardware processor model. This device includes a physical microprocessor and some circuitry for interfacing and interacting with the design environment simulating the digital circuitry. The memory for the target processor is simulated as part of the digital circuitry. Such an approach is fairly expensive. Another limitation is due to having two definitions of time operating on the same simulation system: simulation time of a hardware simulator, and processor time, which is real time for the hardware processor. Correlating these is difficult.
Another solution is to use an emulator as the target processor model. An emulator, like a hardware processor model, is a hardware device, typically the target processor, and usually includes some memory. The emulator is designed to emulate the operation of the microprocessor. Such a processor emulator when it includes memory can execute the user program directly, but again is expensive and may require the development of external circuitry to interact with the hardware simulator simulating the digital circuitry. U.S. Pat. No. 5,838,948 describes an environment that uses an emulator for speeding up the running of a user program in the design environment.
While sometimes it is desired to run a simulation with great precision at a high level of detail, at other times, less detail may suffice, enabling faster execution of the simulation. There therefore is a need in the art for an executable and quantifiable processor model that can be used in a co-simulation system and that models the operation of the target processor at an elected level of detail, including an elected level of detail at the hardware/software interface.
Computer networks are becoming ubiquitous, and it is desired to be able to operate a co-simulation design system on a computer network, with different elements of the design system running on different processors of the computer network to speed execution. Similarly, multiprocessor computers are also becoming commonplace, and it would be desirable to be able to operate a co-simulation design system on a multiprocessor computer, with different elements running on different processors of the multiprocessor computer.
Electronic systems nowadays may include more than one target processor. It is therefore desirable to have a co-simulation design system that provides for rapidly simulating such an electronic system, including simulating respective user programs executing on the target processors, such processor simulation providing timing detail that takes into account instruction timing and pipeline effects for target processors that include a pipeline.
The Parent Applications describe a method and system for rapidly simulating a target processor executing a user program. Described is a processor model for the target processor that operates up to the host processor speed on a host computer system and yet takes into account instruction timing and pipeline effects such as pipeline hazards and/or cache effects such as cache misses. The model can be incorporated into a design system that simulates an electronic circuit that includes the target processor and digital circuitry. The Parent Applications also describe using more than one such processor model in a design system that simulates an electronic circuit that includes more than one target processor and digital circuitry. A further feature described in the Parent Applications is how a user can modify the processor model to include more or less detail.
The Parent Applications' design system operates on a host computer system and simulates an electronic system that contains target digital circuitry and a target processor. The design system includes a hardware simulator simulating the target digital circuitry, a processor simulator simulating the target processor by executing a user program substantially on the host computer system, and an interface mechanism that couples the hardware simulator with the processor simulator including passing information between the hardware simulator and the processor simulator. The hardware simulator provides a simulation time-frame for the design system.
In one version, at significant events including events that require the user program to interact with the target digital circuitry, the operation of the processor simulator is suspended and associated event information is passed from the processor simulator to the hardware simulator. The operation of the processor simulator then is resumed when the hardware simulator processes information and passes an event result back to the processor simulator.
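The suspend and resume handshake described above can be sketched with a Python generator standing in for the processor simulator. This is only an illustration under stated assumptions: the event tuples, the addresses, the delay values, and the function names `processor_simulator` and `hardware_simulator` are all hypothetical, and the interface mechanism is reduced to `next()` and `send()`.

```python
def processor_simulator():
    """Toy processor simulator; suspends (yields) at each significant event."""
    delay = 12                                # time accumulated running target code
    result = yield ("read", 0x4000, delay)    # suspend and pass event information
    delay = 5                                 # accumulation restarts after the event
    yield ("write", 0x4004, delay, result + 1)


def hardware_simulator(proc):
    """Toy hardware simulator; it owns the simulation time frame."""
    sim_time = 0
    registers = {0x4000: 41}
    log = []
    try:
        event = next(proc)                    # run the processor to its first event
        while True:
            kind, addr, delay = event[:3]
            sim_time += delay                 # advance the shared time frame
            if kind == "read":
                result = registers[addr]
            else:
                registers[addr] = event[3]
                result = None
            log.append((sim_time, kind, addr))
            event = proc.send(result)         # resume with the event result
    except StopIteration:
        pass                                  # the user program has finished
    return sim_time, registers, log


final_time, regs, events = hardware_simulator(processor_simulator())
```

The generator keeps the processor simulator's state between events, so the hardware simulator can process the event at the right simulation time before the user program continues.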
The processor simulator described in the Parent Applications accumulates a simulation time delay when operating, the simulation time delay being determined using timing information that accounts for instruction timing including pipeline effects and/or cache effects. A static analysis process, i.e., a process performed by analyzing the user program prior to running the analyzed version of the user program on the processor simulator, determines timing information in accordance with characteristics of the target processor including instruction timing characteristics and, in one aspect, pipeline characteristics. The static analysis process comprises decomposing the user program into linear blocks of one or more instructions; determining the time delay for each linear block of the user program using characteristics of the target processor; and combining the linear block timing information with the user program to determine the timing information for the processor simulator.
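A simplified sketch of the static analysis steps follows. The opcode names and cycle counts in `CYCLES` are invented for an imaginary target processor, and the decomposition rule (a block ends after a branch) is an assumption; it ignores branch targets and pipeline effects, which a real analysis would have to consider.

```python
# Hypothetical per-opcode cycle counts for an imaginary target processor.
CYCLES = {"add": 1, "load": 2, "store": 2, "mul": 3, "branch": 2}
BLOCK_ENDERS = {"branch"}  # assumption: a block ends after a control transfer


def linear_blocks(program):
    """Split a list of opcodes into linear blocks with no branch inside."""
    blocks, current = [], []
    for op in program:
        current.append(op)
        if op in BLOCK_ENDERS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def block_delay(block):
    """Static delay of one block, ignoring pipeline and cache effects."""
    return sum(CYCLES[op] for op in block)


program = ["load", "add", "branch", "mul", "store"]
# Combine each block with its timing information.
annotated = [(block, block_delay(block)) for block in linear_blocks(program)]
```

Because each block's delay is computed once before execution, the simulator only needs to add one precomputed number per executed block at run time.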
Any timing effects, such as cache misses in a D-cache or an I-cache, are dependent on the current state of the cache, and cannot be known until runtime. Static analysis cannot easily account for such timing. Another aspect of the Parent Applications is dynamic analysis by including code to simulate a cache or other dynamic components. In one aspect of the Parent Applications, the processor simulator includes a cache simulator that simulates operation of the cache to account for the effects of cache misses on timing.
The hardware simulator provides a simulation time-frame for the design system. In one version, at significant events, including events that require the user program to interact with the target digital circuitry, the operation of the processor simulator is suspended and associated event information is passed from the processor simulator to the hardware simulator. The operation of the processor simulator then is resumed when the hardware simulator processes information and passes an event result back to the processor simulator.
The static analysis process comprises decomposing the user program into linear blocks of one or more instructions; determining, using characteristics of the target processor, the time delay for each linear block of the user program that would be incurred by executing the linear block with no cache misses; and combining the linear block timing information with the user program to determine the timing information for the processor simulator. In the case that the processor model includes a cache simulator, the analysis process also includes determining those parts of the user program that include one or more references that might require a cache lookup, and inserting hooks into the analyzed user program to invoke, at run time, the cache simulator for those references that include a memory reference requiring a cache lookup, in order to account for cache misses in timing.
In one embodiment, the hardware simulator runs on an HDL and at least some of the digital circuitry is specified in the HDL. In another embodiment, all or some of the digital circuitry is described to the hardware simulator in a higher-level language such as 'C' or 'C++'. One implementation described in the Parent Applications is when the user program includes statements in a higher-level language. An alternate implementation, for which the present invention is particularly applicable, is the case that the user program is provided as executable (binary) code in the target processor's machine language.
Other features and aspects of the invention will become clear from the detailed description that follows.
One aspect of the invention is a method and system for rapidly simulating on a host computer system an electronic system that includes both digital circuitry and one or more target processors each executing a user program, with the target processor including a cache or an MMU or both. One feature of the invention is providing a processor model for each target processor that operates fast, potentially even faster than the target processor speed, and yet takes into account instruction timing and cache effects when a cache is included and MMU effects when an MMU is included. As an additional feature, the processor model also takes into account pipeline effects such as pipeline hazards for target processors that have a pipeline. Another feature of the invention is providing such a processor model that is modifiable by a user to include more or less detail. Another feature of the invention is providing such a processor model that can be incorporated into a design system that simulates an electronic circuit that includes the target processor and digital circuitry.
Described herein is a co-simulation design system to simulate on a host processor an electronic system that includes target digital circuitry and a target processor with an accompanying user program. The system includes a processor simulator to simulate execution of the user program by executing host software that includes an analyzed version of the user program. The system includes a hardware simulator to simulate the target digital circuitry and an interface mechanism that couples the hardware simulator with the processor simulator including controlling communication between the processor simulator and the hardware simulator.
The user program is provided in binary form. Determining the analyzed version of the user program includes decomposing the user program into linear blocks, translating each linear block of the user program into host code that simulates the operations of the linear block, storing the host code of each linear block in a host code buffer for the linear block, and adding to the code in the host code buffer timing information on the time it would take for the target processor to execute the user program. The timing information incorporates target processor instruction timing. The adding of timing information includes inserting dynamic hooks into the corresponding host code that, during execution, invoke dynamic mechanisms that may affect timing and that cannot be determined ahead of execution. As a result, while the processor simulator executes the analyzed version of the user program, the processor simulator accumulates simulation time according to a simulation time frame, the accumulated simulation time accounting for the target processor instruction timing as if the user program was executing on the target processor.
In one version, the target processor includes a cache and the processor simulator includes a cache simulator. Determining the analyzed version of the user program further includes, for each linear block, identifying those parts in the linear block of the user program that include one or more memory references that might require a cache lookup, and inserting hooks into the corresponding host code in the corresponding host code buffer to invoke, at run time, the cache simulator for any simulated memory reference that might require a cache lookup. Executing the analyzed version of the user program causes the cache simulator to be invoked for the memory references, the cache simulator accounting for the effect of cache lookups on timing, and accumulates simulation time as if the user program was executing on the target processor, the accumulated simulation time also accounting for cache lookup effects.
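The combination of a precomputed block delay with run-time cache hooks can be sketched as follows. Everything here is an assumption made for illustration: the direct-mapped organization, the line size, the miss penalty, and the use of a Python closure as a stand-in for translated host code in a host code buffer.

```python
LINE_SIZE = 16      # assumed cache line size in bytes
MISS_PENALTY = 10   # assumed extra cycles charged on a miss


class CacheSim:
    """Toy direct-mapped cache simulator that only tracks timing."""

    def __init__(self, num_lines=4):
        self.tags = [None] * num_lines

    def access(self, addr):
        line = addr // LINE_SIZE
        index = line % len(self.tags)
        if self.tags[index] == line:
            return 0                  # hit: no extra delay
        self.tags[index] = line       # miss: fill the line
        return MISS_PENALTY


def analyze(block, static_delay):
    """Build stand-in host code for a block: the static delay is baked in,
    and one cache-simulator hook is kept per memory reference."""
    mem_refs = [addr for op, addr in block if op in ("load", "store")]

    def host_code(cache):
        delay = static_delay          # known before execution
        for addr in mem_refs:         # hooks resolved only at run time
            delay += cache.access(addr)
        return delay

    return host_code


block = [("load", 0x100), ("add", None), ("load", 0x104)]
run_block = analyze(block, static_delay=5)
cache = CacheSim()
first_pass = run_block(cache)   # first load misses, second hits the same line
second_pass = run_block(cache)  # both loads hit
```

The same block therefore contributes a different delay on each pass, which is the timing effect that static analysis alone cannot capture.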
The cache simulator includes a simulated cache containing simulated cache entries, and a cache search mechanism for searching the simulated cache for an entry that matches an address. One embodiment of the cache search mechanism includes a multi-level lookup table search mechanism that requires the same (small) number of host processor operations independent of whether the lookup is successful or not. This avoids tests that might slow down the simulation in the host code that implements the search mechanism.
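One way to obtain a lookup whose cost does not depend on the outcome is to point every unmapped slot at a shared "empty" table, so that a miss walks the same number of levels as a hit. The sketch below illustrates that general idea only; the two-level layout, the 8-bit key, and the names `TwoLevelTable`, `EMPTY_LEAF`, and `MISS` are assumptions, not the structure used by the described embodiment.

```python
BITS = 4                      # assumed number of key bits consumed per level
SIZE = 1 << BITS
MISS = object()               # sentinel entry meaning "not in the cache"
EMPTY_LEAF = [MISS] * SIZE    # shared leaf used wherever nothing is mapped


class TwoLevelTable:
    """Two-level lookup table keyed by an 8-bit value (toy key width)."""

    def __init__(self):
        self.root = [EMPTY_LEAF] * SIZE

    def insert(self, key, entry):
        hi, lo = key >> BITS, key & (SIZE - 1)
        if self.root[hi] is EMPTY_LEAF:       # allocate a private leaf on demand
            self.root[hi] = [MISS] * SIZE
        self.root[hi][lo] = entry

    def lookup(self, key):
        # Always exactly two index operations, hit or miss, with no test here.
        return self.root[key >> BITS][key & (SIZE - 1)]


table = TwoLevelTable()
table.insert(0x2A, "entry for line 0x2A")
hit = table.lookup(0x2A)
miss_same_leaf = table.lookup(0x2B)
miss_empty_leaf = table.lookup(0xF0)
```

The caller still has to compare the result with `MISS`, but the search itself contains no conditional on whether intermediate levels exist.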
The cache simulator stores, at execution time, for an instruction that might require a cache lookup, a pointer to the simulated cache entry that results from a lookup of the simulated cache the first time the execution of the target instruction is simulated such that the cache simulator can avoid looking up the simulated cache the next time the target instruction is executed in simulation.
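A minimal sketch of remembering the looked-up entry per instruction is given below. The class names and the `searches` counter are hypothetical, and the check that the saved entry still corresponds to the referenced line is an assumption added so that the sketch stays correct; the text above does not say how a stale pointer is detected.

```python
class CacheEntry:
    def __init__(self, line):
        self.line = line


class SimulatedCache:
    """Toy simulated cache that never evicts; counts full searches."""

    def __init__(self):
        self.entries = {}
        self.searches = 0

    def search(self, line):
        self.searches += 1
        return self.entries.setdefault(line, CacheEntry(line))


class MemoryInstruction:
    """One simulated target instruction that references memory."""

    def __init__(self):
        self.last_entry = None    # pointer saved from the previous lookup

    def execute(self, cache, line):
        entry = self.last_entry
        if entry is None or entry.line != line:   # saved pointer not usable
            entry = cache.search(line)
            self.last_entry = entry
        return entry


cache = SimulatedCache()
insn = MemoryInstruction()
for _ in range(3):
    insn.execute(cache, line=8)    # only the first execution searches
searches_same_line = cache.searches
insn.execute(cache, line=9)        # a different line forces a new search
searches_after_change = cache.searches
```

This pays off for instructions in loops that touch the same cache line repeatedly, since the full search runs once rather than on every iteration.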
One version is for a target processor that includes an MMU for translating virtual addresses to physical addresses. For this version, the processor simulator includes an MMU simulator, and determining the analyzed version of the user program further includes, for each linear block, identifying those parts in the linear block of the user program that include one or more memory references that might require accessing the MMU, and inserting hooks into the corresponding host code in the corresponding host code buffer to invoke, at run time, the MMU simulator for any simulated memory reference that might require an MMU access. Executing the analyzed version of the user program causes the MMU simulator to be invoked for the memory references, the MMU simulator accounting for the effect of MMU accesses on timing, and accumulates simulation time as if the user program was executing on the target processor, the accumulated simulation time also accounting for MMU access effects.
The MMU includes a TLB and the MMU simulator includes a TLB simulator containing simulated TLB entries and a TLB search mechanism for searching the simulated TLB for an entry that matches a virtual address and a page size. The TLB search mechanism includes a multi-level lookup table search mechanism that requires the same number of table lookups, and thus of host processor operations, independent of whether the lookup is successful or not.
The TLB simulator stores at execution time, for an instruction that might include accessing the MMU, a pointer to the simulated TLB entry that results from a lookup of the simulated TLB the first time execution of the target instruction is simulated such that the TLB simulator can avoid looking up the simulated TLB the next time the target instruction is executed in simulation.
The attributes of a TLB entry are encoded in the corresponding simulated TLB entry such that the testing of whether or not a virtual address and a page size match an entry of the simulated TLB automatically also checks permission without a separate permissions check required. In one version, the simulated TLB entry is encoded such that the address and page size match test fails when there is no permission or the alignment is incorrect. The simulated TLB entry includes a set of versions of the TLB entry virtual address and TLB entry page size information, each version corresponding to a different mode, so that testing a virtual address and page size pair for a match in a simulated TLB entry automatically also includes indexing to the version of the TLB entry virtual address and TLB entry page size information according to the mode.
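The idea of folding the permission check into the match test can be sketched by storing one version of the tag per mode and giving disallowed modes a tag that can never match. The mode names, the `NO_MATCH` value, and the class `SimTLBEntry` are assumptions for illustration, and the alignment case mentioned above is not modeled.

```python
MODES = ("user_read", "user_write", "kernel_read", "kernel_write")
NO_MATCH = -1   # assumed impossible tag: no virtual page number is negative


class SimTLBEntry:
    def __init__(self, vpage, page_size, ppage, allowed_modes):
        self.ppage = ppage
        # One (vpage, page_size) version per mode. A disallowed mode gets a
        # tag that can never compare equal, so the match test itself fails.
        self.versions = {
            mode: (vpage if mode in allowed_modes else NO_MATCH, page_size)
            for mode in MODES
        }

    def matches(self, vaddr, page_size, mode):
        # Single comparison: address, page size, and permission together.
        return self.versions[mode] == (vaddr // page_size, page_size)


entry = SimTLBEntry(
    vpage=5, page_size=4096, ppage=9,
    allowed_modes={"user_read", "kernel_read", "kernel_write"},
)
read_ok = entry.matches(5 * 4096 + 12, 4096, "user_read")
write_denied = entry.matches(5 * 4096 + 12, 4096, "user_write")
wrong_page = entry.matches(6 * 4096, 4096, "user_read")
```

A failed match then covers both "no such translation" and "translation exists but access is not permitted", so the common path needs no second test.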
Running the processor simulator includes, for each host code buffer, looking up the next host code buffer to execute and executing the code in the host code buffers in sequence. Looking up the next host code buffer at the conclusion of processing the code in a present host code buffer includes searching for the next host code buffer the first time the present host code buffer is processed, and, after the next host code buffer is found or newly created, storing for the present host code buffer a pointer to the next host code buffer, such that the search for the next host code buffer can be avoided the next time the present host code buffer is processed and its next host code buffer is looked up.
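The chaining of host code buffers can be sketched as below. For simplicity each block is assumed to have a single fixed successor, given by a hypothetical `successors` map; real code with conditional branches would need one saved pointer per outcome. The `searches` counter exists only to show how many searches are avoided.

```python
class HostCodeBuffer:
    def __init__(self, target_addr, next_addr):
        self.target_addr = target_addr
        self.next_addr = next_addr    # toy: one fixed successor per block
        self.next_buffer = None       # filled in after the first search


class Runner:
    def __init__(self, successors):
        self.successors = successors  # hypothetical map: block addr -> next addr
        self.buffers = {}
        self.searches = 0

    def find_or_create(self, addr):
        """Search for the buffer of a block, creating it if it is new."""
        self.searches += 1
        if addr not in self.buffers:
            self.buffers[addr] = HostCodeBuffer(addr, self.successors[addr])
        return self.buffers[addr]

    def run(self, start_addr, steps):
        trace = []
        buf = self.find_or_create(start_addr)
        for _ in range(steps):
            trace.append(buf.target_addr)     # stands in for executing the code
            if buf.next_buffer is None:       # first time: search and remember
                buf.next_buffer = self.find_or_create(buf.next_addr)
            buf = buf.next_buffer             # later passes follow the pointer
        return trace


runner = Runner({0x00: 0x10, 0x10: 0x20, 0x20: 0x00})
trace = runner.run(0x00, steps=7)
```

In this run the three-block loop is traversed more than twice, yet only four searches occur: one for the starting block and one per block the first time it finishes.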