Instruction-set simulators are an integral part of today's processor and software design process. Their important role in architecture exploration, early system verification, and pre-silicon software development is indisputable. The performance of the simulator is a key factor for the overall design efficiency. The flexibility and accuracy of the simulator are also key factors. One conventional instruction-set simulation technique is an interpretive technique, which is flexible but slow. A second conventional instruction-set simulation technique is a compiled technique, which is faster than interpretive simulation but lacks flexibility.
FIG. 1 illustrates an exemplary interpretive simulation workflow 150. An interpretive simulator is essentially a virtual machine implemented in software, which interprets loaded object code to perform appropriate actions on a host to simulate actions of target hardware. First, the application 145 to be simulated is loaded into memory on the host computing device. In a similar fashion to the operation of the target hardware, an instruction word 152 is fetched from the program memory 154, decoded 160, and executed 170 at run-time (simulation loop), which enables the highest degree of simulation accuracy and flexibility. However, the straightforward mapping of the hardware behavior to a software simulator has major disadvantages. Unlike in real hardware, instruction decoding is a very time-consuming process in a software simulator, especially for today's VLIW architectures. Further, the growing complexity of new programmable architectures is making interpretive simulators more and more impractical.
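The simulation loop described above can be illustrated with a minimal sketch. The toy instruction set used here (16-bit words laid out as opcode, destination register, and operand, with the opcodes HALT, LOADI, and ADD) and all function names are assumptions made for illustration only and do not correspond to any real target architecture or to the elements of FIG. 1.

```python
# Hypothetical toy ISA, not taken from the text: 16-bit instruction words
# laid out as [opcode:4][rd:4][operand:8].
HALT, LOADI, ADD = 0, 1, 2


def encode(opcode, rd=0, operand=0):
    """Pack the three fields into one instruction word."""
    return (opcode << 12) | (rd << 8) | (operand & 0xFF)


def decode(word):
    """Split an instruction word into (opcode, rd, operand)."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF


def execute(opcode, rd, operand, regs):
    """Apply the instruction behavior to the register file; False means halt."""
    if opcode == LOADI:
        regs[rd] = operand
    elif opcode == ADD:
        regs[rd] = (regs[rd] + regs[operand & 0xF]) & 0xFFFF
    elif opcode == HALT:
        return False
    else:
        raise ValueError(f"unknown opcode {opcode}")
    return True


def run_interpretive(program_memory):
    """Fetch, decode, and execute every instruction at run-time."""
    regs = [0] * 16
    pc = 0
    running = True
    while running:
        word = program_memory[pc]          # fetch
        pc += 1
        fields = decode(word)              # decode, repeated on every execution
        running = execute(*fields, regs)   # execute
    return regs


program = [
    encode(LOADI, 1, 5),   # r1 = 5
    encode(LOADI, 2, 7),   # r2 = 7
    encode(ADD, 1, 2),     # r1 = r1 + r2
    encode(HALT),
]
regs = run_interpretive(program)
```

Because the word is read from program memory and decoded each time it is reached, the sketch picks up any change to the program memory automatically, at the price of repeating the decode work on every execution.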
Another conventional approach to instruction-set simulation is compiled simulation. Referring now to FIG. 2, the objective of compiled simulation 200 is to improve the simulation performance. Shifting time-consuming operations from the simulator run-time into an additional step before the simulation (compile-time) can make run-time simulation far more efficient than interpretive simulation. This step is performed by a tool called a simulation compiler 205, which compiles an application 145 to produce a compiled simulation 200. At run-time, the various instruction behaviors 220 are executed 225 on the host computer system.
Depending on architectural and application characteristics, the degree of compilation varies. All known compiled simulators have in common that a given application 145 is decoded at compile-time. Based on the results of the decoding phase, the simulation compiler 205 subsequently selects and sequences the appropriate host operations that are required to simulate the application 145. All known compiled simulators rely on the assumptions that the complete application 145 is known before the simulation starts and is also run-time static.
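The compile-time decoding step, and its reliance on run-time static program code, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of any known compiled simulator: the toy instruction set and the names `make_behavior`, `simulation_compiler`, and `run_compiled` are hypothetical, and host closures stand in for the host operations that a real simulation compiler would select and sequence.

```python
# Hypothetical toy ISA (an assumption for illustration):
# 16-bit words laid out as [opcode:4][rd:4][operand:8].
HALT, LOADI, ADD = 0, 1, 2


def encode(opcode, rd=0, operand=0):
    """Pack the three fields into one instruction word."""
    return (opcode << 12) | (rd << 8) | (operand & 0xFF)


def make_behavior(word):
    """Decode one word once and return a host closure that simulates it."""
    opcode, rd, operand = (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF
    if opcode == LOADI:
        def behavior(regs):
            regs[rd] = operand
            return True
    elif opcode == ADD:
        rs = operand & 0xF
        def behavior(regs):
            regs[rd] = (regs[rd] + regs[rs]) & 0xFFFF
            return True
    elif opcode == HALT:
        def behavior(regs):
            return False
    else:
        raise ValueError(f"unknown opcode {opcode}")
    return behavior


def simulation_compiler(program_memory):
    """Compile-time step: decode the complete application before simulation."""
    return [make_behavior(word) for word in program_memory]


def run_compiled(behaviors):
    """Run-time step: execute the pre-selected behaviors without decoding."""
    regs = [0] * 16
    pc = 0
    while behaviors[pc](regs):
        pc += 1
    return regs


program = [
    encode(LOADI, 1, 5),   # r1 = 5
    encode(LOADI, 2, 7),   # r2 = 7
    encode(ADD, 1, 2),     # r1 = r1 + r2
    encode(HALT),
]
compiled = simulation_compiler(program)
regs = run_compiled(compiled)

# The program memory is modified after compilation (run-time dynamic code).
# The compiled simulation does not notice and still runs the old behavior.
program[0] = encode(LOADI, 1, 9)
stale_regs = run_compiled(compiled)
fresh_regs = run_compiled(simulation_compiler(program))
```

In this sketch, `stale_regs` still reflects the original first instruction, whereas `fresh_regs` reflects the modified program only because the compile-time step was repeated.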
Thus, compiled simulation typically is far more efficient than interpreted simulation. However, a major restriction for the utilization of compiled simulators is the requirement for static program code. This limits the compiled technique to simulating a small class of applications. In contrast to typical DSP applications, which are signal-processing algorithms, micro-controller architectures usually run an operating system (OS). A significant characteristic of operating systems, run-time dynamic program code, conflicts with the limitation of compiled simulators. However, even for DSP architectures, real-time operating systems are increasingly gaining importance. Consequently, the class of devices for which conventional compiled simulation is suitable may be shrinking.
Thus, the integration of compiled simulators into embedded system design environments is not possible, since the prime requirement, predictable program code, is not fulfilled when using external program memories. Furthermore, applications with run-time dynamic program code, as provided by operating systems (OS), cannot be addressed by compiled simulators. However, today's embedded systems consist of multiple processor cores and peripherals, which make an underlying OS indispensable. Consequently, compiled simulators only allow the isolated simulation of applications, which is not sufficient for the verification of a complete hardware/software system.
Another area that is unsuitable for compiled simulators is multiple instruction-set architectures. Considering novel architectural features, especially in the domain of low power architectures, multiple instruction-sets are widely used to reduce power and memory consumption. These architectures can switch to a compressed instruction-set at run-time. For instance, the ARM core family provides a so-called “Thumb” instruction-set. This dynamic instruction-set switching cannot be handled by a compiled simulator, since the selection depends on run-time values and is not predictable.
Still another area that is unsuitable for compiled simulators is large applications. This is because compiled simulation of large applications requires an enormous amount of memory, for example, 1000 times the requirements of an interpretive simulator, depending on the architecture. As long as the host memory is big enough, the high memory consumption may not have a severe impact on performance. However, for multi-processor simulation of embedded systems or processor arrays, the memory efficiency of the simulator becomes increasingly important.
Summarizing the above arguments, the enormous performance gain of compiled simulators is outweighed by their restrictiveness. This implies that most application areas are still dominated by the slow interpretive technique. However, the ever-increasing complexity of applications, architectures, and systems requires higher performance.
Following is a brief discussion of some specific conventional techniques for implementing simulators and their limitations. One technique is based on the EXPRESSION language (see e.g., A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau, “EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability”, Proceedings of the Conference on Design, Automation & Test in Europe, 1999). This conventional simulator provides for a retargetable tool suite and allows cycle-true and bit-true modeling of pipelined processors. The technique may be suitable for modeling architectures such as the Motorola DSP 56k or Texas Instruments TMS320C6000™. However, this simulator is interpreted and hence has poor performance.
Another simulation technique is the EMBRA project, which is a compiled simulator (see, e.g., E. Witchel and M. Rosenblum, “Embra: Fast and Flexible Machine Simulation”, Proceedings of the Conference on Measurement and Modeling of Computer Systems, 1996). EMBRA maps instructions from the device to be simulated to instructions on the host machine and may provide a high performance simulator for the MIPS R3000/R4000 processor. However, this simulator is non-retargetable and restricted to the target device being a MIPS R3000/R4000 architecture and the host device being a Solaris™ machine.
Another conventional technique for a compiled simulator is retargetable, but is unable to simulate run-time dynamic code. The simulator generated from a FACILE description utilizes a fast forwarding technique to achieve reasonably high performance (see, e.g., E. Schnarr, M. D. Hill, and J. R. Larus, “Facile: A Language and Compiler For High-Performance Processor Simulators”, Proceedings of the International Conference on Programming Language Design and Implementation, 1998). Fast forwarding is similar to compiled simulation and uses result caching of processor actions, indexed by a processor configuration code. Previously cached actions can be replayed directly in a repeated occurrence of a configuration. However, due to the assumption that program code is run-time static, dynamic program code cannot be simulated with this technique.
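The general idea of result caching with replay can be sketched as follows. This is not the FACILE implementation; it is a minimal illustration that assumes a hypothetical toy instruction set and, as a further assumption, uses the program counter alone as the configuration key. The names `slow_path`, `replay`, and `FastForwardingSimulator` are introduced here for illustration only.

```python
# Hypothetical toy ISA (an assumption for illustration):
# 16-bit words laid out as [opcode:4][rd:4][operand:8].
HALT, LOADI, ADD = 0, 1, 2


def encode(opcode, rd=0, operand=0):
    """Pack the three fields into one instruction word."""
    return (opcode << 12) | (rd << 8) | (operand & 0xFF)


def slow_path(word):
    """Stand-in for detailed simulation: decode a word into a recorded action."""
    opcode, rd, operand = (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF
    if opcode == LOADI:
        return ("set", rd, operand)
    if opcode == ADD:
        return ("add", rd, operand & 0xF)
    if opcode == HALT:
        return ("halt", 0, 0)
    raise ValueError(f"unknown opcode {opcode}")


def replay(action, regs):
    """Apply a recorded action to the register file; False means halt."""
    kind, rd, arg = action
    if kind == "set":
        regs[rd] = arg
    elif kind == "add":
        regs[rd] = (regs[rd] + regs[arg]) & 0xFFFF
    return kind != "halt"


class FastForwardingSimulator:
    def __init__(self, program_memory):
        self.program_memory = program_memory
        self.action_cache = {}      # configuration key -> recorded action
        self.slow_path_calls = 0    # counts how often the slow path was taken

    def run(self):
        regs = [0] * 16
        pc = 0
        running = True
        while running:
            key = pc  # assumed configuration key; relies on static program code
            action = self.action_cache.get(key)
            if action is None:
                action = slow_path(self.program_memory[pc])
                self.action_cache[key] = action
                self.slow_path_calls += 1
            running = replay(action, regs)
            pc += 1
        return regs


program = [
    encode(LOADI, 1, 5),   # r1 = 5
    encode(LOADI, 2, 7),   # r2 = 7
    encode(ADD, 1, 2),     # r1 = r1 + r2
    encode(HALT),
]
sim = FastForwardingSimulator(program)
first = sim.run()
calls_after_first = sim.slow_path_calls
second = sim.run()
calls_after_second = sim.slow_path_calls
```

In the second run every configuration has been seen before, so all actions are replayed from the cache and the slow path is not taken again. Because the key in this sketch does not include the instruction word, a change to the program memory would go unnoticed, which mirrors the static-code assumption noted above.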
Retargetable compiled simulators based on architecture description languages have been developed within the Sim-nML (FSim), ISDL (XSSIM), and MIMOLA projects. (See, e.g., M. Hartoog, J. A. Rowson, P. D. Reddy, S. Desai, D. D. Dunlop, E. A. Harcourt and N. Khullar, “Generation of Software Tools from Processor Descriptions for Hardware/Software Codesign”, Proceedings of the Design Automation Conference, 1997; G. Hadjiyiannis, S. Hanono, and S. Devadas, “ISDL: An Instruction Set Description Language for Retargetability”, Proceedings of the Design Automation Conference, 1997; and R. Leupers, J. Elste, and B. Landwehr, “Generation of Interpretive and Compiled Instruction Set Simulators”, Proceedings of the Asia South Pacific Design Automation Conference, 1999.) However, due to the simplicity of the underlying instruction sequencer, it is not possible with these techniques to realize, at a cycle-accurate level, processor models with more complex pipeline control mechanisms, such as the Texas Instruments TMS320C6000™.
A further retargetable approach is based on machine descriptions in ANSI C. (See, e.g., F. Engel, J. Nuhrenberg, and G. P. Fettweis, “A Generic Tool Set for Application Specific Processor Architectures”, Proceedings of the International Workshop on HW/SW Codesign, 1999). However, only results for a single proprietary DSP architecture are available so far. Moreover, all of the presented compiled simulation approaches are qualified by the limitations that result from the compiled principle as discussed above.
Therefore, it would be advantageous to provide a method and system for a simulator that combines retargetability, flexibility, and high simulation performance at the same time. It would be further advantageous to provide a method and system for a simulator that is suitable for run-time dynamic code. It would be still further advantageous to provide a method and system for a simulator that allows cycle-true modeling and bit-true modeling.