The present invention relates to simulating a multiprocessor system, and in particular to a simulating system and method for simulating a multiprocessor system.
A cycle-accurate simulator is an important tool for evaluating the design alternatives of multiprocessor systems. As the number of processors increases, conventional sequential simulation techniques become extremely slow. Parallel simulation techniques are a natural extension of the sequential techniques for the purpose of achieving higher speeds. However, a challenge in parallel simulation is to ensure that memory accesses are performed in a globally consistent order, i.e., that each memory access is synchronized with the global progress (global time). For example, assume that a parallel simulating system including host processors A and B is used to simulate the behaviors of two processors a and b, wherein processor a is simulated by host processor A and writes memory unit c, and processor b is simulated by host processor B and reads memory unit c. The two memory accesses must then be performed in a globally consistent order; otherwise, erroneous results will occur. Conventional solutions to this problem comprise:
1) Per-cycle synchronization (see David A. Penry, Daniel Fay, David Hodgdon, Ryan Wells, Graham Schelle, David I. August and Daniel A. Connors, “Exploiting Parallelism and Structure to Accelerate the Simulation of Chip Multi-processors”, Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture (HPCA), February 2006). In this technique, all the simulated processors are synchronized at the beginning of each cycle. Since a cycle is the minimum time unit, correctness can be guaranteed. However, the synchronization costs are extremely high because the granularity is too fine, which considerably reduces the overall simulation speed.
2) Barrier synchronization (see M. Chidister and A. George, “Parallel Simulation of Chipmultiprocessor architectures”, ACM Transactions on Modeling and Computer Simulation, 12(3):176-200, July 2002). In this technique, all the simulated processors are synchronized every t time units, where t must be less than the memory access latency to ensure correctness. However, since the memory access latency is usually on the order of a few cycles, the synchronization costs are still high.
3) Synchronization based on memory access (see M. Chidister and A. George, “Parallel Simulation of Chipmultiprocessor architectures”, ACM Transactions on Modeling and Computer Simulation, 12(3):176-200, July 2002). In this technique, all the simulated processors are synchronized each time a memory access is to be performed. However, statistics show that 30% to 40% of all instructions are memory access instructions. Therefore, the time costs of synchronization are still high.
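By way of illustration only, the barrier synchronization of technique 2) may be sketched as follows. The sketch is a single-threaded model and is not taken from the cited references: the function name `simulate`, the tuple layout of a memory operation, and the parameters `quantum`, `memory_latency`, `total_cycles` and `host_order` are all assumptions made for this example. Within each interval of t cycles (here `quantum`), the simulated processors are visited in an arbitrary host order; at the barrier, the collected memory accesses are applied in global time order, so that the result does not depend on the host order.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of one memory operation:
# (issue_cycle, kind, address, value); value is ignored for reads.
MemOp = Tuple[int, str, str, int]


def simulate(
    programs: Dict[str, List[MemOp]],
    quantum: int,
    memory_latency: int,
    total_cycles: int,
    host_order: List[str],
) -> Dict[Tuple[str, int], int]:
    """Return the value seen by each read, keyed by (processor, issue_cycle)."""
    # The text requires the barrier interval t to be shorter than the
    # memory access latency; this sketch simply enforces that condition.
    if quantum >= memory_latency:
        raise ValueError("quantum must be less than the memory access latency")

    memory: Dict[str, int] = {}
    reads: Dict[Tuple[str, int], int] = {}

    for start in range(0, total_cycles, quantum):
        end = start + quantum
        pending = []
        # Between barriers, the host may run the simulated processors in any order.
        for proc in host_order:
            for cycle, kind, addr, value in programs[proc]:
                if start <= cycle < end:
                    pending.append((cycle, proc, kind, addr, value))
        # Barrier: apply this interval's accesses in global time order
        # (ties at the same cycle are broken by processor name).
        for cycle, proc, kind, addr, value in sorted(pending):
            if kind == "write":
                memory[addr] = value
            else:
                reads[(proc, cycle)] = memory.get(addr, 0)
    return reads


# Processor a writes memory unit c at cycle 4; processor b reads it at cycle 5.
programs = {"a": [(4, "write", "c", 42)], "b": [(5, "read", "c", 0)]}
first = simulate(programs, 4, 5, 8, ["a", "b"])
second = simulate(programs, 4, 5, 8, ["b", "a"])
print(first == second)
```

In this sketch the cost noted in the text is visible in the outer loop: one barrier is executed for every `quantum` cycles, and `quantum` cannot exceed the memory access latency.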
FIG. 2 shows the general functional structure of a conventional cycle-accurate simulator. As shown in FIG. 2, a cycle-accurate simulator 20 usually comprises a fetching module 21, a decoding module 22, an issuing module 23, a functional unit 24, a writing back module 25, a committing module 26, a memory management unit (MMU) 27 and a memory hierarchical structure 28. For example, the modules and units as shown in the cycle-accurate simulator 20 may be implemented in hardware and/or software. A multiprocessor system may be simulated in parallel through a time-shared or parallel architecture. An example of the conventional cycle-accurate simulator may be available from SimpleScalar LLC located at Ann Arbor, Mich., USA (www.simplescalar.com).
As compared to the cycle-accurate simulator, the function simulator is faster because it models fewer microarchitectural details, and it is still able to achieve the same memory access effect. FIG. 1 shows the general functional structure of a conventional function simulator. As shown in FIG. 1, a function simulator 10 usually comprises a fetching module 11, a decoding module 12, an execution module 13, a committing module 14, a memory management unit (MMU) 15 and a memory hierarchical structure 16. For example, the modules and units as shown in the function simulator 10 may be implemented in hardware and/or software. A multiprocessor system may be simulated in parallel through a time-shared or parallel architecture. An example of the conventional function simulator may be available from SimpleScalar LLC located at Ann Arbor, Mich., USA (www.simplescalar.com).
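By way of illustration only, the fetch, decode, execute and commit structure of a function simulator may be sketched as follows. The sketch is not the SimpleScalar implementation: the class `FunctionSimulator`, the four-opcode toy instruction set (`li`, `add`, `load`, `store`), the register count and the identity address translation standing in for the MMU are all assumptions made for this example. No pipeline or timing state is kept, which reflects why a function simulator can run faster than a cycle-accurate one while producing the same memory contents.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy encoding: (opcode, operand1, operand2).
Instruction = Tuple[str, int, int]


class FunctionSimulator:
    def __init__(self, program: List[Instruction],
                 memory: Optional[Dict[int, int]] = None) -> None:
        self.program = program
        self.memory: Dict[int, int] = dict(memory or {})
        self.regs = [0] * 4   # assumed register file size
        self.pc = 0
        self.committed = 0    # count of committed instructions

    def fetch(self) -> Instruction:
        return self.program[self.pc]

    def decode(self, raw: Instruction) -> Tuple[str, int, int]:
        opcode, x, y = raw
        return opcode, x, y

    def translate(self, vaddr: int) -> int:
        # Stand-in for the MMU: identity mapping (an assumption).
        return vaddr

    def execute(self, opcode: str, x: int, y: int) -> Tuple[str, int, int]:
        # Compute the result only; architectural state changes at commit.
        if opcode == "li":      # regs[x] <- immediate y
            return ("reg", x, y)
        if opcode == "add":     # regs[x] <- regs[x] + regs[y]
            return ("reg", x, self.regs[x] + self.regs[y])
        if opcode == "load":    # regs[x] <- memory[y]
            return ("reg", x, self.memory.get(self.translate(y), 0))
        if opcode == "store":   # memory[y] <- regs[x]
            return ("mem", self.translate(y), self.regs[x])
        raise ValueError(f"unknown opcode: {opcode}")

    def commit(self, result: Tuple[str, int, int]) -> None:
        kind, target, value = result
        if kind == "reg":
            self.regs[target] = value
        else:
            self.memory[target] = value
        self.pc += 1
        self.committed += 1

    def run(self) -> None:
        while self.pc < len(self.program):
            self.commit(self.execute(*self.decode(self.fetch())))


sim = FunctionSimulator([
    ("li", 0, 7), ("li", 1, 35), ("add", 0, 1),
    ("store", 0, 100), ("load", 2, 100),
])
sim.run()
print(sim.regs, sim.memory, sim.committed)
```

A cycle-accurate simulator would add issuing, functional-unit and write-back stages with per-cycle timing between `decode` and `commit`; the sketch omits them deliberately.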