An instruction-set simulator (ISS) is an indispensable tool for system-level design. A hardware designer can use an instruction-set simulator to explore and/or verify a design before it is realized, which decreases the non-recurring engineering (NRE) cost of product development. A software designer can test a program on an instruction-set simulator instead of running it on real target machines, which reduces the turnaround time.
After several years of development, the traditional instruction-set simulator for a single-core machine is both fast and accurate, and its performance is nearly optimal. However, with the evolution of semiconductor manufacturing processes, two or more processors can be integrated in a single chip, and traditional single-core systems have gradually been replaced by multi-core systems. In order to maximize multi-core efficiency, more and more applications are developed using a parallel programming model. The instruction-set simulator of a traditional single-core system, however, cannot keep the different cores synchronized, so simulations of multiple cores are not executed efficiently.
In a multi-core system, a plurality of programs is performed simultaneously and synchronously. Multi-core instruction-set simulation (MCISS) is designed for the programs that run on such multi-core systems. Generally, a multi-core instruction-set simulation can be built from a plurality of instruction-set simulators; however, the instruction-set simulators may then be assigned arbitrarily to whichever host cores are idle.
Simulation time means the time a host core takes to execute the instruction-set simulators, and target time means the actual time the simulated programs would take on the target. The time points that need to be synchronized are named “sync points”; in the lock-step approach, the start of each clock tick is a sync point. The instruction-set simulators need to stop at each sync point in order to synchronize. Therefore, the lock-step approach incurs synchronization overhead.
As multi-core systems gradually replace single-core systems, the corresponding Multi-Core Instruction-Set Simulator (MCISS) is becoming more crucial. Intuitively, an MCISS can be built by using a single-core ISS to simulate each target core and running all the ISSs in parallel as a co-simulation to gain simulation performance.
Timing synchronization keeps timing consistent so that the concurrent behaviors of multiple simulated components are accurate. An intuitive approach is to synchronize all components at every cycle. This approach is usually named the cycle-based or lock-step approach. Although it offers accurate simulation, its heavy synchronization overhead significantly slows down the simulation. Enlarging the synchronization interval improves performance, but it also results in inaccurate simulation.
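The trade-off between synchronization interval and accuracy can be illustrated with a minimal Python sketch. It is not taken from any particular simulator: the `Access` record, the `simulate_quanta` function, and the fixed host execution order are hypothetical names and assumptions introduced only for illustration. Each core's shared accesses are given as a pre-recorded trace, and the cores are executed one after another inside each synchronization interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    cycle: int      # target cycle at which the access occurs
    kind: str       # "write" or "read"
    value: int = 0  # value stored by a write (ignored for reads)


def simulate_quanta(traces, total_cycles, interval, host_order):
    """Replay per-core traces in quanta of `interval` target cycles.

    traces:     dict core_id -> list of Access, sorted by cycle
    host_order: order in which the host runs the cores inside one quantum
                (an assumption; a real host may pick any order)
    Returns (observed_reads, sync_points), where observed_reads is a list
    of (core_id, cycle, value_seen).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    shared = 0          # a single shared memory word, initially 0
    observed = []
    sync_points = 0
    for start in range(0, total_cycles, interval):
        sync_points += 1  # every simulator stops here to synchronize
        end = min(start + interval, total_cycles)
        for core in host_order:
            for acc in traces[core]:
                if start <= acc.cycle < end:
                    if acc.kind == "write":
                        shared = acc.value
                    else:
                        observed.append((core, acc.cycle, shared))
    return observed, sync_points


# Core 0 writes 42 at cycle 3; core 1 reads at cycle 5.
traces = {0: [Access(3, "write", 42)], 1: [Access(5, "read")]}

# Lock-step (interval = 1): the read sees the write, at 8 sync points.
lock_step = simulate_quanta(traces, 8, 1, host_order=(1, 0))

# One large interval: a single sync point, but core 1 runs first on the
# host, so its read at cycle 5 is simulated before the write at cycle 3.
coarse = simulate_quanta(traces, 8, 8, host_order=(1, 0))
```

In this sketch, `lock_step` yields the correct value 42 at the cost of eight sync points, while `coarse` needs only one sync point but lets the read observe the stale value 0.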
In order to attain a fast and accurate co-simulation, partial order synchronization approaches have been proposed. The idea is to maintain correct data flow, i.e., data dependency. In reality, programs can only influence each other via their shared memory accesses. As long as the temporal order of all the shared memory accesses is maintained, the data dependencies between programs remain consistent. Accordingly, timing synchronization needs to be performed only at each shared memory access. Since the number of shared memory accesses is considerably smaller than the total number of execution cycles, the lighter synchronization effort makes this shared-memory-based approach more efficient than the lock-step approach, while still guaranteeing accurate MCISS simulation results.
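The shared-memory-based idea can be sketched in Python as follows. This is an illustrative offline replay under stated assumptions, not the synchronization mechanism of any actual MCISS: the function name `replay_in_target_order`, the tuple layout of the traces, and the use of pre-recorded per-core traces are all hypothetical. The sketch applies shared memory accesses in global target-time order and counts one sync point per access.

```python
import heapq


def replay_in_target_order(traces):
    """Apply shared memory accesses in global target-time order.

    traces: dict core_id -> list of (target_cycle, kind, value),
            each list sorted by target_cycle; kind is "write" or "read".
    Returns (observed_reads, sync_points), where observed_reads is a list
    of (core_id, target_cycle, value_seen).
    """
    # Tag every access with its core so the merged stream stays attributable.
    streams = [
        [(cycle, core, kind, value) for (cycle, kind, value) in accesses]
        for core, accesses in traces.items()
    ]
    shared = 0          # a single shared memory word, initially 0
    observed = []
    sync_points = 0
    # Merge by (cycle, core): ties at the same cycle are broken by core id,
    # which is an arbitrary choice made for this sketch.
    merged = heapq.merge(*streams, key=lambda a: (a[0], a[1]))
    for cycle, core, kind, value in merged:
        sync_points += 1  # synchronization happens only at shared accesses
        if kind == "write":
            shared = value
        else:
            observed.append((core, cycle, shared))
    return observed, sync_points


traces = {
    0: [(3, "write", 42), (12, "read", 0)],
    1: [(5, "read", 0), (9, "write", 7)],
}
observed, sync_points = replay_in_target_order(traces)
```

For these traces, covering for example 16 target cycles, only four sync points are needed, and each read observes the most recent write in target time, which is the property the partial order approach is meant to preserve.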
Nevertheless, conventional co-simulation approaches such as SystemC usually adopt a centralized scheduler 100 to handle timing synchronization among the ISSs, as illustrated in FIG. 1. In order to maintain timing consistency, centralized scheduling always selects the slowest ISS for execution. Even if parallel simulation is allowed, only one ISS can actually be executed most of the time. Therefore, this approach severely limits the degree of parallelism of an MCISS. Since the number of cores to simulate continues to increase, it is necessary to leverage parallelism to gain better simulation performance from the computing power of a host multi-core machine.
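The selection rule of a centralized scheduler can be sketched in Python as follows. This is a simplified illustration and does not reproduce the SystemC kernel: the function `centralized_schedule`, the `segments` input (cycle counts between consecutive sync points of each ISS), and the ISS identifiers are hypothetical. The scheduler repeatedly picks the ISS with the smallest local target time and runs it up to its next sync point.

```python
import heapq


def centralized_schedule(segments):
    """Always run the ISS whose local target time is smallest.

    segments: dict iss_id -> list of cycle counts between that ISS's
              consecutive sync points (hypothetical input format).
    Returns the host execution order as (iss_id, start_time, end_time).
    """
    # Heap entries: (local_target_time, iss_id, index_of_next_segment).
    heap = [(0, iss_id, 0) for iss_id in sorted(segments)]
    heapq.heapify(heap)
    order = []
    while heap:
        local_time, iss_id, idx = heapq.heappop(heap)
        if idx == len(segments[iss_id]):
            continue  # this ISS has no work left
        end_time = local_time + segments[iss_id][idx]
        order.append((iss_id, local_time, end_time))  # only this ISS runs
        heapq.heappush(heap, (end_time, iss_id, idx + 1))
    return order


schedule = centralized_schedule({"iss0": [4, 2], "iss1": [1, 1, 5]})
```

In this example, after `iss0` advances to time 4, `iss1` is selected three times in a row because it remains the slowest, and `iss0` waits even though a second host core may be idle. This is the serialization effect described above.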
The centralized scheduling mechanism can be either sequential or parallel. The sequential version executes the tasks cooperatively, so only one task is executed at a time. In contrast, the parallel version allows more than one task to execute at the same time.
Generally, multi-core instruction-set simulation (MCISS) should run in parallel to improve simulation performance. However, the conventional low-parallelism centralized scheduler greatly constrains simulation performance. To resolve this issue, a high-parallelism distributed scheduling mechanism for MCISS is proposed.