In a conventional shared-memory multicore processor system, coherence between caches must be maintained whenever one of the central processing units (CPUs) updates shared data in its cache, so that the other CPUs can access the latest version of that shared data. Maintaining coherence between caches is called “cache coherency,” and snooping is one method of achieving it.
In snooping, a snoop controller detects an update of shared data by monitoring the state of the lines of its own cache and of the caches of the other CPUs, and by exchanging update information with the caches of the other CPUs. When an update is detected, each cache purges its pre-update copy of the data via a data bus or a snoop bus, and then caches the updated data via a data bus.
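The snooping behavior described above can be illustrated with a minimal write-invalidate sketch. It is a toy model, not the snoop controller of any particular processor: `Cache`, `SnoopBus`, and their methods are hypothetical names, main memory is a plain dictionary, and writes go through to memory as a simplifying assumption. Each write is broadcast on the bus, every other cache purges its stale copy, and a later read by another CPU misses and fetches the updated data over the bus.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical per-CPU cache: only valid lines are kept (address -> value)."""
    cpu_id: int
    lines: dict = field(default_factory=dict)


class SnoopBus:
    """Toy write-invalidate snooping bus shared by all caches (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory          # shared main memory: address -> value
        self.caches = []
        self.bus_transactions = 0     # counts every use of the bus

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, writer, address, value):
        # The update is broadcast on the bus so the other caches can snoop it.
        self.bus_transactions += 1
        writer.lines[address] = value
        self.memory[address] = value  # write-through, a simplifying assumption
        for cache in self.caches:
            if cache is not writer:
                # Each snooping cache purges its stale (pre-update) copy.
                cache.lines.pop(address, None)

    def read(self, reader, address):
        if address in reader.lines:
            return reader.lines[address]   # hit: no bus traffic
        # Miss: fetch the latest data over the bus and cache it.
        self.bus_transactions += 1
        value = self.memory[address]
        reader.lines[address] = value
        return value


memory = {0x10: 1}
bus = SnoopBus(memory)
c0, c1 = Cache(0), Cache(1)
bus.attach(c0)
bus.attach(c1)
bus.read(c1, 0x10)         # CPU 1 caches the value 1
bus.write(c0, 0x10, 2)     # CPU 0 updates it; CPU 1's copy is purged
print(bus.read(c1, 0x10))  # 2, refetched over the bus
```

Every step that propagates an update costs a bus transaction, which is why processes that share data heavily across CPUs place a heavy load on the bus.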
If multiple CPUs in a multicore processor issue access requests to a bus at the same time, an arbitration circuit determines, according to round robin, which CPU is permitted access, so that the right to access the bus is granted to the CPUs in turn.
The arbitration circuit includes a request buffer and grants the access requests registered in the request buffer sequentially, starting from the head. Under round robin, an access request concerning a process assigned to any one of the CPUs is temporarily suspended when the time allotted to it (i.e., its time slice) elapses, and is placed at the end of the request buffer.
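The round-robin arbitration described above can be sketched as follows, assuming each access request is a hypothetical `(cpu_id, bus_time_needed)` pair and that time is counted in abstract bus cycles. The request buffer is modeled as a `collections.deque` served from the head; a request that still needs the bus when its time slice elapses is suspended and appended to the end of the buffer. `round_robin_arbitrate` is an illustrative name, not part of any real arbiter interface.

```python
from collections import deque


def round_robin_arbitrate(requests, time_slice):
    """Hypothetical round-robin bus arbiter.

    requests: list of (cpu_id, bus_time_needed) in arrival order.
    Returns the grants as (cpu_id, start, end) tuples in abstract bus cycles.
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    buffer = deque(requests)  # request buffer, served from the head
    schedule = []
    clock = 0
    while buffer:
        cpu, remaining = buffer.popleft()
        granted = min(time_slice, remaining)
        schedule.append((cpu, clock, clock + granted))
        clock += granted
        remaining -= granted
        if remaining > 0:
            # Time slice elapsed: suspend the request and re-queue it at the end.
            buffer.append((cpu, remaining))
    return schedule


print(round_robin_arbitrate([(0, 5), (1, 2), (2, 3)], time_slice=2))
# [(0, 0, 2), (1, 2, 4), (2, 4, 6), (0, 6, 8), (2, 8, 9), (0, 9, 10)]
```

With a time slice of 2 cycles, CPU 0's five-cycle request is split into three grants interleaved with the other CPUs' requests, so no CPU can hold the bus beyond its slice while others are waiting.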
A process, which is a unit of processing performed by an application, is classified into two categories: a process representing a function or the like (hereinafter, “coarse-granularity process”) and a process representing a loop or the like (hereinafter, “medium-granularity process”). The medium-granularity process is further classified into two categories: a process having no dependency between loop iterations (hereinafter, “doall process”) and a process having a dependency between loop iterations (hereinafter, “doacross process”) (see, for example, Japanese Patent Publication No. 2008-217825 and Kasahara, Hironori, “Parallel Processing Technology,” CORONA PUBLISHING CO., LTD, Jun. 20, 1991, p. 131).
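The difference between the two medium-granularity categories can be seen in two small loops. In this sketch, the hypothetical `order` argument stands in for the arbitrary order in which iterations might execute once they are spread across CPUs: the doall loop gives the same result in any order, whereas the doacross loop, in which iteration `i` reads the value written by iteration `i - 1`, does not.

```python
def doall_loop(a, order):
    """Each iteration reads only a[i]: no dependency between iterations."""
    b = [0] * len(a)
    for i in order:
        b[i] = 2 * a[i]
    return b


def doacross_loop(a, order):
    """Iteration i reads s[i - 1], written by iteration i - 1 (loop-carried dependency)."""
    s = [0] * len(a)
    for i in order:
        s[i] = (s[i - 1] if i > 0 else 0) + a[i]
    return s


a = [1, 2, 3, 4]
forward = list(range(len(a)))
backward = forward[::-1]

print(doall_loop(a, forward) == doall_loop(a, backward))        # True
print(doacross_loop(a, forward) == doacross_loop(a, backward))  # False
```

Reversing the order is only a stand-in for out-of-order execution; it shows that doacross iterations must observe each other's results, while doall iterations never need to.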
However, if a doacross process is divided into parallel processes that are assigned to different CPUs, snooping must be executed frequently because of the dependency between iterations: a value written by an iteration on one CPU is read by the next iteration on another CPU. This increases the number of times the bus is used.
By contrast, if a doall process is divided into parallel processes that are assigned to different CPUs, synchronization is needed only at the end of the loop, because each iteration can be calculated independently. Likewise, a coarse-granularity process requires no snooping arising from dependencies between iterations.
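The contrast in bus usage between doacross and doall processes can be put in rough numbers with a toy model. It assumes, hypothetically, that iterations are distributed cyclically (iteration `i` on CPU `i % n_cpus`) and that every loop-carried value consumed on a different CPU costs one cache-to-cache transfer over the bus. Under those assumptions a doacross loop of n iterations on two or more CPUs needs n - 1 transfers, while a doall loop needs none before its final synchronization. `coherence_transfers` is an illustrative name, and the model ignores effects such as false sharing.

```python
def coherence_transfers(n_iterations, n_cpus, loop_carried):
    """Hypothetical count of cache-to-cache transfers for one parallelized loop.

    Iterations are distributed cyclically: iteration i runs on CPU i % n_cpus.
    With a loop-carried dependency (doacross), iteration i reads a value written
    by iteration i - 1, so a transfer is counted whenever the two ran on
    different CPUs. Without one (doall), nothing is exchanged during the loop.
    """
    if not loop_carried:
        return 0
    return sum(
        1
        for i in range(1, n_iterations)
        if i % n_cpus != (i - 1) % n_cpus
    )


print(coherence_transfers(1000, 4, loop_carried=True))   # 999
print(coherence_transfers(1000, 4, loop_carried=False))  # 0
```

Other distributions (for example, contiguous blocks of iterations per CPU) would change the count, so the numbers are illustrative rather than a property of the processes themselves.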
Conventionally, because the right to access the bus is granted to the CPUs in turn according to round robin, a doacross process cannot perform snooping via the bus while a doall process or a coarse-granularity process is using the bus. This increases the execution time of the doacross process, which accesses the bus frequently.
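The waiting described above can be reproduced with a small round-robin simulation. It assumes, hypothetically, that the doall and coarse-granularity processes always have bus work pending and use their full time slice on each turn, while the doacross process issues short snoop requests back to back; `doacross_wait_time` and its parameters are illustrative names, and times are in abstract bus cycles. Under these assumptions every snoop after the first waits `n_other * time_slice` cycles.

```python
from collections import deque


def doacross_wait_time(n_snoops, snoop_time, n_other, time_slice):
    """Toy round-robin bus: total cycles the doacross process spends waiting.

    The doacross process issues n_snoops short requests back to back; n_other
    hypothetical saturating processes (doall or coarse-granularity) always want
    the bus and hold it for a full time slice per turn.
    """
    buffer = deque(["doacross"] + [f"other{k}" for k in range(n_other)])
    clock = 0
    waiting = 0
    request_time = 0  # when the current doacross snoop was issued
    snoops_left = n_snoops
    while snoops_left:
        owner = buffer.popleft()
        if owner == "doacross":
            waiting += clock - request_time
            clock += snoop_time
            snoops_left -= 1
            request_time = clock  # next snoop is issued as soon as this one ends
        else:
            clock += time_slice   # competing process uses its whole slice
        buffer.append(owner)      # round robin: back to the end of the buffer
    return waiting


print(doacross_wait_time(3, snoop_time=1, n_other=2, time_slice=4))  # 16
```

Here each of the last two snoops waits for two 4-cycle slices, so lengthening the time slice or adding competing processes increases the doacross process's waiting time proportionally.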