1. Technical Field
The present invention relates to a multiprocessor control unit, a control method performed by the same, and an integrated circuit, and specifically to a multiprocessor control unit for reducing the power consumption of a plurality of processors, a control method performed by the same, and an integrated circuit.
2. Background Art
A multiprocessor system that assigns each thread (or process) of a program to one of a plurality of processors and executes the threads in parallel performs barrier synchronization. The barrier synchronization blocks thread execution until all the threads arrive at a barrier point preset for each thread, and is performed in order to prevent inconsistency in the order of reading or writing a variable which is commonly referred to by the threads. Hereinafter, “barrier start” means the start of the barrier synchronization, “barrier arrival” means that a thread arrives at a barrier point, “barrier establishment” means that the barrier synchronization is established by the last thread arriving at the barrier point, and “barrier wait” means that a thread which has realized the barrier arrival is in a wait state until the barrier establishment.
A multiprocessor system for realizing barrier synchronization generally has a shared memory type structure, by which a plurality of processors share a single address space. A shared memory type multiprocessor system uses a lock variable provided in a main memory unit on a shared memory bus accessible from each processor to realize the barrier establishment. Specifically, the lock variable is first set to the total number of threads which are to be executed in parallel by the processors. At the time of the barrier arrival of its thread, a processor performs an atomic operation (an operation of exclusively occupying the shared memory bus to perform a series of read-modify-write operations) and accesses the main memory unit to decrement the lock variable by one. Then, the processor is put into the barrier wait. A processor in the barrier wait repeatedly loads the lock variable to determine whether or not the lock variable has become zero as a result of the atomic operations performed by the other processors. Namely, since the lock variable becoming zero means the barrier establishment, the processor repeats loading the lock variable and making a determination on the lock variable until the barrier establishment. Such a state, in which the processor repeats loading the lock variable and making a determination on the lock variable during the barrier wait, is referred to as a “spin waiting state”.
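The lock-variable scheme described above may be sketched as follows. This is a minimal illustration only, assuming software threads in place of processors and a mutex (the hypothetical `_bus_lock`) in place of exclusive occupation of the shared memory bus; the names `SpinBarrier`, `arrive_and_wait`, `run_threads` and `spin_loads` are introduced here for illustration and do not appear in the conventional art.

```python
import threading


class SpinBarrier:
    """Single-use barrier built on a shared counter (the "lock variable")."""

    def __init__(self, num_threads):
        # The lock variable is first set to the number of participating threads.
        self.lock_variable = num_threads
        # Assumption: a mutex stands in for exclusive use of the shared memory bus.
        self._bus_lock = threading.Lock()
        # Diagnostic only: total loads of the lock variable made while waiting.
        self.spin_loads = 0

    def arrive_and_wait(self):
        # Barrier arrival: atomic read-modify-write that decrements by one.
        with self._bus_lock:
            self.lock_variable -= 1

        # Barrier wait: spin waiting state. Load and test until the value is zero.
        loads = 0
        while True:
            with self._bus_lock:
                value = self.lock_variable
            loads += 1
            if value == 0:  # zero means barrier establishment
                break

        with self._bus_lock:
            self.spin_loads += loads


def run_threads(num_threads):
    """Run num_threads workers through one barrier and record event order."""
    barrier = SpinBarrier(num_threads)
    events = []
    events_lock = threading.Lock()

    def worker(tid):
        with events_lock:
            events.append(("before", tid))
        barrier.arrive_and_wait()
        with events_lock:
            events.append(("after", tid))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return barrier, events
```

Every load counted in `spin_loads` corresponds to an access to the shared lock variable made while no thread work is being done, which is the activity that the spin waiting state spends power on.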
A processor in a spin waiting state remains in a normal operation state and constantly accesses the main memory unit even though it is not executing the thread assigned thereto. Therefore, the processor in a spin waiting state wastes power.
A technology for reducing power consumption by transferring a processor in a spin waiting state into a non-operative sleep mode has been proposed. According to this technology, the power mode of a processor in a spin waiting state is transferred into a sleep mode, for example, by performing clock gating, which blocks the supply of a clock signal to a logic circuit; by controlling the threshold voltage (Vth) to counter leakage power, which has increased in recent years as semiconductor processes have become progressively finer; or by performing power gating, which blocks the supply of a supply voltage (Vdd). For example, in an ARM processor produced by ARM of U.K., as shown in FIG. 37, a plurality of power modes into which the processor can be put are set. The power modes include a Run mode, which is a normal operation state mode (normal operation mode), and a Standby mode, a Dormant mode and a Shutdown mode, which are non-operative sleep modes. The Standby mode, the Dormant mode and the Shutdown mode differ from one another in the manner of turning ON/OFF the clock signal and the voltage which are respectively supplied to a processor core and a cache memory.
The Dormant mode, to which power gating is applied, is a sleep mode which provides a larger power saving effect and is deeper than the Standby mode, to which only clock gating is applied. The Shutdown mode, in which the voltage to be supplied to the cache memory is also turned OFF, is a sleep mode which provides a larger power saving effect and is deeper than the Dormant mode. In the Dormant mode, internal contexts in the processor core, such as register contents, need to be saved from the processor core to an external memory or the like when the voltage is turned OFF, and need to be restored from the external memory or the like to the processor core when the voltage is turned ON. In addition, a relatively long time is required to turn ON the voltage of the processor core. Therefore, the Dormant mode needs a larger time overhead than the Standby mode for the transfer from the Run mode and the recovery to the Run mode. The Shutdown mode needs a larger time overhead than the Dormant mode because, in the case of the Shutdown mode, saving and restoring of the context in the cache memory and the time for stabilizing the cache memory also need to be considered. Therefore, a more power-saving (deeper) sleep mode cannot be applied unless the time duration for which the sleep mode is to be applied is sufficiently long compared with the time duration required for the transfer from the normal operation mode and the recovery to the normal operation mode.
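The constraint described above, that a deeper sleep mode is applicable only when the available time is sufficiently long relative to the transfer and recovery overhead, may be illustrated by the following sketch. The overhead values in `SLEEP_MODES`, the safety factor `margin` and the function name `deepest_applicable_mode` are hypothetical and chosen only for illustration; they are not figures for any actual processor.

```python
# Hypothetical round-trip overheads (transfer from Run plus recovery to Run),
# in microseconds, ordered from the shallowest to the deepest sleep mode.
SLEEP_MODES = [
    ("Standby", 10),      # clock gating only
    ("Dormant", 1000),    # core power gated; core context saved and restored
    ("Shutdown", 10000),  # cache power also off; cache context lost
]


def deepest_applicable_mode(wait_us, margin=2.0):
    """Return the deepest mode whose overhead fits within wait_us.

    A mode is considered applicable only if the waiting time is at least
    `margin` times its round-trip overhead (an assumed rule of thumb).
    Returns "Run" when no sleep mode is worthwhile.
    """
    chosen = "Run"
    for name, overhead_us in SLEEP_MODES:
        if wait_us >= margin * overhead_us:
            chosen = name
    return chosen
```

Under these assumed figures, a 20-microsecond wait admits only the Standby mode, whereas a 50-millisecond wait admits the Shutdown mode.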
In order to apply such a deeper sleep mode to a processor in the barrier wait as described above, the barrier waiting time needs to be estimated in advance. As a conventional technology for realizing this, a method of predicting the barrier waiting time based on the history, and determining which depth of sleep mode is to be applied in accordance with the predicted barrier waiting time (predictive barrier waiting time), has been proposed (for example, Non-patent document 1, etc.). Specifically, when a thread has realized the barrier arrival, the power mode of the processor which has executed the thread is transferred from the normal operation mode to a sleep mode in accordance with the predictive barrier waiting time. Then, recovery to the normal operation mode is started at a timing obtained by calculating backwards from the predictive barrier waiting time, such that the power mode has been recovered from the sleep mode to the normal operation mode by the time of the barrier establishment. According to the conventional art, a deeper sleep mode is applied to a processor in the barrier wait by predicting the barrier waiting time based on the history as described above. Non-patent document 1: J. Li, J. Martinez, M. Huang, “The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors”, In Proceedings of High-Performance Computer Architecture (HPCA '04), IEEE Computer Society, Washington, D.C., USA, 2004, pp. 14-23.
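The history-based prediction and the backward calculation of the recovery timing may be sketched as follows. This is an illustration under stated assumptions only: the predictor here simply averages the most recent waiting times, which is an assumption made for this sketch and is not asserted to be the method of Non-patent document 1, and the names `WaitPredictor` and `recovery_start_time` are hypothetical.

```python
from collections import deque


class WaitPredictor:
    """Predicts the next barrier waiting time from recent history.

    Assumption: the prediction is the mean of the last `history_len`
    observed waiting times.
    """

    def __init__(self, history_len=4):
        self.history = deque(maxlen=history_len)

    def record(self, wait_us):
        """Store an observed barrier waiting time (microseconds)."""
        self.history.append(wait_us)

    def predict(self):
        """Return the predictive barrier waiting time; 0.0 with no history."""
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)


def recovery_start_time(arrival_us, predicted_wait_us, recovery_us):
    """Time at which recovery to the normal operation mode should begin.

    Calculated backwards from the predicted barrier establishment so that
    recovery completes by then; never earlier than the barrier arrival.
    """
    establishment_us = arrival_us + predicted_wait_us
    return max(arrival_us, establishment_us - recovery_us)
```

For example, with a barrier arrival at 1000 microseconds, a predictive barrier waiting time of 400 microseconds and a recovery time of 150 microseconds, recovery would be started at 1250 microseconds. If the prediction is too short, the processor wakes early and spins; if it is too long, the processor is still recovering at the barrier establishment.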