Advances in semiconductor fabrication techniques and in the miniaturization of circuit elements have made it possible to integrate a huge number of transistors in a single device, and processors have accordingly come to use higher frequencies for their synchronization signals. At the same time, conventional processors have run into problems: the performance gains obtained by raising the operating frequency and improving the logic design appear to have reached their limits, mainly because of the increase in power consumption during operation and, furthermore, the increase in standby power caused by leakage current while the device is in standby. Meanwhile, many digital consumer appliances have appeared, such as vehicle navigation systems, portable phones, and digital TV sets, that handle various types of data such as images, voice, and database information. Techniques that can process large volumes of such data with differing characteristics quickly and with low power consumption are now strongly demanded. One means of meeting this demand, achieving both higher performance and lower power consumption, is a multiprocessor system. Whereas only one processor unit has conventionally been mounted on one chip, a multiprocessor system integrates plural processor units on a single chip and has them execute plural processes in parallel, thereby realizing high computational performance. In this case, the present operating frequency can be used as is. It is expected that the miniaturization of circuit elements will advance further in the near future, so that 100 to 1000 processor units (operation units) will come to be mounted on one chip.
In systems intended particularly for built-in devices, standardized digital signals such as radio, image, and voice signals are often processed. To handle such digital processing while achieving both higher performance and lower power consumption, a homogeneous multiprocessor system has been proposed in which plural general-purpose processing devices are integrated. Those processing devices use the same general processing method, that is, the same instruction set, and they are identical in configuration and operating performance characteristics. In addition to the homogeneous multiprocessor system, a heterogeneous multiprocessor system has also been proposed. It is intended to improve operating efficiency, particularly for certain kinds of application programs, by integrating on a single chip various types of processing devices, such as dedicated processors and accelerators, each of which can execute a specific process very efficiently (at high speed and with low power consumption). In this case, the instruction set differs among the processing devices.
In a multiprocessor system that integrates plural processor units (PUs) on one chip as described above, programs must be created so that the plural PUs can operate simultaneously and efficiently in order to bring out the full performance of the system. In ordinary input programs, however, processes are described one by one in time series, so the arithmetic performance expected from the number of integrated PUs cannot be obtained. One effective way to solve this problem is to take parallelism into account when creating the program and to add parallelizing code that allows the program to be executed on plural PUs in accordance with the configuration of the multiprocessor system that executes it. This method is effective for systems in which several PUs are integrated. In view of development time and the effective performance of the system, however, it is not very practical for systems in which several tens to several thousands of PUs are integrated, and it is even less practical for systems in which different types of PUs are integrated. JP-A-2006-293768 discloses a static PU power control method that addresses this problem. According to the method, a parallelizing compiler divides a program into plural tasks, analyzes the divided tasks for parallelism, and schedules them at compile time so that they are distributed to plural PUs and executed in parallel. The power of each PU is then controlled statically according to the result of this parallel scheduling.
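The idea of compile-time scheduling followed by static power control can be illustrated with a minimal Python sketch. It is not the method of JP-A-2006-293768 itself: the `Task` record, the greedy earliest-start placement, and the function names `static_schedule` and `idle_windows` are all assumptions introduced only for illustration. Task costs are assumed to be known in advance, and the dependency graph is assumed to be acyclic.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cost: int          # estimated execution time (arbitrary units), assumed known
    deps: tuple = ()   # names of tasks that must finish first


def static_schedule(tasks, num_pus):
    """Assign every task to a PU and a start time before execution begins."""
    finish = {}                  # task name -> finish time
    pu_free = [0] * num_pus      # time at which each PU becomes free
    plan = []                    # entries: (pu, start, end, task name)
    pending = list(tasks)
    while pending:
        # Take the first task whose dependencies have all been scheduled
        # (assumes the dependency graph has no cycles).
        task = next(t for t in pending if all(d in finish for d in t.deps))
        pending.remove(task)
        ready = max((finish[d] for d in task.deps), default=0)
        # Greedy choice: the PU on which the task can start earliest.
        pu = min(range(num_pus), key=lambda p: max(pu_free[p], ready))
        start = max(pu_free[pu], ready)
        end = start + task.cost
        finish[task.name] = end
        pu_free[pu] = end
        plan.append((pu, start, end, task.name))
    return plan


def idle_windows(plan, num_pus):
    """Intervals in which a PU has no task and could be statically powered down."""
    makespan = max(end for _, _, end, _ in plan)
    windows = {p: [] for p in range(num_pus)}
    for p in range(num_pus):
        t = 0
        for _, start, end, _ in sorted(e for e in plan if e[0] == p):
            if start > t:
                windows[p].append((t, start))
            t = end
        if t < makespan:
            windows[p].append((t, makespan))
    return windows


tasks = [Task("A", 4), Task("B", 2), Task("C", 3, deps=("A", "B"))]
plan = static_schedule(tasks, num_pus=2)
print(plan)
print(idle_windows(plan, num_pus=2))
```

Because the whole plan is fixed before execution, the idle intervals of each PU are also known in advance, which is what allows the power of each PU to be controlled statically rather than decided at run time.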
There are also device techniques for realizing the processor described above. The mainstream device technique has been CMOS (Complementary Metal-Oxide-Semiconductor). As described above, performance has been improved by increasing the number of mounted transistors through transistor scaling and by raising the clock frequency through faster transistor switching. However, for the reasons described above and because of problems such as leakage current, it is now difficult to expect further improvement from miniaturization by conventional methods. In this situation, an SOI (Silicon on Insulator) structure has been proposed, in which a silicon crystal layer is formed on an insulating substrate and transistors are formed in that silicon layer. The SOI structure is expected to realize faster devices while suppressing leakage current. In addition, the SOI structure reduces the substrate capacitance, which further speeds up transistor switching and makes it easier to control the body-bias level (body-bias control), something that has been difficult on conventional silicon substrates. The technique can thus achieve both faster operation and lower power consumption. SOI transistors are divided into two types: fully-depleted MOS transistors (FD-SOI/MOS transistors), each having a thin silicon layer formed on the insulation film, and partially-depleted MOS transistors (PD-SOI/MOS transistors), each having a thick silicon layer. In a fully-depleted MOS transistor, the channel region (body) is fully depleted during operation and the charge that forms the channel (inversion layer) can move free of the influence of the substrate. The gate voltage therefore has a more dominant influence on the inversion-layer charge, and the transistor has more favorable subthreshold characteristics than the partially-depleted structure.
In particular, conventional bulk CMOS, which does not employ the SOI structure, has the following problems: the substrate has a large capacitance; a voltage generation circuit with strong driving capability is required to apply a voltage to the substrate; the time needed to change the body-bias level is long; and, because the transistor elements are not isolated from one another, a latch-up phenomenon often causes an excessive current to flow, so that noise is generated from the substrate. Consequently, it has not been practical to apply the body bias in the positive direction (the direction in which the threshold voltage is lowered) or to switch the body-bias level quickly. By contrast, employing the SOI technique makes it possible to realize faster body-bias control that can cope with various granularities.
JP-A-2007-042730 discloses a method that realizes a semiconductor device capable of fast operation with low power consumption over a wide range of ambient temperatures. The method uses FD-SOI transistors each having a thin buried oxide layer and uses the semiconductor region beneath the thin buried oxide layer as a back gate. For each lightly loaded logic circuit in a logic circuit block, the back-gate voltage is controlled from outside, at the time the block is activated, in a manner suited to the operating characteristics of that circuit.
Furthermore, JP-A-2002-304232 discloses a power control method for semiconductor circuits that controls the clock frequency and the supply voltage. Concretely, when executing a task that has sufficient processing time to spare, the method lowers both the clock frequency and the supply voltage to reduce device power consumption. However, the method faces some problems. The time required to change the voltage is very long, and a power supply circuit with high driving capability is required, so the area overhead becomes large. In addition, as devices have been miniaturized further in recent years, the supply voltage has also been scaled down. For example, in 65 nm generation CMOS the supply voltage has been lowered to 1.0 V, so the supply voltage can be lowered stably only to within 0.8 to 0.9 V. As process miniaturization advances further, for example to 45 nm and below, the room left for lowering the voltage is expected to shrink to almost zero. It thus becomes difficult to employ any of the conventional voltage lowering methods. Furthermore, whereas the charging/discharging current, that is, the switching current, has so far been dominant in power consumption, the progress of miniaturization makes it necessary not only to lower the supply voltage but also to combine plural methods, such as body-bias control, to reduce device power consumption. This is to address the problem that various types of leakage current flow constantly in the switching elements.
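The reason that lowering the frequency and the voltage together saves energy, and the reason the saving shrinks as the usable voltage range narrows, can be seen from the standard first-order model of CMOS switching power, P = C * V^2 * f. The following Python sketch applies that model. The function names, the capacitance value, and the two operating points are hypothetical values chosen for illustration and do not come from JP-A-2002-304232; the model deliberately ignores leakage current.

```python
def dynamic_energy(cycles, voltage, capacitance=1e-9):
    """Switching energy in joules for a task of `cycles` clock cycles.

    From P = C * V^2 * f and t = cycles / f, the energy is E = C * V^2 * cycles,
    so the frequency cancels and only the voltage changes the energy.
    """
    return capacitance * voltage ** 2 * cycles


def pick_operating_point(cycles, deadline_s, points):
    """Return the lowest-voltage (frequency_hz, voltage) pair that meets the deadline."""
    feasible = [(f, v) for f, v in points if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda fv: fv[1])


# Hypothetical operating points: nominal 1.0 V and a lowered 0.9 V setting.
POINTS = [(400e6, 1.0), (200e6, 0.9)]

# A task of one million cycles with 10 ms available can run at the lower setting.
freq, volt = pick_operating_point(1_000_000, 10e-3, POINTS)
saving = 1 - dynamic_energy(1_000_000, volt) / dynamic_energy(1_000_000, 1.0)
print(freq, volt, round(saving, 2))
```

Under these assumed values, lowering the supply from 1.0 V to 0.9 V reduces switching energy by only about 19 percent, which illustrates why a narrow voltage margin limits the benefit of voltage lowering alone and why leakage-oriented methods such as body-bias control must be combined with it.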