Device miniaturization driven by progress in semiconductor manufacturing technology has enabled the integration of a huge number of transistors. At the same time, progress has been made in raising processor operating frequencies. However, because of increases in operating power and in standby power caused by leakage current, the performance improvements that processors have conventionally achieved by raising the operating frequency and improving the logical system have begun to reach their limit.
Therefore, a multiprocessor system (i.e., a single-chip multiprocessor system) currently appears promising as a means of improving performance while lowering power consumption. In such a multiprocessor system, a plurality of processor units (hereinafter referred to as PUs), such as conventional CPUs and digital signal processors, are mounted on one chip and process in parallel, thereby obtaining high arithmetic performance without increasing the operating frequency. In the future, further progress in miniaturization is expected to allow 100 to 1000 PUs to be mounted on a single chip.
In such a multiprocessor system, the mounted PUs must process programs simultaneously to obtain arithmetic performance proportional to the number of PUs. However, programs are generally written as time-sequential descriptions of operations, so the expected performance proportional to the number of PUs cannot be attained even though a plurality of PUs are mounted.
One way to solve this problem is for the program developer to rewrite the original programs manually, adding parallelization code so that they execute on the plurality of PUs, while taking into account both the parallelism of the programs and the configuration of the multiprocessor system on which they will run. This approach is useful for a system with only a few PUs. However, in terms of development time and effective performance, it is impractical for future systems with several tens to several thousands of PUs, especially when the PUs are of different types.
Accordingly, automatic parallelizing compilers have already been studied for multiprocessor systems composed of a plurality of PUs that are similar in configuration and arithmetic performance. Such a compiler analyzes input programs, extracts the parts that can operate in parallel, and allocates those parts to a plurality of PUs for simultaneous execution. For example, JP 2004-252728 A discloses a compilation system that analyzes an input source program, divides it into blocks (i.e., tasks) of various grain sizes such as subroutines or loops, and analyzes the parallelism among the tasks. The tasks and the data they access are divided into sizes suited to a cache or local memory, and the tasks are optimally allocated to the PUs, thereby generating an object program that operates the multiprocessor system efficiently. JP 2001-175619 A discloses a chip multiprocessor architecture that supports this multigrain parallel processing function.
In a multiprocessor system, reducing the power consumption of each PU is essential to reducing the overall power consumption and heat dissipation. Various methods have been proposed for reducing the power of individual processors. For example, JP 3138737 B and JP 2004-2341126 A disclose a method that reduces power by dynamic frequency/voltage control: the system clock of a processor is lowered within the constraints of real-time processing, and a supply voltage matched to that clock frequency is applied to the processor.
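The reasoning behind dynamic frequency/voltage control can be illustrated with the standard CMOS dynamic-power model P = C·V²·f. A task of N cycles run at frequency f takes N/f seconds, so its dynamic energy is C·V²·N; a lower clock permits a lower voltage, and energy falls with V². The sketch below is only an illustration under stated assumptions and does not reproduce the cited patents: the operating-point table, the effective capacitance `c_eff`, and the helper names `choose_point` and `dynamic_energy` are all hypothetical, and leakage power is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float  # clock frequency
    volt: float     # supply voltage required at this frequency


# Hypothetical frequency/voltage pairs (not taken from the cited patents).
OPERATING_POINTS: List[OperatingPoint] = [
    OperatingPoint(200e6, 0.8),
    OperatingPoint(400e6, 1.0),
    OperatingPoint(600e6, 1.2),
]


def choose_point(cycles: float, deadline_s: float,
                 points: List[OperatingPoint] = OPERATING_POINTS) -> Optional[OperatingPoint]:
    """Return the slowest operating point that finishes `cycles` within `deadline_s`.

    Returns None if the deadline cannot be met even at the highest frequency.
    """
    for p in sorted(points, key=lambda p: p.freq_hz):
        if cycles / p.freq_hz <= deadline_s:
            return p
    return None


def dynamic_energy(cycles: float, point: OperatingPoint, c_eff: float = 1e-9) -> float:
    """Dynamic energy E = c_eff * V^2 * cycles, from P = c_eff * V^2 * f and t = cycles / f.

    `c_eff` is a hypothetical effective switched capacitance; leakage is ignored.
    """
    return c_eff * point.volt ** 2 * cycles
```

For example, a task of 1e8 cycles with a 0.3 s deadline cannot finish at 200 MHz (0.5 s) but can at 400 MHz (0.25 s), so `choose_point(1e8, 0.3)` selects the 400 MHz / 1.0 V point, which under this model uses less energy than running the same task at 600 MHz / 1.2 V.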
In addition, JP 2004-252900 A discloses a method in which different kinds of processors, such as CPUs and digital signal processors, are combined to suit the characteristics of each process. The processing time and power consumption of each process on each processor are measured beforehand and provided as information, and a series of processes is dynamically allocated to the processors based on that information.
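Allocation driven by pre-measured time and power information can be sketched as a simple greedy scheduler. This is a hypothetical illustration, not the method of JP 2004-252900 A: it assumes the processes are independent, that each processor runs one process at a time, and that a table of measured (time, energy) pairs per process and processor is available. The names `Profile` and `allocate` are invented for this sketch. Each process, taken in order, goes to the lowest-energy processor that can still finish it by a deadline; if none can, it goes to the processor that finishes it earliest.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: profile[process][processor] = (time_s, energy_j), measured beforehand.
Profile = Dict[str, Dict[str, Tuple[float, float]]]


def allocate(processes: List[str], profile: Profile, deadline_s: float) -> Dict[str, str]:
    """Greedily assign each process, in the given order, to one processor.

    Prefer the lowest-energy processor that can finish the process by
    `deadline_s`; if none can, fall back to the earliest-finishing one.
    """
    free_at: Dict[str, float] = {}   # time at which each processor becomes idle
    assignment: Dict[str, str] = {}
    for proc in processes:
        # (finish time, energy, processor) for every processor that can run `proc`
        candidates = [
            (free_at.get(pu, 0.0) + t, e, pu)
            for pu, (t, e) in profile[proc].items()
        ]
        feasible = [c for c in candidates if c[0] <= deadline_s]
        if feasible:
            finish, _, pu = min(feasible, key=lambda c: (c[1], c[0]))
        else:
            finish, _, pu = min(candidates, key=lambda c: (c[0], c[1]))
        free_at[pu] = finish
        assignment[proc] = pu
    return assignment


profile: Profile = {
    "fft":    {"cpu": (4.0, 8.0), "dsp": (1.0, 2.0)},
    "parse":  {"cpu": (1.0, 1.0), "dsp": (3.0, 2.5)},
    "filter": {"cpu": (3.0, 6.0), "dsp": (1.5, 3.0)},
}
print(allocate(["fft", "parse", "filter"], profile, deadline_s=3.0))
```

With the hypothetical profile above and a 3.0 s deadline, `fft` and `filter` go to the DSP and `parse` goes to the CPU. A greedy pass keeps the example short; a real allocator would also need to respect dependencies between the processes in the series.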