This invention relates to a compilation method and a compiler for generating an execution program which enables efficient operations of a plurality of processor units in a multiprocessor system constituted of a plurality of heterogeneous processor units.
Device miniaturization through advances in semiconductor manufacturing technology now enables integration of a huge number of transistors. At the same time, processor operation frequencies have become higher and higher. However, because operating power has increased and standby power caused by leakage current has increased as well, a limit has begun to appear in the performance improvement that conventional processors have achieved by raising the operation frequency and improving the logic design. Meanwhile, digital consumer devices including automobile navigation systems, mobile phones, and digital televisions, which simultaneously process a variety of data such as images, voices, and database information, have emerged, and a huge volume of data having different characteristics must be processed with lower power within a short time.
Thus, at present, as means for realizing both performance improvement and lower power, a multiprocessor system is promising in that high calculation performance can be obtained, without increasing the operation frequency, by integrating on a single chip a plurality of general-purpose processors (CPU's), each of which has conventionally been provided as a single chip, and executing processing in parallel. In the future, it is expected that further advancement of miniaturization will enable mounting of 100 to 1000 processor units (PU's) on a chip. Especially in systems for built-in devices, routine processing of digital signals such as wireless signals, images, and voices is frequently performed. For such systems, as means for achieving both performance improvement and lower power, a heterogeneous multiprocessor (HCMP) system has been proposed in addition to the homogeneous multiprocessor system, which integrates a plurality of identical general-purpose processors (CPU's), i.e., general-purpose processors identical in configuration and calculation performance and using identical instruction sets. The HCMP system includes on a single chip, in addition to a plurality of CPU's, various types of PU's having different instruction sets, such as dedicated processors or accelerators capable of executing specific processing highly efficiently (at high speed with lower power); it targets specific applications and aims at high calculation efficiency. An example of a dedicated processor is a dynamically reconfigurable processor (DRP) disclosed in Tsunoda et al., “Outline of Digital Media Reconfiguration Type Processor FE-GA”, Technical Report of the Institute of Electronics, Information and Communication Engineers, RECONF-65 (Non-patent Document 1).
In such a multiprocessor system, to obtain calculation performance proportional to the number of PU's, the mounted PU's must be operated simultaneously to process a program. However, as ordinary input programs are written so that processing proceeds sequentially in time, it is difficult to obtain the expected calculation performance proportional to the number of PU's mounted. One method for solving the problem is for the programmer to take the parallelism of the program into account and rewrite the original program by adding parallelization code, so that the program is executed by the plurality of PU's in accordance with the configuration of the multiprocessor system that executes it. However, while this method is effective in a system which includes a small number of PU's, it is impractical in terms of development time and effective performance in a future system in which several tens to thousands of PU's are mounted, especially in the case of an HCMP constituted of different types of PU's.
(Known Example: Multigrain Parallelization Compiler)
Thus, for the homogeneous multiprocessor system, there has been proposed an automatic parallelization compiler which automatically extracts the parallelism of programs and distributes processing to a plurality of PU's to improve processing performance. That is, studies have been conducted on an automatic parallelization compiler which analyzes input programs, extracts portions operable in parallel from the programs, and allocates the portions to the plurality of PU's to enable simultaneous execution thereof. For example, JP 2004-252728 A discloses a compilation system which analyzes an input source program, divides the program into blocks (tasks) of various grain sizes such as subroutines or loops, analyzes parallelism among the plurality of tasks, divides the tasks and the data to be accessed by the tasks into sizes suited to a cache or a local memory, and generates an object program for efficiently operating the multiprocessor system by optimally allocating the tasks to the PU's. Additionally, JP 2001-175619 A discloses a chip multiprocessor architecture which supports a function of multigrain parallel processing.
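The general idea of dividing a program into tasks, analyzing the dependences among them, and allocating the tasks to PU's can be illustrated by a minimal sketch. The sketch below is not the method of JP 2004-252728 A; it is a simple greedy list scheduler written for illustration, and the task names, costs, and function names (`list_schedule`, `makespan`) are hypothetical.

```python
def list_schedule(costs, deps, num_pus):
    """Greedy list scheduling of a task graph onto identical PUs.

    costs: dict mapping task name -> execution time (arbitrary units)
    deps:  dict mapping task name -> iterable of tasks that must finish first
    Returns a dict mapping task name -> (pu_index, start, finish).
    """
    pu_free = [0] * num_pus  # time at which each PU becomes idle
    placed = {}
    while len(placed) < len(costs):
        # A task is ready once every task it depends on has been placed.
        ready = sorted(
            t for t in costs
            if t not in placed and all(p in placed for p in deps.get(t, ()))
        )
        if not ready:
            raise ValueError("dependence cycle in task graph")
        for t in ready:
            # The task cannot start before all of its predecessors finish.
            earliest = max((placed[p][2] for p in deps.get(t, ())), default=0)
            # Pick the PU on which this task can start soonest.
            pu = min(range(num_pus), key=lambda i: max(pu_free[i], earliest))
            start = max(pu_free[pu], earliest)
            placed[t] = (pu, start, start + costs[t])
            pu_free[pu] = start + costs[t]
    return placed


def makespan(schedule):
    """Total execution time: the latest finish time over all tasks."""
    return max(finish for _, _, finish in schedule.values())


# Hypothetical task graph: B and C depend on A, and D depends on B and C.
costs = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
one_pu = makespan(list_schedule(costs, deps, 1))   # all tasks run in sequence
two_pus = makespan(list_schedule(costs, deps, 2))  # B and C run in parallel
```

In this example only tasks B and C are independent of each other, so a second PU shortens the total execution time only by the length of one of them. This is why the amount of parallelism extracted from the program, rather than the number of PU's alone, bounds the achievable performance.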
(Known Example: Asymmetrical Multiprocessor, Power Scheduling)
Regarding the heterogeneous multiprocessor (HCMP), JP 2004-252900 A discloses, as task allocation means for drawing out the performance of a group of processors that differ from one another in configuration, a method for an application in which the processing procedure has been determined, such as image processing. In this method, a plurality of processors of different types, such as CPU's or DSP's, are combined according to the processing characteristics, processing time or power consumption information for the processors is measured and supplied beforehand, and a series of processing is dynamically allocated to the processors based on that information.
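Allocation based on processing time and power information measured beforehand can be sketched as follows. This is an illustration under stated assumptions, not the method of JP 2004-252900 A: the tasks are treated as independent for simplicity, and the profile table, its values, and the function name `allocate` are hypothetical.

```python
def allocate(tasks, profile):
    """Assign each task to the processor type that finishes it soonest.

    tasks:   list of task names, in the order in which they arrive
    profile: dict task -> {pu_type: (time_ms, power_w)} measured beforehand;
             a processor type that cannot run a task is simply absent
    Returns (plan, energy), where plan is a list of (task, pu_type) pairs
    and energy is the sum of time * power over all assigned tasks.
    """
    free_at = {}   # time at which each processor type becomes idle
    plan = []
    energy = 0.0
    for task in tasks:
        candidates = profile[task]

        def finish_then_energy(pu):
            t_exec, power = candidates[pu]
            # Primary key: estimated finish time; tie-breaker: energy.
            return (free_at.get(pu, 0.0) + t_exec, t_exec * power)

        pu = min(candidates, key=finish_then_energy)
        t_exec, power = candidates[pu]
        free_at[pu] = free_at.get(pu, 0.0) + t_exec
        energy += t_exec * power
        plan.append((task, pu))
    return plan, energy


# Hypothetical measurements: (processing time in ms, power in W).
profile = {
    "fft":     {"CPU": (8.0, 1.0), "DSP": (2.0, 0.5)},
    "control": {"CPU": (1.0, 1.0)},            # not executable on the DSP
    "filter":  {"CPU": (6.0, 1.0), "DSP": (3.0, 0.5)},
}
plan, energy = allocate(["fft", "control", "filter"], profile)
```

With these assumed figures the signal-processing tasks go to the DSP and the control task stays on the CPU. A different objective, such as minimum energy under a deadline, would only change the key function used to rank the candidates.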
(Known Example: Automatic Vector Compiler)
In the HCMP, dedicated execution codes must be generated for each type of processor. As generation means, for example when the dedicated processor is a vector calculator, Tanaka and Iwasawa “Compilation Technique for Vector Computer”, Information Processing, Vol. 31, 6th edition, Jun. 5, 1990, (Non-patent Document 2) and Kuck, D. J., et al., “Dependence Graphs and Compiler Optimization”, Proc. 8th Annual ACM Symposium on Principles of Programming Languages, pp. 177-189 (1981) (Non-patent Document 3) disclose an automatic vector compiler which extracts a vector-calculable portion (loop) from a program through data dependence analysis and generates vector calculation instructions for it.
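The role of data dependence analysis in deciding whether a loop is vector-calculable can be illustrated with a deliberately restricted sketch. It assumes a loop of the form `a[i + w] = a[i + r1] + a[i + r2] + ... + 1` over a single array with constant offsets, which is far simpler than the analyses in Non-patent Documents 2 and 3. All function names are hypothetical.

```python
def has_carried_flow_dependence(write_offset, read_offsets):
    """True if some iteration reads an element written by an earlier one.

    Iteration i writes a[i + w]; iteration j reads a[j + r]. They touch the
    same element when j = i + (w - r), so r < w means a later iteration
    consumes a value produced earlier (a loop-carried flow dependence).
    """
    return any(r < write_offset for r in read_offsets)


def _iteration_range(n, write_offset, read_offsets):
    """Iterations for which every accessed index stays inside the array."""
    offsets = [write_offset, *read_offsets]
    return range(max(0, -min(offsets)), n - max(0, max(offsets)))


def run_sequential(data, write_offset, read_offsets):
    """Execute the loop one iteration at a time, as originally written."""
    a = list(data)
    for i in _iteration_range(len(a), write_offset, read_offsets):
        a[i + write_offset] = sum(a[i + r] for r in read_offsets) + 1
    return a


def run_vectorized(data, write_offset, read_offsets):
    """Emulate a vector instruction: load all operands, then store all."""
    a = list(data)
    idx = _iteration_range(len(a), write_offset, read_offsets)
    values = [sum(a[i + r] for r in read_offsets) + 1 for i in idx]
    for i, v in zip(idx, values):
        a[i + write_offset] = v
    return a


data = [3, 1, 4, 1, 5, 9, 2, 6]
# a[i] = a[i + 1] + 1: only an anti-dependence, so vector execution is safe.
safe = run_sequential(data, 0, [1]) == run_vectorized(data, 0, [1])
# a[i] = a[i - 1] + 1: carried flow dependence, so the results differ.
unsafe = run_sequential(data, 0, [-1]) != run_vectorized(data, 0, [-1])
```

The check is conservative in this sketch: a loop with a carried flow dependence of distance d could still be executed with a vector length no greater than d, which the predicate above does not attempt to exploit.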