1. Field of the Invention
The present invention relates to a program development technique and a multi-processor configuration technique for providing a parallel processing system, and, more specifically, to a compiler and an execution system for providing the parallel processing system.
2. Description of the Related Art
There are two major program development techniques for providing a parallel processing system. One is a technique for providing a development environment based on an automatic parallel compiler of a sequential program (an automatic parallel compiler technique). The other is a technique for providing a development environment based on a parallel processing language, which is extended from a sequential processing language (a parallel processing language technique).
One automatic parallel compiler technique is an automatic parallel compiler technique for a multiprocessor (see non-patent documents 1, 2 and 3). This technique automatically generates a parallel processing program from a sequential program described in a high-level programming language (typically Fortran or C). More specifically, a sequential program is parallelized through loop parallelization, in which a loop (a sequence of portions to be processed repeatedly) is divided and the divided portions of the loop are executed in parallel by different processors, or through block parallelization, in which portions that can be executed in parallel are executed in parallel by different processors.
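As a rough illustration of loop parallelization, the following Python sketch divides the iterations of a loop into contiguous portions and hands each portion to a separate worker. It is a hand-written sketch, not the output of any compiler described in the cited documents; the names split_range, partial_sum and parallel_sum_of_squares are hypothetical, and worker threads merely stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum_of_squares(n):
    # The original sequential loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def split_range(n, parts):
    # Divide iterations 0..n-1 into `parts` contiguous portions.
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(bounds):
    # The loop body applied to one portion of the iteration space.
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    # Each portion of the loop is handed to a different worker
    # (an assumption: threads stand in for processors here).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))
```

The division is valid here only because the iterations are independent of one another; a loop whose iterations depend on earlier iterations could not be split this simply.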
Another automatic parallel compiler technique is an instruction level parallel compiler technique (see, for example, non-patent documents 4 and 5). In the instruction level parallel compiler technique, an execution code for a VLIW (Very Long Instruction Word) processor (which has a plurality of computing units) is automatically generated from a sequential program described in a high-level programming language (typically C) or a similar language. The execution code, also called a horizontal instruction code, consists of single execution instructions, each of which embeds the execution instructions for all of the computing units. VLIW is a technique for speeding up the operation of a microprocessor, in which a plurality of mutually independent instructions are combined into a single instruction and are executed practically simultaneously when that single instruction is executed.
One parallel processing language technique is a parallel programming language (see, for example, non-patent documents 6, 7 and 8). The parallel programming language is a language for directly describing a parallel processing program for a multiprocessor. The parallel programming language is based on a high-level programming language, and is extended therefrom for explicitly describing a parallel execution loop or a parallel execution block. A large number of parallel programming languages have been proposed so far. Non-patent documents 6, 7 and 8 explain VPP Fortran, HPF (High Performance Fortran), and Concurrent C, respectively.
Another parallel processing language technique is a message passing programming technique (see, for example, non-patent documents 9 and 10). In the message passing programming technique, a parallel programming environment (such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine)), in which message passing functions between processors are provided as a library, is made available to a high-level programming language (typically Fortran or C). With this technique, a program is executed in parallel on a plurality of PCs (Personal Computers) and workstations connected via a network. The message passing programming technique is also used for developing parallel execution programs for distributed memory multiprocessor systems and shared memory multiprocessor systems. Non-patent documents 9 and 10 explain MPI and PVM, respectively.
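The explicit style of message passing programming can be sketched as follows. This is a minimal Python simulation under stated assumptions: the Mailboxes class and its send and recv methods are hypothetical helpers written for this illustration and are not the MPI or PVM API, and threads stand in for separate processors.

```python
import queue
import threading


class Mailboxes:
    """One inbox per rank. send/recv are hypothetical helpers, not MPI calls."""

    def __init__(self, size):
        self.inboxes = [queue.Queue() for _ in range(size)]

    def send(self, dest, message):
        self.inboxes[dest].put(message)

    def recv(self, rank):
        # Blocks until a message for `rank` arrives.
        return self.inboxes[rank].get()


def worker(rank, comm):
    # Every communication step is written out explicitly by the programmer.
    chunk = comm.recv(rank)      # receive this worker's share of the data
    comm.send(0, sum(chunk))     # send the partial result back to rank 0


def distributed_sum(data, workers=3):
    comm = Mailboxes(workers + 1)  # rank 0 is the coordinator
    threads = [threading.Thread(target=worker, args=(r, comm))
               for r in range(1, workers + 1)]
    for t in threads:
        t.start()
    for r in range(1, workers + 1):
        comm.send(r, data[r - 1::workers])  # distribute the data
    total = sum(comm.recv(0) for _ in range(workers))
    for t in threads:
        t.join()
    return total
```

Even in this small sketch, the data distribution, the per-worker program, and every send and receive must be described by hand.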
There are three major types of multi-processor configuration technique for providing a parallel processing system: a Neumann-type program-driven control method, a data flow machine-type data-driven control method, and a hybrid data flow machine-type control method (a fusion architecture). The last of these integrates the first two.
The Neumann-type program-driven control method sequentially reads out a program stored in a memory using a program counter, and executes the program (see, for example, non-patent documents 11 and 12). Multi-processor systems which are already in practical use are typically equipped with Neumann-type processors. In the Neumann-type program-driven control method, the data transfer instructions, data reception instructions, synchronization processing instructions, and the like required by the Neumann-type processors are embedded in the program. Each processor executes the instructions it reads out sequentially.
The data flow machine-type data-driven control method executes instructions in the order in which they become executable, that is, an instruction is executed once all of its reference data (input data) has been generated (see, for example, non-patent documents 13 to 16).
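The data-driven firing rule can be sketched in a few lines of Python. This is a software simulation under stated assumptions, not a description of any data flow machine in the cited documents; run_dataflow and the instruction names are hypothetical.

```python
import operator


def run_dataflow(instructions, initial):
    """Fire each instruction as soon as all of its input data exists.

    instructions: name -> (function, [input names])
    initial:      name -> value already available at start
    """
    values = dict(initial)
    pending = dict(instructions)
    fired = []
    while pending:
        # An instruction is executable once every input it references exists.
        ready = [name for name, (_, inputs) in pending.items()
                 if all(i in values for i in inputs)]
        if not ready:
            raise ValueError("no executable instruction: missing input or cycle")
        for name in ready:
            fn, inputs = pending.pop(name)
            values[name] = fn(*(values[i] for i in inputs))
            fired.append(name)
    return values, fired


# The program is listed out of order on purpose; there is no program counter.
program = {
    "result": (operator.mul, ["s", "d"]),
    "s": (operator.add, ["a", "b"]),
    "d": (operator.sub, ["a", "b"]),
}
values, fired = run_dataflow(program, {"a": 5, "b": 3})
```

Here "s" and "d" become executable at the same time and could run on different processors, while "result" waits until both have produced their data.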
The hybrid data flow machine-type control method uses, as its unit of data flow processing, an instruction block, which is an aggregate of plural instructions. The control method controls synchronization between instruction blocks by data drive, and controls the processing within an instruction block by program drive. More specifically, the control method executes the execution programs and data transfer instructions of each processor under program-driven control. It executes data reception instructions and synchronization instructions not under program-driven control but with a mechanism which secures the dependency between a data communication between processors and an instruction which references the data in that communication (that is, a mechanism which suspends execution of an instruction which references external data until the external data has actually been received) (see, for example, non-patent document 17). For example, hybrid data flow machines are listed under "Macro-dataflow" and "Hybrid" in the Category column of Table 1 on page 30 of non-patent document 17.
Non-patent document 1: Okamoto, Aida, Miyazawa, Honda, Kasahara, “Hierarchical Macro Dataflow Processing in OSCAR Multigrain Compiler”, Journal of Information Processing Society of Japan, Vol. 35, No. 4, pp. 513-521 (1994)
Non-patent document 2: Eigenmann, Hoeflinger, Padua, “On the Automatic Parallelization of the Perfect Benchmarks”, IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 1, pp. 5-21 (1998)
Non-patent document 3: Hall, Anderson, Amarasinghe, Murphy, Liao, Bugnion, Lam, “Maximizing Multiprocessor Performance with the SUIF Compiler”, IEEE Computer, Vol. 29, No. 12, pp. 84-89 (1996)
Non-patent document 4: Fisher, “Trace scheduling: A Technique for global Microcode Compaction”, IEEE Trans. Computers, Vol. 30, No. 7, pp. 478-490 (1981)
Non-patent document 5: Wakabayashi, Tanaka, “Global Scheduling Independent of Control Dependencies Based on Condition Vectors”, Proceedings of 29th ACM/IEEE Conference on Design Automation, pp. 112-115 (1992)
Non-patent document 6: Iwashita, “VPP Fortran from Viewpoint of HPF”, Information Processing, Vol. 38, No. 2, pp. 114-121 (February 1997)
Non-patent document 7: “HPF Promotion Council (HPFPC)”, [online], [searched on Aug. 10, 2005], Internet <URL:http://www.hpfpc.org/>
Non-patent document 8: Gehani, et al, “Concurrent C”, Software, Practice and Experience, Vol. 16, No. 9, pp. 821-844 (1986)
Non-patent document 9: “Message Passing Interface Forum”, [online], [searched on Aug. 10, 2005], Internet <URL:http://www.mpi-forum.org/index.html>
Non-patent document 10: “PVM”, [online], [searched on Aug. 10, 2005], Internet <URL:http://www.csm.ornl.gov/pvm/pvm_home.html>
Non-patent document 11: Hennessy, Patterson, “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann, San Mateo (1990)
Non-patent document 12: Kai Hwang, “Advanced Computer Architecture with Parallel Programming”, McGraw-Hill (1993)
Non-patent document 13: Arvind, Iannucci, “A Critique of multiprocessing von Neumann style”, Proceedings of 10th Annual Symposium on Computer Architecture (1983)
Non-patent document 14: Srini, “An Architectural Comparison of Dataflow Systems”, IEEE Computer, Vol. 19, No. 3, pp. 68-88 (1986)
Non-patent document 15: Arvind, Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture”, IEEE Trans. Computer, Vol. 39, pp. 300-318 (1990)
Non-patent document 16: Kodama, Sakai, Yamaguchi, “Principle of Operation and Implementation of Data-driven Single Chip Processor EMC-R”, Journal of Information Processing Society of Japan, Vol. 32, No. 7 (1991)
Non-patent document 17: Ben Lee, Ali R. Hurson, “Dataflow Architectures and Multithreading”, IEEE Computer, Vol. 27, No. 8, pp. 27-39 (1994)
The above-described automatic parallel compiler technique can automatically generate a parallel processing program from a sequential program. However, it does not allow the program partitioning method, or the method of allocating the partitioned programs to processors, to be changed flexibly. That is, a programmer cannot directly control program partitioning or processor allocation. Another problem is that the field of application of the automatic parallel compiler technique is still limited, although the technique has been increasingly applied to CMPs (Chip Multi Processors) used as servers, especially for scientific computing in which the computing load is heavy. For example, the automatic parallel compiler technique is not applicable to a system LSI (Large Scale Integration). The CMP is a technique of integrating plural processors into a single chip and connecting the processors with a shared bus.
The instruction-level parallel compiler is applied to the design of dedicated ICs (Integrated Circuits). The instruction-level parallel compiler is practical but provides only relatively low parallelism. That is, the instruction-level parallel compiler is not applicable to a parallel processing system requiring relatively high parallelism.
The parallel processing language is intended to be used in a specific field such as science and technology. In other words, the parallel processing language is used in a limited field, and is not applicable to a wide range of fields.
The message passing programming technique is used in upstream design of the system LSI. However, program development with the message passing programming technique is inefficient, because a separate program has to be developed for each processor and every communication instruction has to be described explicitly. As a result, debugging a program is difficult, and tuning, such as changing the assignment of processors, is also difficult.
In light of the above, the inventors have developed a technique of generating an execution code with which a multi-processor system can execute high-performance parallel processing, merely by having a programmer or the like add a simple description to a sequential program. To apply the execution code to an actual multi-processor system, it is also necessary to develop a means of generating an execution code with which the multi-processor system appropriately performs communication processing of pointers and pointer reference data.
The Neumann-type program-driven control method ensures data dependencies (that is, it preserves the execution order between the instruction that generates a piece of data and the instruction that references it) by program control. That is, data transfer instructions, data reception instructions, and synchronization processing instructions are executed by software, and each processor must execute them one after another. The time required for this processing thus becomes a major bottleneck in parallel processing.
The distributed memory multiprocessor system can achieve higher parallelism than the shared memory multiprocessor system, because the distributed memory multiprocessor system is free of access conflicts on a shared memory. However, in the distributed memory multiprocessor system, pointer data and pointer reference data cannot be shared between processors, because the memory spaces of the processors are separate from one another. This is a major restriction in developing a parallel processing program.
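The restriction on pointers can be illustrated with a small Python model of two separate memory spaces. This is only a sketch of the problem, under the assumption that a pointer is a plain address into one processor's local memory; LocalMemory and transfer are hypothetical names, and the copying shown is not the method of the present invention.

```python
class LocalMemory:
    """A processor-local memory space; a pointer is an address into it."""

    def __init__(self):
        self.cells = {}
        self.next_addr = 0

    def alloc(self, value):
        # Store `value` and return its address (the pointer).
        addr = self.next_addr
        self.cells[addr] = value
        self.next_addr += 1
        return addr

    def load(self, addr):
        # Dereference a pointer within this memory space only.
        return self.cells[addr]


def transfer(src, ptr, dst):
    # Copy the data referenced by `ptr` in `src` into `dst` and return
    # the new pointer, which is valid only in `dst`.
    return dst.alloc(list(src.load(ptr)))


p0, p1 = LocalMemory(), LocalMemory()
p1.alloc("unrelated")            # occupies address 0 in p1
ptr = p0.alloc([1, 2, 3])        # address 0 in p0
wrong = p1.load(ptr)             # same address, unrelated data
new_ptr = transfer(p0, ptr, p1)  # the referenced data must be sent explicitly
```

The address held in ptr is meaningful only in the memory space that allocated it, so both the pointer and the data it references have to be communicated and re-mapped before another processor can use them.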
The data flow machine-type data-driven control method executes instructions in the order in which they become executable, and can in theory provide high parallelism. However, the method requires a mechanism for managing the instructions which have become executable, and a mechanism for assigning each executable instruction to one of plural processors. This makes the hardware configuration for the method complicated. Moreover, creating an execution program for the method requires a programming language dedicated to the data flow machine and a compiler dedicated to that programming language. This is a major restriction in developing a program, and makes it difficult to put the data flow machine-type data-driven control method into practical use.
The hybrid data flow machine-type control method controls synchronization between instruction blocks by data drive, and thus requires less time for parallel processing than the Neumann-type program-driven control method. Further, the hybrid data flow machine-type control method controls the processing within an instruction block by program drive, and thus incurs less overhead for managing executable instructions than the data flow machine-type data-driven control method. However, in the hybrid data flow machine-type control method, the start-up of an instruction block which is to be executed under program-driven control is itself performed by a data-driven control method (in which the instruction block to be started up is specified in the communicated data). This makes the complicated hardware mechanism of a data flow machine indispensable. Further, in the hybrid data flow machine-type control method, the external data to be referenced in an instruction block needs to be ready before the instruction block is started up. Thus, a waiting time is needed before the instruction block can be started up, which causes a delay in communications between processors.
The present invention has been made in light of the above-mentioned problems, and is directed to generating an execution code with which a multi-processor system can execute high-performance parallel processing, merely by having a programmer or the like add a simple description to a sequential program, and also to generating an execution code which can perform communication processing of pointers and of pointer reference data. The present invention is also directed to, when parallel processing is performed using a multi-processor, eliminating the need for a complicated hardware configuration, reducing the delay arising in communications between processors, and facilitating development of a high-performance multi-processor system. The present invention is further directed to enabling pointer data and pointer reference data to be shared between processors which have different memory spaces, thereby providing greater flexibility in developing a parallel processing program.