1. Field of the Invention
The present invention relates to a technique for speeding up the execution of a program in a multiprocessor system. More particularly, it relates to speeding up iteratively executed processing in a multiprocessor environment.
2. Description of Related Art
In recent years, so-called multiprocessor systems, which have multiple processors, have been used in fields such as scientific and technical computation and simulation. In such a system, an application program generates multiple processes and assigns them to individual processors. The processors execute the processing while communicating with each other through, for example, a shared memory space.
One simulation field that has developed particularly actively in recent years is simulation software for plants of mechanical-electronic ("mechatronics") products such as robots, automobiles and airplanes. Thanks to advances in electronic components and software technology, most parts of a mechatronics product such as a robot, an automobile or an airplane are now electronically controlled through wire connections laid out like a network of nerves, wireless LANs and the like.
These mechatronics products are mechanical devices by nature, but they also contain a large amount of control software. Consequently, developing such a product now requires a long time, huge costs and a large number of staff to develop and test the control program.
One conventional technique used for such testing is hardware in the loop simulation (HILS). An environment for testing an electronic control unit (ECU) in the context of an entire automobile is called a full-vehicle HILS. In a full-vehicle HILS, an actual ECU is connected in a laboratory to a dedicated hardware device that emulates an engine, a transmission mechanism and the like, and the ECU is then tested according to a predetermined scenario. The output from the ECU is input to a monitoring computer and presented on a display. While watching the display, the operator conducting the test checks for any abnormal operation.
HILS, however, requires considerable preparation, because it relies on a dedicated hardware device that must be physically wired to the actual ECU. Moreover, replacing the ECU with another one for a test requires a great deal of work, because the device and the new ECU must again be physically connected. Furthermore, because the test uses the actual ECU, each test takes time, so testing many scenarios takes an enormous amount of time. Finally, the emulation hardware used in HILS is generally very expensive.
Recently, a technique has been proposed for testing with a purely software configuration, without an expensive hardware emulation device. This technique is called software in the loop simulation (SILS). In SILS, all the components to be installed in the ECU, such as the microcomputer, the I/O circuit and the control scenarios, as well as all the components of the plant, such as the engine and the transmission, are implemented in a software simulator. With SILS, a test can therefore be executed without any ECU hardware.
One system that assists in building a SILS is, for example, MATLAB®/Simulink®, a simulation modeling system available from CYBERNET SYSTEMS CO., LTD. With MATLAB®/Simulink®, as shown in FIG. 1, functional blocks A, B, . . . , M are arranged on a display screen through a graphical interface, and a simulation program is created by specifying the process flow among the functional blocks, as indicated by the arrows.
After the block diagram containing the functional blocks A, B, . . . , M has been created in MATLAB®/Simulink®, Real-Time Workshop® converts the simulation program into C-language source code that is functionally equivalent to the block diagram. The C source code is then compiled into a program that can be executed as a SILS simulation on another computer system.
On a multiprocessor system in particular, processing can be sped up by dividing the entire processing into as many processes as possible and having the system execute them in parallel, with different processes assigned to the individual processors.
In a conventional technique for achieving this, the functional blocks A, B, . . . , M are divided into multiple clusters 202, 204, 206, 208 and 210, as shown in FIG. 2, and each cluster is assigned to one of the CPUs. Such clustering can be implemented, for example, by detecting strongly connected components, a well-known compiler technique. The main purpose of clustering is to reduce communication costs by keeping functional blocks that communicate with each other within the same cluster.
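The following sketch shows strongly-connected-component detection on a block graph, with each component treated as a candidate cluster. It assumes the block diagram is given as an adjacency list mapping each block to the blocks it feeds. The block names and edges are hypothetical and do not reproduce FIG. 2. It uses Tarjan's algorithm, one standard way to find strongly connected components. The source does not state which algorithm the conventional technique uses.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. Returns SCCs (each a sorted list of nodes) in
    reverse topological order of the condensed graph."""
    index_of = {}   # DFS discovery index of each visited node
    lowlink = {}    # smallest discovery index reachable from the node's subtree
    on_stack = set()
    stack = []
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index_of[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of an SCC: pop it and every node above it.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in graph:
        if v not in index_of:
            visit(v)
    return components


# Hypothetical block diagram: A->B->C->A is a feedback loop, as is D<->E.
blocks = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D", "F"], "F": []}
clusters = strongly_connected_components(blocks)
print(clusters)  # [['F'], ['D', 'E'], ['A', 'B', 'C']]
```

Blocks on a common feedback loop end up in the same cluster. The loop's data exchange therefore stays on one CPU instead of crossing between processors.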
However, the functional blocks have dependencies between them, as shown in FIG. 2. Because the processing cannot be parallelized without regard to these dependencies, parallelization must be carried out under certain constraints.
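The following sketch illustrates the kind of constraint that dependencies impose. It groups the nodes of an acyclic dependency graph into levels such that every predecessor of a node lies in an earlier level. Only nodes within the same level are free to run in parallel. This is an illustrative assumption, not the method of the invention. The graph is hypothetical and assumed acyclic, for example after strongly connected components have been collapsed. The code is a level-by-level variant of Kahn's topological sort.

```python
def parallel_levels(graph):
    """Group DAG nodes into levels: all predecessors of a node are in earlier
    levels, so nodes in the same level have no dependencies on each other."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    indegree = {v: 0 for v in nodes}
    for targets in graph.values():
        for w in targets:
            indegree[w] += 1

    level = sorted(v for v in nodes if indegree[v] == 0)
    levels = []
    scheduled = 0
    while level:
        levels.append(level)
        scheduled += len(level)
        next_level = []
        for v in level:
            for w in graph.get(v, ()):
                indegree[w] -= 1
                if indegree[w] == 0:  # all of w's inputs are now available
                    next_level.append(w)
        level = sorted(next_level)

    if scheduled != len(nodes):
        raise ValueError("graph has a cycle; collapse strongly connected components first")
    return levels


# Hypothetical dependencies: C needs A and B, D needs B, E needs C and D.
deps = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}
print(parallel_levels(deps))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Even with many processors available, E cannot start before C and D finish. Dependencies of this kind cap the achievable speed-up.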
Japanese Patent Application Publication No. Hei 9-97243 aims to shorten the turnaround time of a program that consists of parallel tasks and is executed in a multiprocessor system. In the disclosed system, a compiler compiles the source program of the parallel tasks into an object program. It also generates an inter-task communication amount table, which holds the amount of data exchanged between each task and each of the other parallel tasks. A task scheduler then determines which processor to allocate to each parallel task, based on two tables: the inter-task communication amount table, and a processor communication cost table that defines the data communication time per unit of data between every pair of processors in the multiprocessor system. The allocation is chosen so as to minimize the time spent on inter-task communication, and the task scheduler registers the result in a processor management table.
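The following sketch makes the scheduling objective concrete. It assumes each task goes to a distinct processor and scores an allocation as the sum, over task pairs, of data exchanged times per-unit communication time between the assigned processors. It then finds the minimum by exhaustive search. The function names, table values and the brute-force search are hypothetical stand-ins, not the procedure disclosed in the publication.

```python
from itertools import permutations


def total_comm_time(assignment, comm_amount, unit_cost):
    """Sum over task pairs of (data exchanged) x (per-unit time between the
    processors those tasks are assigned to)."""
    n = len(assignment)
    return sum(
        comm_amount[i][j] * unit_cost[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def allocate_tasks(comm_amount, unit_cost):
    """Try every one-task-per-processor placement and return
    (assignment, total_time) for the cheapest one (first found on ties)."""
    n_tasks, n_procs = len(comm_amount), len(unit_cost)
    best = None
    for assignment in permutations(range(n_procs), n_tasks):
        t = total_comm_time(assignment, comm_amount, unit_cost)
        if best is None or t < best[1]:
            best = (assignment, t)
    return best


# Hypothetical inter-task communication amount table (symmetric).
comm_amount = [[0, 10, 1],
               [10, 0, 1],
               [1, 1, 0]]
# Hypothetical processor communication cost table: CPUs 0 and 1 are close, CPU 2 is far.
unit_cost = [[0, 1, 5],
             [1, 0, 5],
             [5, 5, 0]]
print(allocate_tasks(comm_amount, unit_cost))  # ((0, 1, 2), 20)
```

The heavily communicating tasks 0 and 1 land on the cheaply connected CPUs 0 and 1. Exhaustive search grows factorially, so a practical scheduler would need a heuristic for realistic task counts.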
Japanese Patent Application Publication No. Hei 9-167144 discloses a program generation method for modifying a parallel program that executes parallel processing. The parallel program contains descriptions of multiple kinds of calculation procedures and multiple kinds of communication procedures applied to communication processing among processors. In this method, suppose that increasing the communication amount in the communication processing executed under the current communication procedures would shorten the time from the start to the completion of the parallel processing. In that case, the communication procedures are rearranged within the parallel program, and the descriptions of the parallel program are modified so that two or more communication procedures are merged.
Japanese Patent Application Publication No. 2007-048052 relates to a compiler that optimizes parallel processing. The compiler records the number of execution cores, that is, the total number of processor cores that will execute an object program. The compiler first detects dominant paths in the object program; dominant paths are candidate execution paths, each to be executed successively by a single processor core. The compiler then selects up to as many dominant paths as there are execution cores, and thereby generates clusters of tasks to be executed in parallel or successively by the multi-core processor. Next, for each generated cluster, the compiler calculates its execution time when run on n processor cores, for every natural number n from one to the number of execution cores. Finally, based on the calculated execution times, the compiler selects how many processor cores to assign to each cluster.
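The final step, choosing a core count per cluster from a table of execution times, can be sketched as follows. This assumes the clusters run concurrently, share a fixed pool of cores, and that the goal is to minimize the time of the slowest cluster. The publication's actual selection criterion may differ. The function name, the exhaustive search and the timing figures are all hypothetical.

```python
from itertools import product


def choose_core_counts(exec_time, n_cores):
    """exec_time[c][k - 1] is the time of cluster c on k cores (k = 1..n_cores).
    Return (core_counts, makespan) minimizing the slowest cluster while using
    at most n_cores in total, or None if there are more clusters than cores."""
    best = None
    for counts in product(range(1, n_cores + 1), repeat=len(exec_time)):
        if sum(counts) > n_cores:
            continue  # this combination needs more cores than exist
        makespan = max(exec_time[c][k - 1] for c, k in enumerate(counts))
        if best is None or makespan < best[1]:
            best = (counts, makespan)
    return best


# Hypothetical execution times (arbitrary units) on 1, 2, 3 and 4 cores.
times = [
    [80, 45, 32, 26],  # cluster 0 scales well
    [30, 18, 14, 12],  # cluster 1 is already short
]
print(choose_core_counts(times, 4))  # ((3, 1), 32)
```

Most cores go to the cluster whose run time dominates. Giving the short cluster a second core would leave too few cores for cluster 0 to keep its time low.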
However, these conventional techniques still do not address the constraints that dependencies among functional blocks impose on parallelization. The present inventors consider that the part of the processing that is executed iteratively is a bottleneck, and that parallelizing that part can contribute greatly to speeding up the processing.