1. Field of the Invention
The present invention relates to a multithread execution device that individually executes a plurality of programs and to a method for executing multiple threads, and more particularly relates to a multithread execution device that adjusts the instruction issue rate for each program and to a method for executing multiple threads.
2. Description of the Related Art
Various methods for efficiently executing a program have been considered. Many approaches to improving execution efficiency attempt to shorten the time period in which a computing circuit of a CPU does not execute the program. For example, the following have been proposed: a technology to reduce branching, or time spent pending confirmation of the content of a process by a user operation (see, for example, Japanese Patent Application Publication No. 2007-328416 (JP-A-2007-328416) and Japanese Patent Application Publication No. 2000-47887 (JP-A-2000-47887)); a technology to reduce software overhead that is caused by an input/output (I/O) interrupt (see Japanese Patent Application Publication No. 6-35731 (JP-A-6-35731)); and a technology to reduce hardware overhead such as context switching (see Japanese Patent Application Publication No. 2004-234123 (JP-A-2004-234123)).
JP-A-2007-328416 describes a heterogeneous multiprocessor that arranges a part of the program, which statically determines a processing order, in accordance with a characteristic of a processing unit (PU) upon compilation of the part of the program. In addition, JP-A-6-35731 describes a method for controlling an I/O subsystem call instruction in which hypervisor intervention is controlled by masking a subsystem call used to access shared resources. It is possible to suppress the overhead that is caused by the interrupt by controlling the hypervisor intervention. Furthermore, JP-A-2000-47887 describes a multithreaded processor that speculatively issues an instruction to a plurality of processors, and that includes a thread manager for controlling initiation of execution, termination, switching of the execution status, and data transfer between the threads in each processor. The multithreaded processor improves prediction accuracy of speculative execution by concurrently executing a plurality of program paths. Moreover, JP-A-2004-234123 describes a method for executing multiple threads that fixedly assigns a program counter and a register set to the thread and that switches between the standby thread and the executed thread, in a multithreaded processor that has a plurality of hardware threads. Because a time period that is required to prepare the thread is reduced, the processing speed is increased.
However, technologies such as those described in JP-A-2007-328416, JP-A-6-35731, JP-A-2000-47887, and JP-A-2004-234123, which simply improve the execution efficiency, may adversely affect a system in which two programs depend on each other's execution timing.
FIG. 1A shows a relationship between the execution timing of a program #0 and the execution timing of a program #1. The program #0 is executed by a CPU_A1, while the program #1 is executed by a CPU_B1. The CPU_A1 and the CPU_B1 are generally installed in different computers.
The program #0 executes a process “a”, while the program #1 executes a process “b1” and a process “b2”. The process “b2” is executed by using a processing result of the process “a”. In FIG. 1A, because the execution timing of the program #0 is synchronized with the execution timing of the program #1, the CPU_B1 can execute the process “b2” by using the processing result of the process “a”.
FIG. 1B shows the execution timing of the program #0 and the execution timing of the program #1 when a CPU_B2, instead of the CPU_B1, executes the program #1. The CPU_B2 has a higher number of instructions per clock cycle (IPC) or requires fewer cycles per unit process than the CPU_B1; that is, the CPU_B2 has a faster execution speed. Because the CPU_B2 executes the program #1, the execution timing of the process “b1” is advanced. However, when the process “b1” is completed, the CPU_A1 has not yet completed execution of the process “a”. Thus, when the CPU_B2 executes the process “b2”, the CPU_B2 cannot use the processing result of the process “a”. Consequently, the CPU_B2 may execute the process “b2” using data that was obtained before the process “a” is completed.
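The timing dependency of FIGS. 1A and 1B can be illustrated with a minimal sketch. The instruction counts, the clock frequencies, the IPC of 2 for the CPU_B2, and the function names below are all hypothetical values chosen for illustration and do not appear in the figures. The sketch also assumes that both programs start at the same instant and that the process “b2” begins as soon as the process “b1” ends.

```python
def completion_time_us(instructions, clock_mhz, ipc=1.0):
    """Time in microseconds to execute `instructions` at a given clock and IPC.

    clock_mhz * ipc is the number of instructions executed per microsecond.
    """
    return instructions / (clock_mhz * ipc)


def b2_sees_result_of_a(a_instr, a_clock_mhz, b1_instr, b_clock_mhz, b_ipc=1.0):
    """Return True if process "a" has finished by the time "b1" ends.

    Assumption: "b2" starts immediately after "b1", so it can use the
    result of "a" only if "a" completes no later than "b1".
    """
    a_done = completion_time_us(a_instr, a_clock_mhz)
    b1_done = completion_time_us(b1_instr, b_clock_mhz, b_ipc)
    return b1_done >= a_done


# Hypothetical workload: "a" is 600 instructions, "b1" is 1800 instructions.
# FIG. 1A analogue: timings line up, so "b2" can use the result of "a".
print(b2_sees_result_of_a(600, 60, 1800, 180))            # True
# FIG. 1B analogue: a faster CPU_B2 (assumed IPC of 2) finishes "b1" early,
# so "b2" would run on data obtained before "a" completed.
print(b2_sees_result_of_a(600, 60, 1800, 180, b_ipc=2))   # False
```

In this sketch, making one processor faster is enough to break a dependency that previously held only because the two execution timings happened to coincide.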
In recent years, a CPU that includes hardware multithreading technology and a multi-core CPU in which a plurality of cores are installed in a single CPU have become available. It is therefore conceivable to port the program #0 and the program #1, which have been executed by the CPU_A1 and the CPU_B1 respectively, to a single CPU that executes both programs. Such technology may be used when a plurality of electronic control units (ECUs) that are connected to an on-vehicle LAN are integrated into a smaller number of ECUs.
However, the CPU that is installed in the integrated ECU has a different architecture from the CPUs that are installed in the ECUs before integration. Thus, the same problem as that in FIG. 1B occurs if the program #0 and the program #1, which depend on each other, are simply ported to the integrated ECU.
One possible solution to this problem is to adjust the instruction issue rate when the nonconforming programs #0 and #1 are ported to an integrated CPU.
FIG. 2A shows a relationship between the CPU_A1 and the CPU_B1 before integration and the integrated CPU. An operation clock of the CPU_A1 is 60 MHz, while an operation clock of the CPU_B1 is 180 MHz. For the sake of simplicity, the IPC of the CPU_A1 is set equal to the IPC of the CPU_B1 (IPC=1), although the IPC of the CPU_A1 may differ from the IPC of the CPU_B1. Thus, the execution speed of the CPU_B1 is three times the execution speed of the CPU_A1. An operation clock of the integrated CPU is 180 MHz. A vCPU_A1 and a vCPU_B1 that are included in the integrated CPU are virtual CPUs.
To make the execution timing of the program #0 that is executed by the CPU_A1 correspond to the execution timing of the program #1 that is executed by the CPU_B1, the instruction issue rate has to be changed in the integrated CPU in accordance with the original execution speeds before integration.
FIG. 2B shows a relationship between the instruction issue rate and the number of instructions executed. If the execution speed of the CPU_B1 is three times the execution speed of the CPU_A1, the integrated CPU issues three instructions to the vCPU_B1 while issuing one instruction to the vCPU_A1. Accordingly, the execution timing of the program #0 that is executed by the CPU_A1 before integration and the execution timing of the program #1 that is executed by the CPU_B1 before integration can be made to correspond to each other to a certain degree in the integrated CPU.
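As a rough illustration of the issue rate described for FIG. 2B, the following sketch derives the 1:3 ratio from the 60 MHz and 180 MHz operation clocks and builds a simple repeating issue pattern. The function names and the fixed round-robin ordering are assumptions made for this example only; the text does not specify how the integrated CPU orders the issue slots.

```python
from math import gcd


def issue_ratio(clock_a_mhz, clock_b_mhz, ipc_a=1, ipc_b=1):
    """Reduce the two original execution speeds to the smallest integer ratio.

    Execution speed is taken as clock * IPC, with integer inputs assumed.
    """
    rate_a = clock_a_mhz * ipc_a
    rate_b = clock_b_mhz * ipc_b
    g = gcd(rate_a, rate_b)
    return rate_a // g, rate_b // g


def issue_schedule(slots_a, slots_b, rounds):
    """Build a repeating issue pattern (hypothetical ordering: A first, then B)."""
    pattern = ["vCPU_A1"] * slots_a + ["vCPU_B1"] * slots_b
    return pattern * rounds


slots_a, slots_b = issue_ratio(60, 180)   # (1, 3)
schedule = issue_schedule(slots_a, slots_b, rounds=2)
print(schedule)
# ['vCPU_A1', 'vCPU_B1', 'vCPU_B1', 'vCPU_B1',
#  'vCPU_A1', 'vCPU_B1', 'vCPU_B1', 'vCPU_B1']
```

The pattern fixes only how many instructions are issued to each virtual CPU; it does not by itself determine how many of those instructions are actually executed per unit time.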
However, the integrated CPU differs from the CPU_A1 and the CPU_B1 not only in operating frequency, but also in instruction set and in the number of instructions executed per unit time, such as the IPC. In addition, when the integrated CPU includes a pipeline with multiple stages or includes a plurality of pipelines, the number of instructions executed per unit time varies due to a hazard or a stall. Thus, even when the instruction issue rate is determined in consideration of the IPC, a desired instruction issue rate cannot be obtained.
Therefore, with mere control of the instruction issue rate, it is impossible to match the number of instructions executed per unit time in the vCPU_A1 to the number of instructions executed per unit time in the CPU_A1, and it is also impossible to match the number of instructions executed per unit time in the vCPU_B1 to the number of instructions executed per unit time in the CPU_B1. For example, even when the instruction issue rate is controlled, the number of instructions of the program #0 executed per unit time and the number of instructions of the program #1 executed per unit time fluctuate repeatedly. Eventually, the execution timing of the program #0 deviates significantly from the execution timing of the program #1.
In other words, when the plurality of programs #0 and #1 are integrated by using the multithreading technology or the multi-core technology in the related art, the execution timings of the plurality of programs #0 and #1 before integration cannot be guaranteed after integration.
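The effect described above can be sketched with a toy model in which instructions are issued at the fixed 1:3 ratio but some issue slots of the vCPU_B1 are lost. The stall pattern used here (every fifth vCPU_B1 slot is lost) and the function name are purely hypothetical; real hazards and stalls depend on the instruction stream and the pipeline, and are not this regular.

```python
def executed_counts(rounds, stall_every_b=0):
    """Count executed instructions under a fixed 1:3 issue ratio.

    Each round issues one instruction to vCPU_A1 and three to vCPU_B1.
    If stall_every_b is non-zero, every stall_every_b-th vCPU_B1 issue slot
    is treated as lost to a stall (a hypothetical, deterministic pattern).
    """
    executed = {"vCPU_A1": 0, "vCPU_B1": 0}
    b_slot = 0
    for _ in range(rounds):
        executed["vCPU_A1"] += 1
        for _ in range(3):
            b_slot += 1
            if stall_every_b and b_slot % stall_every_b == 0:
                continue  # slot consumed, but no instruction completes
            executed["vCPU_B1"] += 1
    return executed


ideal = executed_counts(rounds=100)
stalled = executed_counts(rounds=100, stall_every_b=5)
print(ideal)    # {'vCPU_A1': 100, 'vCPU_B1': 300}
print(stalled)  # {'vCPU_A1': 100, 'vCPU_B1': 240}
```

Even though the issue ratio is unchanged, the ratio of executed instructions in this toy model falls from 3 to 2.4, which is one way to picture why the original execution timings are not preserved by issue rate control alone.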