(1) Field of the Invention
The present invention relates to processors, and more particularly to a technology for executing programs efficiently in a multiprocessor system.
(2) Description of the Related Art
In a parallel processing system in which a single processor executes various programs in parallel, the processor switches tasks according to a predetermined trigger. A task is a unit of processing that the processor carries out in order to execute a program. In this manner, multiple programs are executed seemingly in parallel by a single processor.
Since each of the tasks is executed by the processor as if the processor were used for the task exclusively, tasks seem to be allocated to respective virtual processors to be executed.
The virtual processor does not need to have all the functions of a real processor, but needs to have only information necessary to execute tasks. Examples of such information are data information and control information regarding a program counter, a flag register, a stack area, a general-purpose register, and the like. Such information necessary to execute a task is called a “context” or “context data”.
When a task that is currently executed is to be switched to another task, contexts must be switched. In general, contexts are stored in a memory. Therefore, switching contexts means writing a context of a current task into the memory (hereinafter, referred to as “saving”), and then reading another context of a next task to be executed from the memory (hereinafter, referred to as “retrieving”).
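For illustration only, the saving and retrieving described above can be sketched in Python as follows. The fields of `Context` mirror the kinds of information listed above. The dictionary standing in for the memory, the four-entry register file, and all function names are assumptions made for this sketch and are not taken from the cited reference.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Context:
    """Information needed to resume a task (field choice is illustrative)."""
    program_counter: int = 0
    flag_register: int = 0
    stack_pointer: int = 0
    # Assumption: four general-purpose registers, chosen only for brevity.
    general_registers: List[int] = field(default_factory=lambda: [0] * 4)


def save_context(memory: Dict[int, Context], task_id: int, live: Context) -> None:
    """'Saving': write a copy of the current task's context into the memory."""
    memory[task_id] = replace(live, general_registers=list(live.general_registers))


def retrieve_context(memory: Dict[int, Context], task_id: int) -> Context:
    """'Retrieving': read a copy of the next task's context from the memory."""
    stored = memory[task_id]
    return replace(stored, general_registers=list(stored.general_registers))


def switch_context(memory: Dict[int, Context], current_task: int,
                   next_task: int, live: Context) -> Context:
    """Save the current context, then retrieve the context of the next task."""
    save_context(memory, current_task, live)
    return retrieve_context(memory, next_task)
```

In this sequential sketch, the task switch cannot complete until both memory accesses have finished, which is the overhead that a hardware context switching device is intended to hide.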
Conventionally, the context switching has been implemented not only in operating systems (OS), but also in hardware as disclosed in Japanese Patent Application Laid-Open No. 2003-271399. The hardware which performs the context switching is called a context switching device.
FIG. 1 is a functional block diagram showing an overall structure of the conventional context switching device.
As shown in FIG. 1, the conventional structure includes a context switching device 1001, a processor 1002, and a memory 1003. The memory 1003 holds contexts. Note that, for conciseness, FIG. 1 shows only the functional blocks necessary to explain the context switching processing.
The processor 1002 has a processor unit 1004, a set of registers A 1005, another set of registers B 1006, and a context selection unit 1007. Contexts are stored into the sets of registers.
The processor 1002 switches tasks in the following processing. Here, it is assumed that a currently executed task is a task 1, and the task 1 is to be switched to a task 2. It is also assumed that a context of the currently executed task 1 is now stored in the register set A 1005.
Firstly, a context of the next task 2 to be executed is retrieved from the memory 1003 to the register set B 1006.
Next, when the current task 1 is to be switched to the task 2, the processor unit 1004 accesses the register set B 1006, and the context switching device 1001 accesses the register set A 1005, under the control of the context selection unit 1007.
Then, in starting execution of the task 2, the context of task 1 in the register set A 1005 is saved into the memory 1003.
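The three steps above can be modeled, under simplifying assumptions, by the following Python sketch. The class and method names are hypothetical, a context is reduced to a plain dictionary, and the saving that the hardware would perform in the background is shown as an ordinary statement executed at the moment of the switch.

```python
class DoubleRegisterSetProcessor:
    """Toy model of a processor with two register sets, A and B."""

    def __init__(self, memory, first_task):
        self.memory = memory  # stand-in for the memory 1003: task id -> context
        self.register_sets = {"A": dict(memory[first_task]), "B": None}
        self.task_in_set = {"A": first_task, "B": None}
        self.active = "A"  # register set accessed by the processor unit

    def _idle(self):
        """Return the name of the register set not used by the processor unit."""
        return "B" if self.active == "A" else "A"

    def prefetch(self, next_task):
        """Step 1: retrieve the next context into the idle register set."""
        idle = self._idle()
        self.register_sets[idle] = dict(self.memory[next_task])
        self.task_in_set[idle] = next_task

    def switch(self):
        """Steps 2 and 3: swap the roles of the sets, then save the old context."""
        idle = self._idle()
        if self.task_in_set[idle] is None:
            raise RuntimeError("no context has been retrieved for the next task")
        previous, self.active = self.active, idle
        # The old register set is now on the context switching device side,
        # so its contents can be written back without stalling the new task.
        self.memory[self.task_in_set[previous]] = dict(self.register_sets[previous])
        self.task_in_set[previous] = None
        return self.task_in_set[self.active]
```

Because the context of the next task is already resident in the idle register set when `switch` is called, the switch itself reduces to changing which set the processor unit accesses.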
FIGS. 2A and 2B are diagrams explaining this context switching processing. Firstly, a premise of the processing is described with reference to FIG. 2A.
Here, for example, two programs, a program (1) and a program (2), are to be executed by the processor 1002. The processing is performed as if the programs were allocated to two independent virtual processors, respectively. Such a virtual processor is called a logical processor (hereinafter, referred to as an LP). Then, by switching tasks between the programs (1) and (2), the processor 1002 executes these programs. Hereinafter, execution of program (1) by the processor 1002 is expressed, for example, as “execution of LP(1) by the processor 1002”.
FIG. 2B is a diagram showing a relationship between: a request for retrieving a context of LP(2); and a time period for executing LP(1) and LP(2) by the processor 1002. Hereinafter, the time period of executing an LP is referred to as an “execution time”.
The processor 1002 starts execution of LP(1) at time T0, and at time T3, switches LP(1) to LP(2) to be executed.
Prior to starting the execution of LP(2) at time T3, retrieving a context of LP(2) starts at time T1. Note that a trigger for retrieving a context is referred to as a “context retrieving request”. A data transfer control unit 1009 in the context switching device 1001 starts retrieving the context when a context retrieving request occurs, and completes the retrieving at time T2.
Note that a trigger for saving a context is referred to as a “context saving request”. Hereinafter, both of the “context retrieving request” and the “context saving request” are referred to as a “context switching request”.
The context switching requests are generated inside the context switching device 1001. Although various techniques can be conceived to generate the requests, in this example, a context switching request is generated when a certain amount of time has passed after the start of an execution time of each LP. The execution time of each LP is counted by the counter 1500 of FIG. 1.
Furthermore, although various techniques can be conceived to decide a timing of context switching, in this example, a trigger for context switching is generated inside the context switching device 1001, when an execution time of an LP counted by the counter 1500 reaches a certain time period.
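A counter-driven generation of these triggers could look like the following Python sketch. The function name, the tick-based time unit, and the parameters `time_slice` and `retrieve_lead` are assumptions introduced for illustration. The source states only that the requests and the switching trigger are derived from the execution time counted by the counter.

```python
def context_switch_events(time_slice, retrieve_lead, total_ticks):
    """Return a list of (tick, event) pairs produced by an execution-time counter.

    time_slice    -- execution time, in ticks, after which the LP is switched
    retrieve_lead -- how many ticks before the switch the retrieving request fires
    total_ticks   -- length of the simulated timeline
    """
    if not 0 < retrieve_lead < time_slice:
        raise ValueError("retrieve_lead must be between 0 and time_slice")

    events = []
    counter = 0  # plays the role of the execution-time counter
    for tick in range(1, total_ticks + 1):
        counter += 1
        if counter == time_slice - retrieve_lead:
            events.append((tick, "context retrieving request"))
        if counter == time_slice:
            events.append((tick, "switch"))
            counter = 0  # the execution time of the next LP starts from zero
    return events
```

With `time_slice=10` and `retrieve_lead=3`, for example, the retrieving request fires at tick 7 and the switch at tick 10, which correspond to times T1 and T3 when the execution of LP(1) starts at T0 = 0.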
According to this conventional example, context switching does not generate overhead when a single processor is used, and thereby the single processor can switch tasks efficiently.
Meanwhile, in order to speed up the processing, it has recently become desirable to increase the number of processors while reducing the number of shared memories.
However, in this case, the following problem arises. As described above, the processor 1002 executes multiple tasks seemingly in parallel. When a plurality of such processors operate in parallel, and all contexts of tasks to be executed by the processors are stored in the shared memory, accesses from the plural processors to the shared memory, in other words, retrieving and saving of multiple contexts, often conflict with one another. This results in a problem of generating latency from completion of execution of a current task to start of execution of a next task.
This problem is discussed in more detail by referring again to FIG. 2B. As intended, retrieving of the context of LP(2) is completed by time T2, and LP(1) is switched to LP(2) at time T3. However, due to the conflict among plural accesses to the shared memory, the context retrieving is often delayed and completed after time T3. In such a situation, execution of LP(2) cannot start at time T3, but must be suspended until the context retrieving is completed. As a result, the task switching requires extra waiting time.
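The waiting time described above can be expressed as a small calculation, sketched here in Python. The function and parameter names are hypothetical, and the model assumes that contention simply adds a fixed delay to the retrieving, which is a simplification of real shared-memory arbitration.

```python
def switch_latency(t1, transfer_time, contention_delay, t3):
    """Return how long LP(2) must wait after T3 before it can start.

    t1               -- time at which retrieving the context of LP(2) starts
    transfer_time    -- time the retrieving takes without any conflict
    contention_delay -- extra time lost to conflicting shared-memory accesses
    t3               -- time at which LP(1) is scheduled to be switched to LP(2)
    """
    retrieval_done = t1 + transfer_time + contention_delay
    # LP(2) can start only when both the switch time has come and
    # its context has been fully retrieved.
    actual_start = max(t3, retrieval_done)
    return actual_start - t3


# Without conflict, the retrieving finishes at T2 = 9, before T3 = 10.
no_conflict = switch_latency(t1=7, transfer_time=2, contention_delay=0, t3=10)
# With 4 time units of conflict, the retrieving finishes at 13, after T3.
with_conflict = switch_latency(t1=7, transfer_time=2, contention_delay=4, t3=10)
```

In this example, `no_conflict` is 0 and `with_conflict` is 3, so the three units after T3 are spent waiting rather than executing LP(2).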