Embedded systems have become more sophisticated and are now required both to execute a plurality of applications concurrently and to execute high-load applications. Among embedded systems, mobile devices such as cellular phones and PNDs (Portable Navigation Devices) in particular have become increasingly sophisticated. They are required to execute several applications concurrently, such as video or music playback, rather than only a single application such as the original communication or navigation function. Multimedia processing such as video playback imposes a high computational load and therefore requires a high-performance processor.
A multi-core processor, which integrates a plurality of computational cores, is coming into use as a high-performance processor for executing such multiple high-load applications. Because a multi-core processor can achieve high performance with low power consumption, it is an essential technology for mobile devices, where battery life and heat are concerns.
To execute multiple high-load applications efficiently on a multi-core processor, the applications must be parallelized and computational resources must be allocated among them. A high-load application that cannot be processed by one core must be parallelized so that it can be processed by a plurality of cores. Further, to use the plurality of cores effectively, it is important to optimize how much of the computational resources of which core are allocated to each application. Because the number of running applications and the load of each application vary, this computational resource allocation needs to be performed dynamically during execution.
Parallelization is typically performed using a parallelism library. Some parallelism libraries are thread libraries, such as POSIX threads or Windows (registered trademark) threads; others include OpenMP and Intel Threading Building Blocks (TBB). With a thread library such as POSIX threads or Windows (registered trademark) threads, the programmer writes both the division of application processing and the allocation of the divided processing to cores, so the allocation is done manually by a programmer who is aware of the number of cores available. In OpenMP or TBB, on the other hand, the programmer writes the division of processing, whereas the allocation to cores is performed automatically by the library. Therefore, the programmer need not be particularly aware of the number of cores.
Parallelism libraries such as TBB use task parallelism, which divides application processing into a plurality of tasks and automatically allocates the divided tasks to cores. The task-parallel model includes a task pool that stores executable tasks and a scheduler that allocates tasks to cores. The programmer writes a program that divides application processing into a plurality of tasks that can be executed in parallel and inserts the executable tasks into the task pool. The tasks inserted into the task pool are then allocated to cores automatically by the scheduler. A feature of task parallelism is that, when the number of tasks stored in the task pool is greater than the number of cores, load distribution can be achieved easily.
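A minimal Python sketch of this model follows. It assumes a single shared task pool, represents each task only by a hypothetical processing cost, and simulates the cores rather than running on real ones; `schedule` is an illustrative name, not a library API. It shows how handing the next pooled task to whichever core becomes idle first spreads unevenly sized tasks across the cores.

```python
import heapq
from collections import deque


def schedule(task_costs, num_cores):
    """Simulate a scheduler that gives the next task in a shared pool to
    whichever core becomes idle first; returns the busy time of each core."""
    pool = deque(task_costs)  # the task pool, in insertion order
    # Min-heap of (time at which the core becomes idle, core id).
    idle_at = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(idle_at)
    busy = [0.0] * num_cores
    while pool:
        t, core = heapq.heappop(idle_at)  # earliest-idle core
        cost = pool.popleft()             # scheduler takes the next task
        busy[core] += cost
        heapq.heappush(idle_at, (t + cost, core))
    return busy


# Eight tasks of uneven size on two cores.
print(schedule([3, 1, 2, 2, 1, 3, 1, 1], num_cores=2))  # [7.0, 7.0]
```

With eight unevenly sized tasks and two cores, each core ends up with 7 units of work; this is the easy load distribution that becomes possible when the pool holds more tasks than there are cores.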
Task parallelism allows the number of cores to which tasks are allocated (the degree of parallelism) to be changed during execution, and thereby facilitates the dynamic allocation of computational resources. Because the scheduler dynamically allocates the tasks in the task pool to whichever cores are available to process them, parallelization can be done without depending on the number of cores. Therefore, the degree of parallelism can easily be changed during execution, and the allocation of computational resources can be changed dynamically according to load variation in the application of interest or in another application.
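As a rough illustration of this independence from the core count, the sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for a task-parallel scheduler, with `max_workers` playing the role of the degree of parallelism. The names `process_chunk` and `run_tasks` are hypothetical. Because CPython threads share a global interpreter lock, the sketch illustrates the programming model only, not real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical task body: sum of squares over one chunk of data."""
    return sum(x * x for x in chunk)


def run_tasks(tasks, degree_of_parallelism):
    """Run the same task list with the given number of worker threads.
    The division into tasks does not depend on the worker count."""
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        return list(pool.map(process_chunk, tasks))


data = list(range(1000))
tasks = [data[i:i + 100] for i in range(0, len(data), 100)]  # 10 tasks

# Phase 1 uses four workers; phase 2 uses two, e.g. after another
# application has started and claimed cores. The task code is unchanged.
phase1 = run_tasks(tasks, degree_of_parallelism=4)
phase2 = run_tasks(tasks, degree_of_parallelism=2)
print(phase1 == phase2, sum(phase1))  # True 332833500
```

Only the worker count changes between the two phases; the programmer's division into ten tasks stays the same, which is what lets the degree of parallelism follow load variation.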
The present invention assumes a parallel model in which each core has its own task pool (FIG. 10). In this model, a scheduler 121 performs two operations: acquiring a task from a task pool and allocating it to a core, and inserting a generated task into a task pool. Hereinafter, these two operations, i.e. the allocation of a task to a core and the insertion of a task into a task pool, are collectively referred to as task allocation. An example of the task allocation operation for a computational core 131 in a task allocation device 100 is described below with reference to FIGS. 11 and 12.
First, the operation of acquiring a task from a task pool and allocating it to a core is described with reference to FIG. 11.
A scheduler 121 checks whether there is a task in a task pool 111 (Step 201).
When there is a task in the task pool 111, the scheduler 121 acquires the task from the task pool 111. For example, the first-inserted task may be acquired first (Step 202).
When there is no task in the task pool 111, the scheduler 121 checks whether there is a task in another task pool 112, . . . ; when there is no task in any task pool, the task allocation ends (Step 203).
When there is a task in another task pool, the scheduler 121 acquires the task from that task pool (Step 204).
The scheduler 121 allocates the acquired task to the computational core 131, and then the process ends (Step 205).
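Steps 201 to 205 can be sketched in Python as follows. The `Scheduler` class and its method names are hypothetical; the text does not specify which task is taken from another pool or in which order the other pools are searched, so the sketch assumes the oldest task and a scan by core index. A simple `insert_task` method is included only so that the pools can be filled.

```python
from collections import deque


class Scheduler:
    """Sketch of per-core task pools as in FIG. 10 (names hypothetical)."""

    def __init__(self, num_cores):
        self.pools = [deque() for _ in range(num_cores)]

    def insert_task(self, core, task):
        # Put a newly generated task into the given core's own pool.
        self.pools[core].append(task)

    def allocate(self, core):
        """Return the task to run on `core`, or None when every pool is empty.
        Running the returned task on the core (Step 205) is the caller's job."""
        own = self.pools[core]
        if own:                                     # Step 201
            return own.popleft()                    # Step 202: oldest first
        for other, pool in enumerate(self.pools):   # Step 203
            if other != core and pool:
                return pool.popleft()               # Step 204
        return None                                 # no task anywhere: end


sched = Scheduler(num_cores=2)
sched.insert_task(0, "A")
sched.insert_task(0, "B")
sched.insert_task(1, "C")
print(sched.allocate(1))  # C (from core 1's own pool)
print(sched.allocate(1))  # A (taken from core 0's pool)
print(sched.allocate(0))  # B
print(sched.allocate(0))  # None
```

Taking work from another core's pool only when the local pool is empty keeps each core on its own tasks in the common case while still preventing idle cores.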
Next, the operation of inserting a task into a task pool is described with reference to FIG. 12. A task is generated by a running task, and the scheduler 121 is called after the task is generated.
The scheduler 121 inserts the new task into the task pool 111 (Step 211).
Some multi-core processors have a heterostructure, in which the cores do not have equal processing performance. From the viewpoint of a parallel program, the heterostructure includes a structure in which the physical performance of each core differs (Asymmetric Multiple Processor: AMP) (FIG. 13A), and a structure in which the cores have equal physical performance (Symmetric Multiple Processor: SMP) but the computational performance allocated to the parallel program differs from core to core (FIG. 13B). For core 2 in FIG. 13B, 50% of its computational performance may be explicitly allocated to the program, or only 50% may be available to the program as a consequence of another program running on that core.
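The two cases can be compared with a small calculation. The sketch below assumes that the performance a parallel program sees on a core is the core's physical speed multiplied by the share allocated to the program. The speeds and the work amount are hypothetical and are not taken from FIGS. 13A and 13B.

```python
def effective_speed(physical_speed, share):
    """Computational performance a parallel program sees on one core:
    the core's physical speed times the fraction allocated to the program."""
    return physical_speed * share


def task_time(work, physical_speed, share=1.0):
    """Time to finish `work` units on a core with the given speed and share."""
    return work / effective_speed(physical_speed, share)


# AMP (as in FIG. 13A): physically different cores, each fully allocated.
amp = [task_time(10.0, speed) for speed in (2.0, 1.0)]
# SMP with unequal allocation (as in FIG. 13B): equal cores, core 2 at 50%.
smp = [task_time(10.0, 2.0, share) for share in (1.0, 0.5)]
print(amp, smp)  # [5.0, 10.0] [5.0, 10.0]
```

From the program's point of view the two structures are indistinguishable here: in both, the same task takes twice as long on the second core, which is why both count as a heterostructure for task allocation.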
In a multi-core processor having a heterostructure, task allocation must take into account both the performance of the cores and the dependencies between tasks. This is because, when tasks depend on one another by referring to each other's processing results, allocating a task that is referred to by many tasks to a low-performance core may cause a high-performance core to wait for that task to finish, which decreases parallel performance. The number of references made to a task by other tasks is called the reference count, and it serves as an index of the dependency between tasks.
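A small worked example makes the effect concrete. The sketch below computes each task's reference count from a table of which tasks refer to which, then simulates a fixed assignment of tasks to a fast core (speed 2) and a slow core (speed 1), assuming a task starts once its core is free and every task it refers to has finished. All task names, work amounts, speeds, and the helper names `reference_counts` and `simulate` are hypothetical.

```python
from collections import Counter


def reference_counts(refers_to):
    """Reference count of each task: how many other tasks refer to it."""
    return dict(Counter(dep for deps in refers_to.values() for dep in deps))


def simulate(assignment, speeds, work, refers_to):
    """Finish time of every task. `assignment` maps core -> ordered task list;
    a task starts once its core is free and all tasks it refers to are done."""
    finish = {}
    core_free = {core: 0.0 for core in assignment}
    queues = {core: list(tasks) for core, tasks in assignment.items()}
    while any(queues.values()):
        progressed = False
        for core, queue in queues.items():
            deps = refers_to.get(queue[0], ()) if queue else ()
            if queue and all(d in finish for d in deps):
                task = queue.pop(0)
                ready = max([core_free[core]] + [finish[d] for d in deps])
                finish[task] = ready + work[task] / speeds[core]
                core_free[core] = finish[task]
                progressed = True
        if not progressed:
            raise ValueError("assignment cannot complete (dependency cycle)")
    return finish


work = {"A": 4, "B": 2, "C": 2, "D": 2, "E": 4}
refers_to = {"B": ["A"], "C": ["A"], "D": ["A"]}  # B, C, D use A's result
speeds = {"fast": 2.0, "slow": 1.0}
print(reference_counts(refers_to))  # {'A': 3}

good = {"fast": ["A", "B", "C", "D"], "slow": ["E"]}
bad = {"fast": ["E", "B", "C", "D"], "slow": ["A"]}
print(max(simulate(good, speeds, work, refers_to).values()))  # 5.0
print(max(simulate(bad, speeds, work, refers_to).values()))   # 7.0
```

Placing task A, whose reference count is 3, on the slow core leaves the fast core idle from time 2.0 to 4.0 while it waits for A, and the whole set of tasks finishes at 7.0 instead of 5.0.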
Further, when the reference count is decided during execution, the task allocation also needs to be performed during execution. In some cases tasks have complex dependencies, for example when whether to generate a task, or whether to refer to the processing result of a certain task, is decided by a condition evaluated during execution. When the dependency is determined only during execution, as in this case, the reference count of a task is determined at the point when all tasks that may refer to the processing result of that task have been generated.
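One way to picture a reference count that becomes final only during execution is sketched below. It assumes, as a labeled simplification, that the number of tasks that may refer to a given task is known in advance even though whether each of them actually does so is decided at run time. `RefCounter` is a hypothetical helper, not something described in the text.

```python
class RefCounter:
    """Hypothetical tracker for a task's reference count while the tasks
    that may refer to it are still being generated."""

    def __init__(self, potential_referrers):
        self.count = 0                      # references confirmed so far
        self.pending = potential_referrers  # referrers not yet decided

    def resolve(self, refers):
        """Record one potential referrer: it was generated and refers to the
        task (refers=True), or a run-time condition ruled it out."""
        if self.pending == 0:
            raise RuntimeError("all potential referrers already resolved")
        self.pending -= 1
        if refers:
            self.count += 1

    @property
    def final(self):
        # The count is determined once every potential referrer is resolved.
        return self.pending == 0


rc = RefCounter(potential_referrers=3)
for x in (5, -2, 7):  # run-time values decide each reference
    rc.resolve(refers=x > 0)
    print(rc.count, rc.final)
# 1 False
# 1 False
# 2 True
```

Until `final` becomes true, an allocator can only use a lower bound on the reference count, which is why allocation based on this count has to happen during execution.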
Patent Document 1 discloses a technique for allocating tasks to cores in a multi-core processor that includes a plurality of cores with different performance. The technique employs a task-parallel model with a plurality of task pools, so tasks can be allocated during execution. The dependencies between tasks are set in advance, and tasks are allocated on the basis of the computational load of each task and the communication cost between tasks, both of which are calculated during execution.
Further, for heterogeneous multiprocessor systems, a related-art technique has been proposed that performs task allocation during execution so that control can adapt to conditions such as variation in the processing time of a macro task during execution (see, for example, Patent Document 2).
Further, a related-art technique that performs scheduling based on the reference count has also been proposed; however, its purpose in using the reference count is to increase the number of executable tasks by executing tasks with a high reference count first (see, for example, Patent Document 3).