1. Field of the Invention
The present invention relates generally to dynamic resource allocation of processing units and, more particularly, to a method and apparatus that perform dynamic resource allocation by assigning processing units to task blocks so that both Central Processing Unit (CPU) resources and Graphics Processing Unit (GPU) resources are efficiently utilized.
2. Description of the Related Art
Processing units, such as CPUs, GPUs and Coarse-Grained Reconfigurable Architectures (CGRAs), may be realized using computational software. A CPU, a GPU and a CGRA are described in greater detail below with reference to FIGS. 1 and 2.
As illustrated in FIG. 1, a CPU 110 may include at least one core 115 that performs actual computation, and a GPU 120 may include at least one Processing Element (PE) 125 that performs actual computation. Further, in FIG. 1, both the CPU 110 and the GPU 120 are illustrated in connection with a memory 130.
Recently, processors having multiple cores or GPUs having multiple PEs have been widely employed. In particular, high-end processing units, such as GPUs, may include dozens to hundreds of PEs.
FIG. 2 illustrates a configuration of a coarse-grained reconfigurable architecture (CGRA). In a CGRA, many processing elements (function units, FUs) are arranged so that inputs and outputs are transferred among them for organized processing. The paths of the data to be computed and the FUs that process the data are adjusted dynamically. As the name implies, the method and sequence of using the arranged hardware components may be adjusted by software in a relatively coarse-grained manner rather than a fine-grained manner. One PE may rapidly perform a small operation, and a large number of interconnected PEs may perform a large and complex operation. In FIG. 2, a number of PEs are interconnected by a mesh-style network. Operands are computed through adjustable paths of PEs. One PE may receive inputs from neighboring PEs and produce outputs to neighboring PEs, and may have a register file to hold temporary values, a configuration memory that provides reconfiguration information, and an FU (ALU) to compute an operation.
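The PE structure described above may be illustrated with a minimal software sketch. The sketch is only a toy model under stated assumptions: the class `PE`, the operation table `OPS`, the `constant` operand, and the function `run_path` are hypothetical names introduced for illustration and do not come from FIG. 2. Each PE holds a configuration entry that selects the operation of its FU and a register list for temporary values, and a value may only move between neighboring PEs of the mesh.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical operation table standing in for the FU (ALU) of a PE.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "pass": lambda a, b: a,
}


@dataclass
class PE:
    """Toy processing element: configuration memory, register file, FU."""
    config: str = "pass"   # configuration memory: selects the FU operation
    constant: int = 0      # second operand (an assumption of this sketch)
    registers: List[int] = field(default_factory=list)  # temporary values

    def step(self, value: int) -> int:
        result = OPS[self.config](value, self.constant)
        self.registers.append(result)  # hold the temporary value
        return result


def run_path(mesh: List[List[PE]], path: List[Tuple[int, int]], value: int) -> int:
    """Pass a value through the PEs at the given mesh coordinates, in order."""
    # In a mesh network, data may only hop to a directly adjacent PE.
    for (r, c), (nr, nc) in zip(path, path[1:]):
        if abs(r - nr) + abs(c - nc) != 1:
            raise ValueError("path must move between neighboring PEs")
    for r, c in path:
        value = mesh[r][c].step(value)
    return value


# A 2x2 mesh configured to compute ((x + 3) * 4) + 1 along one path.
mesh = [[PE() for _ in range(2)] for _ in range(2)]
mesh[0][0] = PE("add", 3)
mesh[0][1] = PE("mul", 4)
mesh[1][1] = PE("add", 1)
path = [(0, 0), (0, 1), (1, 1)]
result = run_path(mesh, path, 2)  # ((2 + 3) * 4) + 1 = 21
```

Changing a configuration entry, for example setting `mesh[0][1].config = "add"`, changes the operation computed along the same path without changing the arrangement of the PEs, which is the sense in which the hardware is reconfigured by software in this sketch.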
As the number of processing units increases, it is crucial to efficiently manage resources, including these processing units, to enhance overall system performance.
In general, GPU processing is initiated when the CPU invokes the GPU. In order for the GPU to execute a task, the CPU may set a GPU register. Hardware threads have been utilized to execute multiple operations in parallel in the GPU. Using these hardware threads, processing elements may be grouped, and groups of processing elements may execute in parallel. As described above, systems generally evolve toward maximizing parallel operation execution.
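The grouping of processing elements may be illustrated by the following sketch, which is a software analogy only. Software threads from the Python standard library stand in for hardware threads, and the names `group_pes`, `run_group` and `run_on_gpu`, the group size, and the squaring workload are all assumptions introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def group_pes(num_pes: int, group_size: int) -> List[List[int]]:
    """Partition PE indices 0..num_pes-1 into consecutive groups."""
    if num_pes <= 0 or group_size <= 0:
        raise ValueError("num_pes and group_size must be positive")
    return [list(range(start, min(start + group_size, num_pes)))
            for start in range(0, num_pes, group_size)]


def run_group(group: List[int], chunk: List[int]) -> List[int]:
    """One group processes its chunk; each PE takes a strided share of it."""
    out = [0] * len(chunk)
    for lane, _pe in enumerate(group):
        for i in range(lane, len(chunk), len(group)):
            out[i] = chunk[i] * chunk[i]  # illustrative workload: squaring
    return out


def run_on_gpu(data: List[int], num_pes: int = 8, group_size: int = 4) -> List[int]:
    """Split the data across PE groups and run the groups in parallel."""
    groups = group_pes(num_pes, group_size)
    n = len(groups)
    chunk_len = -(-len(data) // n)  # ceiling division
    chunks = [data[i * chunk_len:(i + 1) * chunk_len] for i in range(n)]
    # Software threads stand in for hardware threads in this sketch.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(run_group, groups, chunks))
    return [value for part in results for value in part]


squares = run_on_gpu([1, 2, 3, 4, 5])
```

With the default values, eight PEs form two groups of four, and each group handles one contiguous chunk of the input.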
However, existing techniques tend to sequentially utilize the CPU and the GPU. FIG. 3 is a diagram illustrating sequential processing of tasks using CPU and GPU resources. More specifically, FIG. 3 illustrates CPU 310 and GPU 320 utilization over time.
Referring to FIG. 3, the CPU 310, which is processing a task, invokes the GPU 320 at a particular point in time. While the GPU processes the task, the CPU waits for completion of the task at the GPU. However, the CPU may also process a different task or program. During GPU execution, not all processing elements may be utilized. Specifically, as shown in FIG. 3, only active processing elements 330 are used and the remaining processing elements 340 remain in an idle state. When the GPU finishes processing the task, the GPU returns the processing results to the CPU, which then continues subsequent processing.
As described above, in sequential resource utilization, the CPU and the GPU are not utilized simultaneously and not all processing elements of the GPU are used. When a task is not partitioned in the GPU, the whole GPU may be occupied by a single task. Since most application programs do not utilize all the processing elements of the GPU, GPU resources may be wasted, degrading system performance.
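The waste described above may be quantified with a simple timeline model. The following is a minimal sketch in which every number is hypothetical and is not taken from FIG. 3; the names `Phase` and `utilization` are likewise assumptions introduced for illustration. Each phase records its duration, whether the CPU is busy, and how many PEs of the GPU are active.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phase:
    duration: float   # length of the phase, in arbitrary time units
    cpu_busy: bool    # True while the CPU is doing useful work
    active_pes: int   # number of GPU PEs doing useful work


def utilization(phases: List[Phase], total_pes: int) -> Tuple[float, float]:
    """Return (CPU utilization, GPU PE utilization) over the whole timeline."""
    total_time = sum(p.duration for p in phases)
    if total_time <= 0 or total_pes <= 0:
        raise ValueError("timeline and PE count must be positive")
    cpu_time = sum(p.duration for p in phases if p.cpu_busy)
    pe_time = sum(p.duration * p.active_pes for p in phases)
    return cpu_time / total_time, pe_time / (total_time * total_pes)


# Hypothetical sequential timeline: CPU runs, invokes the GPU and waits,
# then resumes. Only 16 of 64 PEs are active during the GPU phase.
phases = [
    Phase(duration=2.0, cpu_busy=True, active_pes=0),
    Phase(duration=4.0, cpu_busy=False, active_pes=16),
    Phase(duration=2.0, cpu_busy=True, active_pes=0),
]
cpu_util, gpu_util = utilization(phases, total_pes=64)
# cpu_util = 4 / 8 = 0.5, gpu_util = (4 * 16) / (8 * 64) = 0.125
```

Under these assumed numbers, the CPU is busy for half of the timeline and only one eighth of the available PE time is used, which illustrates how sequential utilization may leave both kinds of resources idle.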