1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to performing tasks, and more particularly, to performing related tasks in an asymmetric multi-core processor.
2. Description of the Related Art
Demand for multimedia has increased drastically in recent years. Accordingly, it has become increasingly important to process a multimedia codec, in addition to a video codec, in real time. However, since processing a multimedia codec requires many operations, the performance of the processor has to be improved.
In the past, the performance of a processor was improved mainly by increasing the clock speed of a central processing unit (CPU). However, the extent to which the clock speed of the CPU can be increased is limited. Accordingly, a next-generation processor model is required. One such next-generation processor has a multi-core structure in which a plurality of processor cores are embedded in a single chip so as to improve the performance of the processor.
Specifically, an asymmetric multi-core structure refers to a processor constructed with general purpose processor (GPP) cores for performing general processing operations and digital signal processor (DSP) cores for performing computational operations. Since the DSP cores have a simpler structure than the GPP cores in terms of performance, degree of integration, and power, a single chip can include a plurality of DSP cores.
Accordingly, a structure that is constructed with a GPP core and a plurality of DSP cores is generally preferred. Presently, each DSP core includes a separate local memory that needs to communicate with a main memory in order to share data, and the communication between the local memory and the main memory is performed by direct memory access (DMA).
In general, the tasks that are performed by a DSP may be complicated operations rather than control operations. For example, the tasks may be codecs for multimedia processing or three-dimensional stereoscopic image processing. In this case, the tasks may be performed not in a single DSP core but in a plurality of DSP cores, due to memory requirements or operation requirements.
For example, in the case of an H.264 decoder, two DSPs are used to process a frame. Upper and lower parts of a screen are decoded by different DSPs. In this case, the tasks that are to be performed by the DSPs may have to closely communicate with one another. Accordingly, it is efficient for the tasks to be performed concurrently.
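The division of a frame between two DSPs described above can be illustrated by the following minimal sketch. The function name `split_macroblock_rows` and the 720-line frame height are hypothetical and are chosen only for illustration; the sketch assumes that the frame is divided along rows of 16x16 macroblocks, which the description above does not specify.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def split_macroblock_rows(frame_height, num_dsps=2):
    """Divide the macroblock rows of a frame into contiguous ranges, one per DSP.

    Hypothetical helper: the partitioning rule is an assumption for illustration.
    """
    mb_rows = (frame_height + MB_SIZE - 1) // MB_SIZE  # round up to whole rows
    base, extra = divmod(mb_rows, num_dsps)
    ranges = []
    start = 0
    for i in range(num_dsps):
        # The first `extra` DSPs each take one additional row.
        count = base + (1 if i < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


upper, lower = split_macroblock_rows(720)
print(upper, lower)  # range(0, 23) range(23, 45)
```

In this sketch, the last row of the upper range and the first row of the lower range are adjacent, so the two tasks may need data from each other at that boundary, which is one reason the tasks may have to communicate closely.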
FIG. 1 illustrates an example in which related tasks arbitrarily start to be performed in a distributed system having a plurality of processing nodes according to a related art technique.
In FIG. 1, the number of related tasks is eight, and eight DSPs 111 to 118 perform the related tasks. The eight DSPs 111 to 118 begin the related tasks at arbitrary times, process the same amount of work, and proceed to the next operations of the tasks after synchronization.
In this case, an interval 140 is a period during which the DSPs 111 to 118 stop their operations, indicating that overhead has occurred. For example, since the DSP 111, shown at the top of FIG. 1, begins its task earlier than the other DSPs 112 to 118, the task of the DSP 111 terminates earlier than those of the other DSPs 112 to 118. So that the task of the DSP 111 can begin again at a first time point 132 in synchronization with the tasks of the other DSPs 112 to 118, the task of the DSP 111 has to be delayed until the DSPs 112 to 118 finish their tasks. Accordingly, a section 141 occurs during which an operation is stopped so that the related tasks are all synchronized with each other.
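The overhead described above can be illustrated by the following minimal sketch, which computes how long each node waits at a synchronization point when the nodes begin equal amounts of work at different times. The function name `barrier_idle_times`, the start times, and the work duration are hypothetical values chosen only for illustration and are not taken from FIG. 1.

```python
def barrier_idle_times(start_times, work_duration):
    """Return (sync_point, idle_per_node) for nodes that each do equal work.

    Hypothetical model: every node resumes only when the last node finishes.
    """
    finish_times = [start + work_duration for start in start_times]
    sync_point = max(finish_times)
    idle = [sync_point - finish for finish in finish_times]
    return sync_point, idle


# Eight nodes with assumed, arbitrary start times (in arbitrary time units).
starts = [0, 3, 1, 4, 2, 5, 2, 3]
sync_point, idle = barrier_idle_times(starts, work_duration=10)
print(sync_point)  # 15
print(idle)        # [5, 2, 4, 1, 3, 0, 3, 2]
print(sum(idle))   # 20
```

In this sketch, the node that begins first waits the longest, and the total idle time becomes zero only when all of the nodes begin at the same time.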
Accordingly, in the case of a symmetric multi-processor system such as a general cluster or distributed system, when related tasks are not synchronized with one another, an overhead occurs. However, in such a symmetric distributed system, a plurality of processors are not on the same chip. Since the processors communicate with one another through a shared memory, there is no problem except for the occurrence of some overhead.
FIG. 2 illustrates an example of performing related tasks in an asymmetric multi-core processor according to a related art technique.
The asymmetric multi-core processor according to the related art technique includes a power processor element (PPE) 210, an operating system (OS) 220, and a synergistic processor element (SPE) 240.
The PPE 210 is a main processor corresponding to a GPP of the present invention, and the PPE 210 serves to drive the OS 220 and to control a system.
The OS 220 serves to communicate with an application program 230 under the control of the PPE 210 and to schedule the SPE 240, which is described below.
The OS 220 drives various device drivers or software 222 and exchanges information with the application program 230 through modules 232 and 234. In addition, the OS 220 may directly exchange information with the application program 230 through an API.
The OS 220 controls the SPE 240 through a scheduler 224.
The SPE 240 corresponds to a DSP of the present invention and mainly serves to perform operations, and the SPE 240 receives information from the OS 220 through SPE modules 261 to 264. Also, the SPE 240 performs tasks by using SPE threads 251 to 254, which are received under the control of the OS 220, and the SPE threads 251 to 254 correspond to a DSP context of the present invention.
Referring to FIG. 2, the PPE 210 ensures that there are enough SPEs from among the SPE1 241 to SPE7 247 to perform the tasks and allows the SPE1 241 to SPE7 247 to perform the tasks through the scheduler 224 of the OS 220. In this case, since the SPE1 241 to SPE7 247 are on a single chip, the tasks are concurrently performed within a predetermined range. However, although the SPE1 241 to SPE7 247 exist on the same chip, since the scheduler 224 performs tasks by apportioning the SPE threads 251 to 254 respectively to the SPE1 241 to SPE7 247, the particular SPE 240, among the SPE1 241 to SPE7 247, which first receives one of the SPE threads 251 to 254 is the first to begin performing its task.
Accordingly, an order is established as to which SPE 240, among the SPEs 241 to 247, first begins to perform the task, and the tasks therefore do not begin at the same time.
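The staggered starting times that result from apportioning threads one after another can be illustrated by the following minimal sketch. The function name `dispatch_sequentially`, the thread names, and the load time are hypothetical; the sketch assumes that each context takes a fixed time to load and that a task begins as soon as its context is loaded, and it does not represent the actual behavior of the scheduler 224.

```python
def dispatch_sequentially(thread_names, load_time_per_thread):
    """Return {thread: start_time} when contexts are loaded one after another.

    Hypothetical model: a fixed load time per context is an assumption.
    """
    start_times = {}
    clock = 0
    for name in thread_names:
        clock += load_time_per_thread  # loading this context takes time
        start_times[name] = clock      # the task begins once it is loaded
    return start_times


starts = dispatch_sequentially(["A", "B", "C", "D"], load_time_per_thread=3)
print(starts)  # {'A': 3, 'B': 6, 'C': 9, 'D': 12}
print(max(starts.values()) - min(starts.values()))  # 9
```

In this sketch, the difference between the earliest and the latest starting times grows with the number of threads that are apportioned in sequence.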
For example, assume that tasks A and B are related tasks that have to be performed concurrently, that data needed for the task A is first loaded in the SPE1 241, and that the task B is subsequently loaded in the SPE2 242. If the SPE1 241 transmits data or signals to the SPE2 242 before the task B has been loaded in the SPE2 242, the SPE2 242 does not have the corresponding data and therefore has to access the main memory.
In this case, since the data or signals have to be transmitted from the chip to the main memory, efficiency is reduced. In addition, when the data needed for the task B is loaded in the SPE2 242 late and the task B is executed late, the information that allows the SPE1 241 to communicate with the SPE2 242 has to be updated at the time when the task B is performed, and there is a problem in that the task A is stopped for a long time.
In the asymmetric multi-core processor, the SPEs 241 to 247 are on the same chip and can communicate with one another by directly accessing a local memory without passing through a shared memory such as a main memory, so that the communication speed is high. However, as described above, when the starting times of the related tasks do not coincide, overhead occurs and the SPEs 241 to 247 have to communicate with one another through the shared memory, so that the performance of the processor deteriorates.
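The effect of mismatched starting times on communication cost can be illustrated by the following minimal sketch. The cost values `LOCAL_COST` and `MAIN_COST`, the function names, and the send times are hypothetical and are chosen only for illustration; the sketch assumes that a message reaches the receiver through the local memory only if the receiving task is already loaded, and otherwise passes through the main memory.

```python
LOCAL_COST = 1   # assumed cost of a direct local-memory transfer
MAIN_COST = 20   # assumed cost of a transfer through the main memory


def transfer_cost(send_time, receiver_load_time):
    """Cost of one message, depending on whether the receiving task is loaded."""
    if send_time >= receiver_load_time:
        return LOCAL_COST
    return MAIN_COST


def total_cost(send_times, receiver_load_time):
    """Total cost of all messages sent by the task that begins first."""
    return sum(transfer_cost(t, receiver_load_time) for t in send_times)


sends = [0, 2, 4, 6, 8]  # times at which task A sends to task B
print(total_cost(sends, receiver_load_time=0))  # 5: both tasks begin together
print(total_cost(sends, receiver_load_time=5))  # 62: task B is loaded late
```

In this sketch, every message that is sent before the receiving task is loaded incurs the higher cost, so the total cost is lowest when the related tasks begin at the same time.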