1. Field of the Invention
The present invention relates to a multicore interface with dynamic task management capability and a task loading and offloading method thereof.
2. Description of Related Art
With the rapid development of communication and multimedia applications, the tasks supported by electronic products on the market have become increasingly diverse, and the computational complexity required to process these tasks has increased greatly. Taking the cell phone, the most popular electronic product, as an example, besides its basic communication function, it further integrates the functions of a digital camera, a multimedia player, and even a global positioning system (GPS).
In order to meet such high computational requirements while maintaining a certain upgrade flexibility, dual-core or multicore heterogeneous processors have been widely accepted as an effective solution. In a common dual-core processor, a control-oriented micro processor unit (MPU) processes tasks such as the user interface and interrupt handling, while a digital signal processor (DSP) handles real-time tasks that have regular computation characteristics, such as fast Fourier transform (FFT) and matrix multiplication, with low power consumption and high efficiency.
Such a heterogeneous multicore platform combines the advantages of different processors such as the MPU and the DSP; it thus achieves much higher computation efficiency than a single processor and offers high design flexibility for differentiating products through software. However, due to the lack of relevant development tools and of a corresponding software abstraction concept, each processor was developed independently in the early stage of application system development for heterogeneous multicore platforms. For example, designers may first design a DSP application module (e.g., a DSP-based audio-visual codec), design and verify the corresponding software, and treat the module as a closed subsystem. The MPU then communicates with the DSP module in the same manner as it accesses peripheral devices (e.g., a hardware codec or accelerator), and there is no direct interaction between the processors.
Furthermore, as applications become increasingly multi-tasked and multithreaded, it becomes increasingly likely that a plurality of different tasks or threads will share the DSP computation resources. In addition, in order to enhance computation efficiency, reduce the storage resources (e.g., scratchpad SRAM or cache) required by the DSP computation, or reduce the priority inversion time of a non-preemptive system, the DSP system tends to further slice its computation operations into smaller tasks.
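The effect of task slicing on the priority inversion time of a non-preemptive system can be illustrated with a minimal sketch. The function names (`slice_task`, `blocking_time`) and the cycle counts are hypothetical and chosen only for illustration; the sketch assumes that the scheduler regains control only at slice boundaries.

```python
def slice_task(total_cycles, slice_cycles):
    """Split one long job into run-to-completion slices of at most slice_cycles."""
    full, rest = divmod(total_cycles, slice_cycles)
    return [slice_cycles] * full + ([rest] if rest else [])


def blocking_time(slices, arrival):
    """Cycles a high-priority task arriving at `arrival` must wait before the
    running low-priority work reaches its next slice boundary.
    Assumption: no preemption inside a slice."""
    t = 0
    for length in slices:
        end = t + length
        if arrival < end:
            return end - arrival
        t = end
    return 0  # the low-priority work had already finished


# Hypothetical 1000-cycle job; a high-priority task arrives at cycle 150.
print(blocking_time([1000], 150))                 # 850 (unsliced)
print(blocking_time(slice_task(1000, 100), 150))  # 50  (sliced)
```

In this simplified model the worst-case blocking is bounded by the slice length rather than by the length of the whole job, which is the motivation for slicing.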
The above factors push DSP program development toward further abstraction, adding the software abstraction layers of traditional MPU subsystems, such as dynamic task loading and offloading, memory management, multi-task processing and dynamic task scheduling, and interrupt handling. However, it is not easy to further abstract DSP program development. For example, the DSP is not suitable for processing control-oriented tasks, since its context switch cost is high. Therefore, it is desirable to develop a dedicated communication interface between the MPU and the DSP that still provides an identical interface to the MPU, instead of relying merely on a software abstraction hierarchy on the DSP.
Currently, most of the relevant commercially available products employ mailbox-abstracted, interrupt-driven inter-processor communication (IPC) and a μ-kernel-abstracted DSP software hierarchy. DaVinci and the open multimedia applications platform (OMAP) from Texas Instruments both follow application program interface (API) specifications in which a DSP Gateway or DSP/BIOS is used to connect an entirely wrapped IPC mechanism, DSP/BIOS, a DSP μ-kernel, and the eXpressDSP Algorithm Interoperability Standard (xDAIS).
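As a rough illustration of a mailbox-abstracted, interrupt-driven communication scheme, the following sketch models a mailbox in which every posted message raises one interrupt on the receiving processor. The class `Mailbox`, its methods, and the command strings are hypothetical and do not correspond to the API of any product named above.

```python
from collections import deque


class Mailbox:
    """Hypothetical fixed-depth mailbox; each post interrupts the receiver."""

    def __init__(self, depth, on_message):
        self._fifo = deque()
        self._depth = depth
        self._on_message = on_message  # stands in for the receiver's ISR
        self.interrupts_raised = 0

    def post(self, message):
        """Sender side (e.g., MPU). Returns False if the mailbox is full."""
        if len(self._fifo) >= self._depth:
            return False
        self._fifo.append(message)
        self.interrupts_raised += 1
        self._on_message(self)  # simulate the interrupt to the receiver
        return True

    def fetch(self):
        """Receiver side (e.g., DSP ISR). Returns None if the mailbox is empty."""
        return self._fifo.popleft() if self._fifo else None


received = []
mbox = Mailbox(depth=4, on_message=lambda m: received.append(m.fetch()))
for command in ("load_task", "start_task", "stop_task"):
    mbox.post(command)
print(received, mbox.interrupts_raised)
```

In this model the receiver is interrupted once per message, so the receiver's interrupt and context switch costs scale with the message rate.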
The above software architecture is substantially represented by the open source software architecture currently under development. FIG. 1 shows a conventional open source software architecture. Referring to FIG. 1, in this open source software architecture, a software abstraction level of an MPU 110 is moved to a DSP 120, and interrupt-driven inter-processor communication is employed, which, however, seriously degrades the efficiency of the DSP subsystem. Taking the Framework from Texas Instruments as an example, the codec efficiency data indicated in the application notes (covering the common H.264, MPEG-2, AAC, MP3, G.71x, and so on) show a significant performance degradation (over 50%) relative to the hand-optimized versions thereof, for the reasons listed below.
1. The DSP architecture design has been optimized for predictable, highly repetitive computations, but it incurs a high cost for program control and interrupt handling.
2. The DSP has a large number of built-in registers for processing large data streams, and, in order to achieve execution predictability, its built-in data memory may include no cache abstraction level; as a result, such an architecture incurs a higher cost for a context switch.
3. The DSP generally includes function modules for special usage, such as a bit-manipulation unit and a Galois-field arithmetic unit; it is therefore a waste of resources to execute the simple logic computations of the μ-kernel on such a high-cost processor.
In view of the above problems, some preliminary solutions have been proposed, such as the Blackfin DSP architecture with an enhanced program control and interrupt handling mechanism, developed jointly by Analog Devices Corporation and Intel Corporation, which is even alleged to be able to replace the MPU as the sole system processor core in a low-cost system. However, this architecture not only requires a hardware investment comparable to the hardware resources of an MPU in order to strengthen program control and interrupt handling, but also requires a comparable software investment, such as porting the system software, drivers, legacy code, and other applications of the original ARM, MIPS, or x86 MPU.
Alternatively, one approach is to analyze applications with compiler techniques, so that the processing unit is allowed to be preempted only where the context is relatively small. Another approach is to employ many sets of descriptors, so as to reduce the overhead of the DSP on a context switch. However, both approaches require a great deal of static analysis, and they also increase the complexity of program control.
A DSP from Philips Corporation provides two instruction sets: a normal instruction set and a compact instruction set. The compact instruction set allows access to only a part of the resources in the DSP, such as a few registers. After an interrupt occurs, if the interrupt service routine (ISR) uses only instructions from the compact instruction set, the cost of making a context switch is greatly reduced. However, because the instructions of the compact instruction set are relatively short, only a part of the resources of the DSP can be accessed, and accordingly, the execution efficiency is also reduced.
AMD Corporation proposes reserving a set of registers for use in program sections that will not be interrupted, e.g., an interrupt service routine (ISR). If other registers are to be used in the ISR, their values may first be stored in the reserved registers and then restored to the original registers after the ISR is processed, such that the time required for a context switch is reduced. However, the disadvantage of this approach is the cost of the additional set of registers.
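The reserved-register approach can be sketched as follows. This is only one possible reading of the scheme described above, not the actual design; the names `RegisterFile` and `run_isr`, the register counts, and the `copies` counter are assumptions made for illustration.

```python
class RegisterFile:
    """Hypothetical register file with a small reserved (shadow) set."""

    def __init__(self, n_general, n_reserved):
        self.general = [0] * n_general
        self.reserved = [0] * n_reserved
        self.copies = 0  # register moves spent on save and restore


def run_isr(regs, clobbered, isr):
    """Save only the registers the ISR will overwrite into the reserved set,
    run the ISR, then restore them. Assumes the ISR itself is not interrupted."""
    if len(clobbered) > len(regs.reserved):
        raise ValueError("ISR needs more registers than are reserved")
    for slot, r in enumerate(clobbered):
        regs.reserved[slot] = regs.general[r]
        regs.copies += 1
    isr(regs.general)
    for slot, r in enumerate(clobbered):
        regs.general[r] = regs.reserved[slot]
        regs.copies += 1


def example_isr(general):
    # Hypothetical ISR body that overwrites registers 0 and 1.
    general[0] = 99
    general[1] = 98


regs = RegisterFile(n_general=32, n_reserved=4)
regs.general = list(range(32))
before = list(regs.general)
run_isr(regs, clobbered=[0, 1], isr=example_isr)
print(regs.general == before, regs.copies)  # True 4
```

In this model, saving and restoring two registers takes 4 moves, whereas saving and restoring the whole 32-register context would take 64; the saving comes at the price of the extra reserved registers.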