Multiple Instruction Multiple Data (MIMD) processors are becoming increasingly viable for realizing enhanced performance in embedded applications, particularly for compute-intensive applications that do not lend themselves to other types of processors. The number of cores in a MIMD processor has increased over time, with 64-core Tile processors being one example. This raises various issues associated with scalability as the number of processor cores increases.
MIMD designs form the basis of most multicore processors that are not Graphics Processing Units (GPUs). A typical MIMD processor is made up of multiple Central Processing Unit (CPU) cores that are arranged on-chip and interconnected using a high-speed switched network. The MIMD design may or may not be fully cache coherent. The interconnect provides both core-to-core communications and data transfers between memory controllers and I/O systems. This hardware design is aimed at sustaining performance scaling as the number of cores increases to tens, hundreds and even thousands of cores.
This type of MIMD architecture requires careful consideration of software design and implementation with respect to parallel execution. The conventional approach to building concurrent software is through the use of multithreaded programming APIs such as those found in Portable Operating System Interface (POSIX)-based operating systems. Nevertheless, multithreaded programming is relatively heavyweight and does not scale well for large numbers of tasks, as complexity becomes difficult to manage. One of the key reasons for this is that multithreaded programming APIs provide a very low level of abstraction and put the onus on the developer to ensure that they are used correctly. This is a task that becomes inherently difficult in the many-core arena.
An alternative approach to concurrent software development is to use language-level solutions that explicitly provide support for concurrency, reducing program complexity through abstraction. Many of these originated from the High Performance Computing (HPC) arena, where the need to build large-scale concurrent applications has existed for many years. However, these language-level solutions are not suited to embedded systems development because they focus on distributed clusters and use heavyweight runtimes that are only acceptable for long-running jobs.
Closer to the embedded systems community, there have been other efforts to provide language-level support for concurrent programming. Examples include Google's Go™, Microsoft's F#™, Apple's Grand Central Dispatch (GCD)™, and Intel's Cilk™ and Threading Building Blocks (TBB)™.
One thing that is common to many of these forms of language-level support for concurrent programming is that they rely on application-managed concurrency in the form of lightweight tasks. Lightweight tasks can be created and scheduled by the application in user-space and do not carry the overhead of broader OS-level attributes such as security permissions and memory allocations. They are the ideal vehicle for realizing fine-grained application concurrency with a low system overhead.
There are a variety of parallel programming solutions for lightweight task management in MIMD multi-core processors. Exemplary solutions include OpenMP™, Cilk™, Intel TBB™ and Apple GCD™.
Language extensions for lightweight task management allow programmers to create thousands or millions of units of concurrent execution that can be simultaneously performed by different threads running on multiple processing cores. Consequently, task management is increasingly becoming a problem in MIMD multi-core processor systems. Therefore, what is desired is improved task management methods and systems.