This application addresses the problem of devoting data processing capability to a variety of user applications while making efficient use of hardware resources and electric power. An initial response to a need for greater data processing capability is to operate the central processing unit at higher speeds. Increasing the rate of operation of a central processing unit enables more data processing operations per unit time. This is not a complete solution because memory speed often cannot keep pace with processor speed. The mismatch between processor speed and memory speed can be reduced using cache memory, but cache memory introduces other problems. High processor speeds also often require deep pipelining, and deep pipelining increases the time required to process conditional branches. Thus increased processor speed achieves only limited improvement. Another potential response is multi-processing, in which the central processing unit and at least some auxiliary circuits are duplicated. The additional data processor cores enable more data processing operations per unit time.
Moving from a uni-processor system to a multi-processor system involves numerous problems. In theory, providing additional data processor cores permits additional data processing operations. However, programming a multi-processor system to exploit the additional data processor cores advantageously is difficult. One technique that attempts to solve this problem is called symmetrical multi-processing (SMP). In symmetrical multi-processing, each of the plural data processor cores is identical and runs the same operating system and application programs. It is up to the operating system programmer to divide the data processing operations among the plural data processor cores for advantageous operation. This is not the only difficulty with SMP. Data processor cores in SMP may operate on data at the same memory addresses, such as operating system file structures and application program data structures. Any write to memory by one data processor core may alter the data used by another data processor core. The typical response to this problem is to allow only one data processor core to access a portion of memory at a time, using a technique such as a spin lock, in which a data processor core not currently granted access repeatedly polls until access is granted. This is liable to cause the second data processor core to stall while waiting for the first data processor core to complete its memory access. The problems with sharing memory are compounded when the identical data processor cores include caches. With caches, each data processor core must snoop every memory write by any other data processor core to ensure cache coherence. This snooping requires considerable hardware and takes time. Each added data processor core requires such additional resources that eventually adding cores achieves no further data processing capability.
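The spin-lock arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular hardware: two Python threads stand in for two data processor cores, a `threading.Lock` used only through non-blocking `acquire(blocking=False)` stands in for an atomic test-and-set flag, and the names `PollingSpinLock` and `run_two_cores` are hypothetical. The count of failed polls returned by `acquire` is the work a waiting core wastes while stalled.

```python
import threading
import time


class PollingSpinLock:
    """Hypothetical spin lock: one core holds it, waiters poll repeatedly."""

    def __init__(self) -> None:
        # Stand-in for an atomic test-and-set flag in shared memory (assumption).
        self._flag = threading.Lock()

    def acquire(self) -> int:
        """Poll until access is granted; return the number of failed polls."""
        failed_polls = 0
        # Each failed non-blocking attempt is a cycle the waiting core spends
        # doing no useful data processing.
        while not self._flag.acquire(blocking=False):
            failed_polls += 1
            time.sleep(0)  # yield so the holding thread can finish its access
        return failed_polls

    def release(self) -> None:
        self._flag.release()


def run_two_cores(iterations: int) -> tuple[int, int]:
    """Two hypothetical cores increment one shared memory word under the lock.

    Returns (final shared value, total failed polls across both cores).
    """
    shared = {"value": 0}            # stands in for a shared data structure
    lock = PollingSpinLock()
    polls = [0, 0]                   # failed polls recorded per core

    def core(core_id: int) -> None:
        for _ in range(iterations):
            polls[core_id] += lock.acquire()
            shared["value"] += 1     # critical section: write to shared memory
            lock.release()

    threads = [threading.Thread(target=core, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"], sum(polls)


if __name__ == "__main__":
    value, wasted = run_two_cores(10_000)
    print(value, wasted)
```

The lock keeps the shared value correct (each core's writes are serialized), but the number of failed polls, which varies from run to run, shows the time a second core spends stalled rather than processing data.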
Another multi-processing model is called the factory model. Factory model multi-processing requires the software developer to manually divide the data processing operation into plural sequential tasks. Data processing then flows from data processor core to data processor core in the task sequence. This division of the task is static and is not altered during operation of the multi-processor system. The name comes from the analogy to a factory assembly line. The factory model tends to avoid the data collisions of the SMP model because the data processor cores work on different aspects of the data processing operation. This model tends to work best for data flow operations such as audio or video data streaming. The factory model is therefore often used in digital signal processing (DSP), which typically involves many such data flow operations. There are problems with the factory model as well. Dividing the data processing operation into sequential tasks is generally not simple. Even loading of the data processor cores is required to best utilize the factory model. Any uneven loading leaves one or more data processor cores unproductive while waiting for data from a prior data processor core or waiting for the next data processor core to accept its data output. The nature of the data processing operation may preclude even loading of the plural data processor cores. Processes programmed using the factory model also do not scale well. Even small changes in the underlying data processing operation to be performed by the system may require complete re-engineering of the task division.
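The cost of uneven loading can be made concrete with a small calculation. The sketch below assumes an idealized factory-model pipeline with enough buffering between cores that, in steady state, every core is paced by the slowest stage, and it ignores data transfer overhead; `pipeline_report` and the stage times are hypothetical illustrations, not figures from any real system.

```python
def pipeline_report(stage_times: list[float]) -> tuple[float, list[float]]:
    """Steady-state behavior of a hypothetical factory-model pipeline.

    stage_times[i] is the time core i needs per data item (arbitrary units).
    Under the idealized assumptions above, one item leaves the pipeline per
    bottleneck interval, so every other core idles for part of that interval.
    Returns (throughput in items per time unit, per-core busy fraction).
    """
    if not stage_times or min(stage_times) <= 0:
        raise ValueError("need at least one stage with positive time")
    bottleneck = max(stage_times)                          # slowest core sets the pace
    throughput = 1.0 / bottleneck
    utilization = [t / bottleneck for t in stage_times]   # fraction of time busy
    return throughput, utilization


if __name__ == "__main__":
    # Same total work (6 units per item) divided two ways across three cores.
    print(pipeline_report([1.0, 4.0, 1.0]))  # uneven: (0.25, [0.25, 1.0, 0.25])
    print(pipeline_report([2.0, 2.0, 2.0]))  # even:   (0.5, [1.0, 1.0, 1.0])
```

With the same total work per item, the uneven division halves throughput and leaves two of the three cores idle three quarters of the time, which is why the task division must be re-balanced whenever the underlying operation changes.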