Parallel processing, which generally comprises employing a plurality of microprocessors coupled to the same computer system to concurrently process a batch of data, is of great importance in the computer industry. Generally, there are three major types of parallel processing systems: those employing shared memory, those employing distributed memory, and those employing a combination of the two. Typically, shared memory is memory that can be accessed in a single operation, such as a “load” or “read” command, by a plurality of processors. Distributed memory is memory that is localized to an individual processor. In other words, in a distributed system, each processor can access its own associated memory in a single access operation, but typically cannot access memory associated with the other processors in a single operation. Finally, there is hybrid, or “heterogeneous,” parallel processing, in which there is some system memory accessible by one or more processors, and some memory which is distributed and local to at least one processor.
One example of a hybrid parallel processor system comprises at least one reduced instruction set (RISC) main processor unit (MPU), such as a PowerPC™ processor, and at least one specialized or “attached” processor unit (APU), such as a Synergistic™ APU (SPU). Typically, the MPU is employed to execute general purpose code, wherein the general purpose code comprises complex control flows, and to orchestrate the overall hybrid parallel processing function. The MPU has access to the full range of system memory. The APU is generally directed to executing dataflow operations. In other words, the APU calculates highly repetitive multimedia, graphics, signal or network processing workloads, which are characterized by high ratios of computation to control decisions. In conventional hybrid systems, APUs do not have direct access to the system memory, and their own memory, the local store, is typically smaller than the shared memory.
Generally, while employment of the hybrid system provides high computational performance, it poses significant challenges to the programming model. One such problem relates to the APU. The APU cannot directly address system memory. Therefore, any code to be run on the APU has to be transferred to an associated local storage of the APU before this code can be executed on the APU. Furthermore, the APU and the MPU can have different instruction sets.
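By way of illustration only, the addressing constraint described above can be sketched as a toy model in Python. The sketch assumes a byte-addressed system memory and a smaller byte-addressed local store; the class names SystemMemory, APU and MPU, the method transfer_to_apu, and all sizes are hypothetical and are not taken from any actual hybrid system or programming interface.

```python
"""Toy model of a hybrid system: an MPU with access to system memory and an
APU that can address only its own local store. All names are hypothetical."""


class SystemMemory:
    """Memory reachable by the MPU in a single operation."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)


class APU:
    """Attached processor whose loads can reach only its local store."""

    def __init__(self, local_store_size: int) -> None:
        self.local_store = bytearray(local_store_size)

    def load(self, address: int) -> int:
        # The APU has no path to system memory: every address it issues
        # is interpreted as an offset into its own local store.
        if not 0 <= address < len(self.local_store):
            raise IndexError("address is outside the APU local store")
        return self.local_store[address]


class MPU:
    """Main processor that stages code or data for an APU."""

    def __init__(self, system_memory: SystemMemory) -> None:
        self.system_memory = system_memory

    def transfer_to_apu(self, apu: APU, src: int, length: int, dst: int) -> None:
        # Copy `length` bytes from system memory into the APU local store.
        # Bounds are checked first so the slice assignment cannot resize
        # the local store.
        mem = self.system_memory.cells
        if length < 0 or src < 0 or src + length > len(mem):
            raise ValueError("source range is outside system memory")
        if dst < 0 or dst + length > len(apu.local_store):
            raise ValueError("destination range does not fit in the local store")
        apu.local_store[dst:dst + length] = mem[src:src + length]


if __name__ == "__main__":
    memory = SystemMemory(64)
    memory.cells[8:12] = b"\x01\x02\x03\x04"  # stand-in for code or data
    mpu = MPU(memory)
    apu = APU(16)                             # local store smaller than system memory
    mpu.transfer_to_apu(apu, src=8, length=4, dst=0)
    print([apu.load(i) for i in range(4)])
```

In this sketch the explicit transfer step stands in for whatever mechanism a given system uses to move code or data into the local store; the point is only that the APU can operate on the bytes after they have been copied, not before.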
However, in the program design process, conventional compilers generally do not assign information sequences, such as specific code sequences or data, to be run on an MPU or an APU in a hybrid system. Instead, programmers determine how to allot code functionality to the APU or the MPU. This manual allotment of code to a processor typically entails inefficiencies in the programming process. Furthermore, there is no standard programming “tool box” for passing information, be it text (that is, code) or data, between the attached processor and the main processor. Therefore, programmers typically have no standard format for passing these information sequences, which creates further inefficiencies in the programming process.
Therefore, what is required is a programming environment that allows for systematized programming of an MPU and an APU, and the transference of code and data between the MPU and the APU, that overcomes the deficiencies of conventional systems.