Parallel processing, which generally comprises employing a plurality of microprocessors coupled to the same system to concurrently process a batch of data, is of great importance in the computer industry. There are generally three major types of parallel processing systems: those employing shared memory, those employing distributed memory, and those employing a combination of the two. Shared memory is typically memory that a plurality of processors can access in a single operation, such as a “load” or “read” command. Distributed memory is memory that is localized to an individual processor. In other words, each processor can access its own associated memory in a single access operation, but typically cannot access the memories associated with other processors in a single operation. Finally, in hybrid, or “heterogeneous,” parallel processing, some memory is shared and some memory is distributed.
One example of a hybrid parallel processing system comprises a reduced instruction set computer (RISC) main processor unit (MPU), such as a PowerPC™ processor, and a specialized, or “attached,” processor unit (APU), such as a Synergistic™ APU (SPU). Typically, the MPU executes general-purpose code, which involves complex control flow and orchestrates the overall hybrid parallel processing function. The MPU has access to the full range of system memory. In one embodiment only one MPU is used; in other embodiments, more than one MPU is used. The APU is generally directed to executing dataflow operations. In other words, the APU processes highly repetitive multimedia, graphics, signal, or network processing workloads, which are characterized by high compute-to-control-decision ratios. In conventional hybrid systems, APUs do not have direct access to the system memory, and their own memory, the local store, is typically smaller than the shared memory.
Although the hybrid system generally provides high computational performance, it poses significant challenges to the programming model. One such challenge relates to the APU, which cannot directly address system memory. Therefore, any code to be run on the APU must first be transferred to the APU's associated local store before it can be executed there. Furthermore, the APU and the MPU can have different instruction sets.
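The consequence of the APU being unable to address system memory can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of real hardware: system memory and the local store are modeled as byte arrays, and the explicit transfers are modeled by two hypothetical methods, `dma_get` and `dma_put`. None of the names below (`SystemMemory`, `AttachedProcessor`, `dma_get`, `dma_put`, `double_in_place`, `LOCAL_STORE_SIZE`) come from this document or from a real SDK.

```python
"""Minimal model of an attached processor (APU) that can only address its local store.

All names here are hypothetical illustrations, not a real hardware or SDK API.
"""

LOCAL_STORE_SIZE = 256  # hypothetical, deliberately tiny local-store size in bytes


class SystemMemory:
    """Shared system memory: the MPU reads and writes it directly."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class AttachedProcessor:
    """An APU whose ordinary loads and stores reach only its private local store."""

    def __init__(self, system_memory: SystemMemory, ls_size: int = LOCAL_STORE_SIZE) -> None:
        self._sysmem = system_memory
        self.local_store = bytearray(ls_size)

    def _check(self, ls_addr: int, sys_addr: int, size: int) -> None:
        # Reject transfers that run past either memory; bytearray slice
        # assignment would otherwise silently resize the destination.
        if ls_addr < 0 or sys_addr < 0 or size < 0:
            raise ValueError("addresses and size must be non-negative")
        if ls_addr + size > len(self.local_store):
            raise MemoryError("transfer does not fit in the local store")
        if sys_addr + size > len(self._sysmem.data):
            raise MemoryError("transfer runs past the end of system memory")

    def dma_get(self, ls_addr: int, sys_addr: int, size: int) -> None:
        """Explicitly copy code or data from system memory into the local store."""
        self._check(ls_addr, sys_addr, size)
        self.local_store[ls_addr:ls_addr + size] = self._sysmem.data[sys_addr:sys_addr + size]

    def dma_put(self, ls_addr: int, sys_addr: int, size: int) -> None:
        """Explicitly copy results from the local store back to system memory."""
        self._check(ls_addr, sys_addr, size)
        self._sysmem.data[sys_addr:sys_addr + size] = self.local_store[ls_addr:ls_addr + size]

    def load(self, ls_addr: int) -> int:
        """An ordinary load: it can only ever see the local store."""
        return self.local_store[ls_addr]

    def double_in_place(self, ls_addr: int, size: int) -> None:
        """A toy dataflow kernel that operates purely on local-store contents."""
        for i in range(ls_addr, ls_addr + size):
            self.local_store[i] = (self.local_store[i] * 2) & 0xFF
```

For example, after the MPU writes `bytes([1, 2, 3, 4])` at address 100 of system memory, `apu.load(0)` still returns 0 until `apu.dma_get(0, 100, 4)` has copied the bytes in, and the kernel's results reach system memory only after `apu.dma_put(0, 100, 4)`. The same explicit copy step applies to an APU code image as well as to its data.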
In processing systems, such as hybrid processing systems, there is a need to transfer data between different components of a program (for example, subroutines or functions). If these subroutines are designed to execute on a processor with direct access to the system memory, or to execute entirely on a single processor within the heterogeneous computer system, conventional approaches in which a binder or linker resolves the address of global data can be used. As is understood by those of skill in the art, global data is generally defined as data that is referenced by a plurality of subroutines.
However, where global data must be communicated between subroutines executing on separate APUs, each with its own local store, or between a combination of one or more APUs and one or more MPUs, conventional linkage mechanisms cannot support references to global variables. In a conventional heterogeneous multiprocessor system, global data might reside in several locations, and these locations are not uniformly accessible from subroutines executing on the different processors within the system. Nevertheless, an integrated executable program will typically need to access such global variables from more than one of the processors within the system.
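The non-uniform accessibility of global data can be made concrete with a second small Python model. It assumes, for illustration only, that MPU code reads and writes a global variable's home copy in system memory, while a subroutine on each APU sees only a private copy explicitly transferred into its local store. The names `GlobalVariable`, `APUView`, `fetch`, and `write_back` are hypothetical. The sketch shows only the problem described above; it is not the solution set out in this document.

```python
"""Hypothetical model of one global variable in a heterogeneous system.

The MPU addresses the home copy in system memory directly; a subroutine on an
APU sees only whatever copy has been explicitly transferred into its local store.
GlobalVariable and APUView are illustrative names, not a real API.
"""

from typing import Optional


class GlobalVariable:
    """A global datum whose home location is system memory."""

    def __init__(self, name: str, value: int) -> None:
        self.name = name
        self.system_copy = value  # the value MPU code reads and writes


class APUView:
    """The view of a global variable from a subroutine running on one APU."""

    def __init__(self, var: GlobalVariable) -> None:
        self._var = var
        self.local_copy: Optional[int] = None  # nothing in the local store yet

    def fetch(self) -> None:
        """Copy the current home value into this APU's local store."""
        self.local_copy = self._var.system_copy

    def write_back(self) -> None:
        """Copy this APU's local value back to the home location in system memory."""
        if self.local_copy is None:
            raise RuntimeError(f"{self._var.name} was never fetched into the local store")
        self._var.system_copy = self.local_copy
```

If two APU subroutines each fetch a shared counter and one of them increments its local copy, the MPU and the other APU continue to see the old value until an explicit write-back and re-fetch occur. The same global variable thus lives in three places that can disagree, which a single address assigned by a conventional linker cannot express.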
Therefore, there is a need for a way of accessing global variables in a hybrid parallel processing system that overcomes the limitations of conventional systems.