This disclosure relates generally to the field of heterogeneous computing architectures.
A heterogeneous computing architecture is a system that comprises multiple architecture types, which may include processors of multiple types. An example of a heterogeneous architecture is the Cell Broadband system marketed by IBM (see http://www-03.ibm.com/technology/cell/index.html for more information). Cell Broadband runs on a single-chip multiprocessor containing a PowerPC Processor Unit (PPU) that may run the operating system (OS) and applications, and a set of eight Synergistic Processing Units (SPUs) that are optimized for running computationally intensive applications. The SPUs are lightweight, specialized processors with limited hardware resources. SPUs do not have traditional caches; instead, each SPU relies on a small (256 KB), directly addressable local store (LS) to stage data transferred between the SPU and the main memory of the multiprocessor.
A software application for a heterogeneous architecture may comprise software for the different processor types, and the software for each type of processor may require separate compilation. Software written for a heterogeneous architecture may be compiled by subdividing the heterogeneous software application into homogeneous portions, one for each processor type. A programmer may be responsible for separating the program portions for each type of processor; the separated program portions are then compiled separately, and the resulting object files are linked into the final executable program. Alternatively, an advanced compiler for a heterogeneous architecture may compile a heterogeneous program by automatically partitioning the program into portions for the different types of processors. For example, a programmer writing software for the Cell Broadband system may identify which program portions contain code to run on the SPUs, and the compiler may use the programmer's annotations to automatically partition the code into PPU and SPU compilation units. One method of annotating the program is to use directives, for example OpenMP directives, to indicate which program portions are to be run in parallel on the SPUs.
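As an illustration of directive-based annotation, the following minimal sketch marks a loop with a standard OpenMP directive. The function name scale and its parameters are hypothetical and do not come from this disclosure; whether a given compiler would map the marked region to the SPUs is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: multiply every element of an array by a factor. */
void scale(float *data, size_t n, float factor)
{
    assert(data != NULL);

    /* The directive marks the loop as a parallel region. A compiler for a
     * heterogeneous architecture could treat such a region as a candidate
     * for the SPU compilation unit (assumption); a compiler without
     * OpenMP support ignores the pragma and runs the loop serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        data[i] *= factor;
    }
}
```

The surrounding code stays ordinary C; only the directive identifies the portion intended for parallel execution.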
An SPU code region may be outlined into a separate procedure. The outlined procedure may be indirectly invoked from its original parent procedure by inserting a call to a runtime system into the parent procedure; the runtime system will then call the outlined procedure. The calling procedure may be referred to as a parent procedure, and the outlined procedure may be referred to as a child of the parent procedure. The outlined child procedure is nested within the scope of the parent procedure.
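The outlining and indirect invocation described above can be sketched roughly as follows. All names (runtime_invoke, parent_outlined_region, parent) are hypothetical, and the runtime stub simply calls the procedure on the same processor, whereas a real runtime system would dispatch it to an SPU. Standard C has no nested procedures, so the child is written as a separate file-scope function.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*outlined_fn)(void);

/* File-scope result, so this sketch needs no access to the parent's stack. */
static int result;

/* Hypothetical runtime entry point. A real runtime system would schedule
 * the procedure on an accelerator; this stub calls it directly. */
static void runtime_invoke(outlined_fn fn)
{
    assert(fn != NULL);
    fn();
}

/* Outlined child: the code region that originally sat inside parent(). */
static void parent_outlined_region(void)
{
    result = 6 * 7;
}

/* Parent procedure: the original region is replaced by a call to the
 * runtime, which in turn calls the outlined child. */
int parent(void)
{
    runtime_invoke(parent_outlined_region);
    return result;
}
```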
In a program compiled for a homogeneous architecture, any variable allocated on the stack of a parent procedure may be referenced by an outlined nested child procedure. That is, in a homogeneous program, the outlined nested child procedure has implicit access to its parent procedure's stack frame. However, in a heterogeneous program, the outlined procedure needs to be executed on a processor type different from the one that executes the parent procedure; therefore, the outlined procedure needs to be separated from its parent procedure and compiled into a separate compilation unit, while maintaining access to the stack variables of its parent procedure.
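To make the problem concrete, the following sketch shows one generic way a separated child procedure might reach its parent's stack variables: the parent passes a descriptor holding their addresses. This is an illustrative assumption, not necessarily the technique of this disclosure. The names parent_frame, child_outlined, and parent_sum are hypothetical, and memcpy stands in for a transfer between main memory and an SPU local store.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical descriptor: addresses of the parent's stack variables
 * that the outlined child references. */
struct parent_frame {
    const int *n;   /* input, read by the child     */
    long *sum;      /* output, written by the child */
};

/* Outlined child. It has no implicit access to the parent's stack frame,
 * so it copies values in and out through the descriptor. */
static void child_outlined(const struct parent_frame *frame)
{
    int local_n;
    long local_sum = 0;

    assert(frame != NULL);

    /* Copy-in: stand-in for a transfer into the local store. */
    memcpy(&local_n, frame->n, sizeof local_n);

    for (int i = 1; i <= local_n; i++) {
        local_sum += i;
    }

    /* Copy-out: stand-in for a transfer back to main memory. */
    memcpy(frame->sum, &local_sum, sizeof local_sum);
}

/* Parent procedure: n and sum live in its stack frame. */
long parent_sum(int n)
{
    long sum = 0;
    struct parent_frame frame = { &n, &sum };

    child_outlined(&frame);
    return sum;
}
```

In a homogeneous program the child could simply name n and sum; here every access goes through the explicitly passed descriptor.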