Parallel processing, which generally comprises employing a plurality of microprocessors coupled to the same computer system to concurrently process a batch of data, is of great importance in the computer industry. Generally, there are three major types of parallel processing. These are parallel processing systems employing shared memory, distributed memory, or a combination of the two. Typically, shared memory is memory that can be accessed in a single operation, such as a “load” or “read” command, by a plurality of processors. Distributed memory is memory that is localized to an individual processor. In other words, in a distributed system, each processor can access its own associated memory in a single access operation, but typically cannot access memories associated with the other processors in a single operation. Finally, there is hybrid, or “heterogeneous,” parallel processing, in which some memory, also known as system memory, is shared among one or more processors, and some memory is distributed and local to at least one processor.
One example of a hybrid parallel processor system comprises at least one reduced instruction set (RISC) main processor unit (MPU), such as a PowerPC™ processor, and at least one specialized or “attached” processor unit (APU), such as a Synergistic™ APU (SPU). Typically, the MPU is employed to execute general purpose code, which comprises complex control flows, and to orchestrate the overall hybrid parallel processing function. The MPU has access to the full range of system memory. The APU is generally directed to executing dataflow operations. In other words, the APU calculates highly repetitive multimedia, graphics, signal, or network processing workloads, which are identified by high compute-to-control-decision ratios. In conventional hybrid systems, APUs do not have access to the system memory, and their own memory, the local store, is typically smaller than the shared memory.
Generally, while employment of the hybrid system provides high computational performance, it poses significant challenges to the programming model. One such problem relates to the APU. The APU cannot directly address system memory. Therefore, any code to be run on the APU has to be transferred to an associated local storage of the APU before this code can be executed on the APU. Furthermore, the APU and the MPU can have different instruction sets.
Furthermore, additional issues exist pertaining to the debugging of software that is to be compiled and linked to run in separate execution environments. To help solve various problems during software design and implementation, programmers employ debuggers. Typically, low-level operations used by a debugger are classified as one of three primitives. A first debugger primitive involves stopping a program at a well-defined location. This requires that the debugger (1) identify the address associated with a function name, file/line number, or other uniquely identifying source code construct, and (2) set a breakpoint at that address.
A second debugger primitive concerns mapping a program location to the file/line number, function name, or other uniquely identifying source code construct. This requires the debugger to map a memory address to such a source construct. The memory address mapped is usually the current value of the program counter (PC), which the debugger obtains by reading the PC register. As is understood by those of skill in the art, the program counter contains the memory address of the instruction currently being executed.
A third debugger primitive allows reading and writing of program data. This requires that the debugger identify the memory address associated with a data object or variable. Typically, setting a breakpoint is used in conjunction with read or write access to the contents of the addressed memory location.
Generally, each of the three primitives above comprises a mapping step and an operative step. The mapping step identifies the correlation between the executable object code and the source code, whereas the operative step comprises other operations performed by the debugger. To perform the mapping step, debuggers use a mapping table and debugging tables originally generated by the compiler, and updated by the linker, describing the location of each program object, each label, the correlation between file/line numbers and object addresses, the layout of variables, the stack layout, and so on.
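The mapping step can be illustrated with a minimal sketch. It assumes a hypothetical, hand-written line table and symbol table; the addresses, file names, and helper names below are invented for illustration and do not reflect any particular compiler's or linker's debug format.

```python
from bisect import bisect_right

# Hypothetical line table, sorted by start address: (start_address, file, line).
LINE_TABLE = [
    (0x1000, "main.c", 10),
    (0x1008, "main.c", 11),
    (0x1014, "main.c", 14),
    (0x1020, "util.c", 3),
]
# Hypothetical symbol table: function name -> entry address.
SYMBOLS = {"main": 0x1000, "helper": 0x1020}
# Assumed first address past the end of the mapped code.
END_ADDRESS = 0x1030


def address_to_source(addr):
    """Map a memory address (e.g. the current PC) to (file, line), or None."""
    if not (LINE_TABLE[0][0] <= addr < END_ADDRESS):
        return None
    starts = [entry[0] for entry in LINE_TABLE]
    index = bisect_right(starts, addr) - 1  # last entry starting at or before addr
    _, file, line = LINE_TABLE[index]
    return (file, line)


def source_to_address(file, line):
    """Map a file/line pair to the first address generated for it, or None."""
    for start, entry_file, entry_line in LINE_TABLE:
        if entry_file == file and entry_line == line:
            return start
    return None
```

In this sketch, `address_to_source` serves the second primitive (program location to source construct), while `source_to_address` and `SYMBOLS` serve the lookup needed before a breakpoint can be set.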
Typically, setting a breakpoint for a debugger in non-heterogeneous architectures occurs in one of two ways. The first way is to replace a selected instruction or data at a “breakpoint” with a trap instruction, or other such sequence, which will halt normal execution of the program and transfer control to the debugger. The second way is to initialize a breakpoint register with the address value (or address range) of the breakpoint. The hardware compares the program counter, that is, the register containing the address of the instruction being executed, with the value of one or more breakpoint registers. If the values match, control is transferred to the debugger. Matches can include a variety of matching functions, such as “equal,” “falls in a range,” “less,” “greater,” or other Boolean functions.
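The two approaches can be contrasted in a toy simulation. This is a sketch only, assuming a hypothetical machine whose memory is a list of instruction strings and whose breakpoint registers hold inclusive address ranges; the class and method names are invented for illustration and do not model any real processor or debugger interface.

```python
TRAP = "trap"  # stand-in for a trap instruction


class ToyMachine:
    def __init__(self, program):
        self.memory = list(program)  # one instruction per address
        self.pc = 0
        self.saved = {}              # address -> instruction displaced by a trap
        self.bp_registers = []       # (low, high) inclusive address ranges

    def set_software_breakpoint(self, addr):
        """First way: overwrite the instruction with a trap, saving the original."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = TRAP

    def clear_software_breakpoint(self, addr):
        """Restore the instruction that the trap displaced."""
        self.memory[addr] = self.saved.pop(addr)

    def set_hardware_breakpoint(self, low, high=None):
        """Second way: load a breakpoint register with an address or range."""
        self.bp_registers.append((low, low if high is None else high))

    def run(self):
        """Step until a breakpoint hits or the program ends."""
        while self.pc < len(self.memory):
            # The "hardware" compares the PC against every breakpoint register.
            if any(low <= self.pc <= high for low, high in self.bp_registers):
                return ("hardware", self.pc)
            if self.memory[self.pc] == TRAP:
                return ("software", self.pc)
            self.pc += 1  # pretend to execute the instruction
        return ("exited", self.pc)
```

The sketch omits resuming after a hit; a real debugger would typically restore the displaced instruction, single-step over it, and then reinsert the trap.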
However, setting a breakpoint in a heterogeneous architecture can be more complicated. For instance, the instruction sets of the separate processor components can differ, which can create problems in debugging. In addition, the code configuration of a module can differ, depending upon whether it is loaded to the second execution environment. Furthermore, when a breakpoint is set in a module which can be loaded and unloaded, it is important to maintain breakpoints correctly across such loading and unloading activity, in correspondence with the requested breakpoints. Thus, when a module in which breakpoints have been set is replaced by another module and later reloaded, care must be taken that all of its breakpoints are maintained.
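The bookkeeping problem described above can be sketched as follows. This is an illustrative model only, not a description of any existing debugger: it assumes a hypothetical local store represented as a Python list, modules identified by name, and breakpoints requested as offsets within a module. Requested breakpoints are kept separately from the local store, so they survive a module being overwritten and are reinserted on reload.

```python
TRAP = "trap"  # stand-in for a trap instruction


class LocalStoreDebugger:
    def __init__(self, size):
        self.store = [None] * size  # hypothetical APU local store
        self.loaded = {}            # module name -> (base address, length)
        self.requested = {}         # module name -> offsets; survives unloading

    def request_breakpoint(self, module, offset):
        """Record the request; insert the trap now only if the module is resident."""
        self.requested.setdefault(module, set()).add(offset)
        if module in self.loaded:
            base, _ = self.loaded[module]
            self.store[base + offset] = TRAP

    def load(self, module, base, code):
        """Copy a module into the local store and reapply its breakpoints."""
        end = base + len(code)
        # Any resident module overlapping the target range is overwritten.
        for name, (other_base, length) in list(self.loaded.items()):
            if other_base < end and base < other_base + length:
                del self.loaded[name]
        self.store[base:end] = code
        self.loaded[module] = (base, len(code))
        # Simplification: the displaced instruction is not saved here, since
        # the original code is assumed to remain available in system memory.
        for offset in self.requested.get(module, ()):
            self.store[base + offset] = TRAP
```

Keying breakpoints by module and offset, rather than by absolute local store address, is the design choice that lets the same request follow a module wherever and whenever it is loaded.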
Therefore, what is needed is a debugger for employment in heterogeneous parallel processing systems that overcomes the limitations of conventional debuggers.