Parallel processing, which generally comprises employing a plurality of microprocessors coupled to the same computer system to concurrently process a batch of data, is of great importance in the computer industry. Generally, there are three major types of parallel processing systems: those employing shared memory, those employing distributed memory, and those employing a combination of the two. Typically, shared memory is memory that can be accessed in a single operation, such as a “load” or “read” command, by a plurality of processors. Distributed memory is memory that is localized to an individual processor. In other words, in a distributed system, each processor can access its own associated memory in a single access operation, but typically cannot access memory associated with the other processors in a single operation. Finally, there is hybrid, or “heterogeneous,” parallel processing, in which some system memory is accessible by one or more processors, and some memory is distributed and local to at least one processor.
One example of a hybrid parallel processor system comprises at least one reduced instruction set (RISC) main processor unit (MPU), such as a PowerPC™ processor, and at least one specialized or “attached” processor unit (APU), such as a Synergistic™ APU (SPU). Typically, the MPU is employed to execute general-purpose code, wherein the general-purpose code comprises complex control flows and orchestrates the overall hybrid parallel processing function. The MPU has access to the full range of system memory. The APU is generally directed to executing dataflow operations. In other words, the APU executes highly repetitive multimedia, graphics, signal, or network processing workloads, which are characterized by high compute-to-control-decision ratios. In conventional hybrid systems, APUs do not have direct access to the system memory, and their own memory, the local store, is typically smaller than the shared memory.
Generally, while employment of the hybrid system provides high computational performance, it poses significant challenges to the programming model. One such problem relates to the APU. The APU cannot directly address system memory. Therefore, any code to be run on the APU has to be transferred to an associated local store of the APU before this code can be executed on the APU. This creates problems in the linking/binding process.
To help solve various problems during software design and implementation, programmers employ debuggers. Typically, low-level operations used by a debugger are classified as one of three primitives. A first debugger primitive involves stopping a program at a well-defined location. This requires that the debugger (1) identify the address associated with a function name, file/line number, or other uniquely identifying source code construct, and (2) set a breakpoint at that address.
A second debugger primitive concerns mapping a program location to the file/line number, function name, or other uniquely identifying source code construct. This requires the debugger to map a memory address to such a source construct. The memory address mapped is usually the current value of the program counter (PC), which the debugger obtains by reading the PC register. As is understood by those of skill in the art, the program counter comprises the memory address of the instruction currently being executed.
A third debugger primitive allows reading and writing of program data. This requires that the debugger identify the memory address associated with a data object or variable. Typically, setting a breakpoint is used in conjunction with reading or writing the contents of the addressed memory location.
Generally, each of the three primitives above comprises a mapping step (1) and an operative step (2). The mapping step identifies the correlation between the executable object code and the source code or some other mapping indicia, whereas the operative step comprises other operations performed by the debugger. To perform the mapping step, debuggers use at least one mapping indicia table and a debugging table originally generated by the compiler and updated by the runtime environment. The mapping indicia table has information associated with the location of each program object, each mapping name, the correlation between file/line numbers and object addresses, the layout of variables, the stack layout, and so on. These mapping indicia tables can, for example, be represented in the form of symbol tables, stabs debugging entries, etc.
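The mapping step common to the three primitives can be illustrated with a minimal sketch. The table contents, the names SYMBOLS and LINE_TABLE, and the functions address_of and source_location are hypothetical and chosen only for illustration; they do not reflect any particular symbol table or stabs format.

```python
from bisect import bisect_right

# Hypothetical static mapping tables, in the spirit of what a compiler might emit.
# Name -> address (used by the first and third primitives).
SYMBOLS = {"main": 0x1000, "filter": 0x1040, "counter": 0x2000}

# (start address, file, line), sorted by start address (used by the second primitive).
LINE_TABLE = [
    (0x1000, "main.c", 10),
    (0x1010, "main.c", 11),
    (0x1040, "filter.c", 3),
]


def address_of(name):
    """Map a source construct (function or variable name) to its memory address."""
    return SYMBOLS[name]


def source_location(pc):
    """Map a program counter value to the (file, line) entry that covers it."""
    starts = [entry[0] for entry in LINE_TABLE]
    index = bisect_right(starts, pc) - 1  # last entry starting at or before pc
    if index < 0:
        raise LookupError("address precedes all known line entries")
    _, filename, line = LINE_TABLE[index]
    return filename, line
```

In this sketch, a debugger would call address_of("filter") before setting a breakpoint, and source_location with the value read from the PC register after the program stops. Both lookups assume the addresses never change, which is the static assumption discussed next.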
Typically, in conventional systems, the mapping is static in nature. In other words, the addresses associated with particular objects are typically fixed at compile time and are not changed over the course of the execution of a program. However, automatic variables allocated on a stack, wherein the automatic variables are referenced at a fixed and predetermined offset relative to a changing stack, frame, base, or other such stack management pointer typically maintained in a processor hardware register, can be dynamic in nature.
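The stack-relative case can be shown with a short sketch. The function name, the offset value, and the frame pointer values are assumptions made for illustration only.

```python
# Hypothetical offset of an automatic variable, fixed at compile time.
OFFSET_OF_LOCAL = -8


def automatic_variable_address(frame_pointer, offset):
    """Compute a variable's address from the current frame pointer and a fixed offset."""
    return frame_pointer + offset


# The same variable resolves to different addresses in different stack frames.
address_in_outer_call = automatic_variable_address(0x7FF0, OFFSET_OF_LOCAL)
address_in_inner_call = automatic_variable_address(0x7FC0, OFFSET_OF_LOCAL)
```

The offset is static while the resulting address is not, so the debugger must read the stack management register at the time of the lookup.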
Static implementations of maps for debuggers are not sufficient in a heterogeneous processing system. For instance, as code and data are loaded from the system memory into a local store of an APU and later unloaded, the memory addresses of the code and data will change. Furthermore, code stored in the local store of the APU will be overwritten, so not all symbols are available at all times.
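The difficulty can be illustrated with a minimal sketch of a local store whose contents change at run time. The class LocalStoreMap, its methods, and the module names are hypothetical; the sketch only demonstrates why a static map fails and is not a description of any particular debugger.

```python
class LocalStoreMap:
    """Tracks which hypothetical code modules currently reside in an APU local store."""

    def __init__(self, size):
        self.size = size
        self.loaded = {}  # module name -> (base address, length)

    def load(self, module, base, length):
        """Record a module loaded at base, evicting any module it overwrites."""
        if base < 0 or length <= 0 or base + length > self.size:
            raise ValueError("module does not fit in the local store")
        for name, (other_base, other_length) in list(self.loaded.items()):
            overlaps = other_base < base + length and base < other_base + other_length
            if overlaps:
                del self.loaded[name]  # the overwritten code is no longer resident
        self.loaded[module] = (base, length)

    def resolve(self, module, offset):
        """Return the local store address of module+offset, or None if not resident."""
        if module not in self.loaded:
            return None
        base, length = self.loaded[module]
        if not 0 <= offset < length:
            raise ValueError("offset outside module")
        return base + offset
```

Loading the same module at two different bases yields two different addresses for the same symbol, and loading a second module over the first makes the first module's symbols unresolvable until it is loaded again.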
Therefore, what is needed is a debugger for debugging a heterogeneous architecture that overcomes the deficiencies of conventional debuggers.