1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and computer program products for debugging a high performance computing program.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
As computer software has become more sophisticated, the complexity of developing computer software has also increased. Increased complexity of computer software often produces defects that a developer must identify and correct, such as defects that cause the software to generate incorrect output or to hang during execution. Incorrect output may result from incorrect input or faulty data processing. Computer software that hangs during execution most often does so because of a bad calling sequence among the program subroutines.
When computer software hangs during execution, a developer typically needs to obtain an overview of the state of the entire software program in order to identify the specific cause of the software defect. To obtain such an overview, a developer often utilizes computer software called a ‘debugger.’ A debugger is used to analyze software defects or to optimize performance of other computer software. A debugger allows a user to follow the flow of program execution and inspect the state of a program at any point by controlling execution of the program being debugged. A debugger typically allows a user to track program variables, execute a thread of execution step by step, stop execution of a thread at a particular line number in computer source code, stop execution of a thread when certain conditions are satisfied, or examine a thread's calling sequence of subroutines.
Current debuggers adequately aid a developer in debugging a computer software program composed of a relatively small number of threads of execution such as, for example, software programs executing on single processor or small multi-processor computer systems. Current debuggers, however, do not provide a developer an efficient mechanism for debugging a special class of computer software programs called high performance computing programs. A high performance computing program is a computer software program composed of a massive number of threads of execution. Typically, each thread of a high performance computing program executes on a dedicated processor such that the threads execute in parallel on a massive number of processors to solve a common problem. Current debuggers do not provide adequate means of debugging these high performance computing programs because these debuggers are not aware that the threads of a high performance computing program often perform similar operations. Consequently, current debuggers require a developer to manually sort through individual threads of execution to identify the defective threads. A high performance computing program may, however, contain over one hundred thirty thousand threads of execution, as does, for example, a high performance computing program executing on the IBM® BlueGene/L supercomputer. At that scale, manually identifying defective threads is a nearly impossible task.
In response to the challenges associated with debugging a computer program composed of numerous threads of execution, some current debuggers implement the concept of thread groups based on the type classification of the thread under execution. In typical high performance computing programs, however, most of the threads have the same “worker-thread” type classification. As such, the benefits of having groups based on a type classification of a thread often do not accrue to developers debugging a high performance computing program.
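The limitation of type-based thread groups can be illustrated with a minimal sketch. The sketch is not drawn from any particular debugger: the thread records, the "io-service" type name, the function names, and the thread count of 131072 are all hypothetical values chosen only to reflect the scale described above. It groups threads by type classification and reports how many threads fall into the largest group.

```python
from collections import Counter

# Hypothetical thread records: (thread_id, type_classification).
# The type names and counts are illustrative assumptions only.
def make_threads(total=131072, service_threads=2):
    threads = [(tid, "io-service") for tid in range(service_threads)]
    threads += [(tid, "worker-thread") for tid in range(service_threads, total)]
    return threads

def group_by_type(threads):
    """Count how many threads share each type classification."""
    return Counter(kind for _, kind in threads)

groups = group_by_type(make_threads())
largest_type, largest_size = groups.most_common(1)[0]
share = largest_size / sum(groups.values())

print(dict(groups))
print(f"{largest_type} holds {share:.4%} of all threads")
```

Under these assumptions nearly every thread lands in the single "worker-thread" group, so a developer who selects that group must still sort through almost the entire set of threads by hand.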