1. Field
This disclosure relates generally to debugging code and, more specifically, to techniques for debugging code during runtime.
2. Related Art
Low-level application programming interface (LAPI) is a component of the AIX® operating system implementation of reliable scalable cluster technology (RSCT). LAPI is a message-passing application programming interface (API) that provides a one-sided communication model. In the LAPI communication model, a first task initiates a communication operation to a second task, and completion of the communication does not require the second task to take a complementary action. A LAPI library provides basic operations for storing data to, and retrieving data from, one or more virtual addresses of a remote task. LAPI also provides a message infrastructure that allows a programmer to install a set of handlers that are called and run in the address space of a target task on behalf of the task originating a message. LAPI may provide flow control and support for: large messages; generic non-contiguous messages; non-blocking calls; interrupt and polling modes; efficient exploitation of network switch functions; and event monitoring (to simulate blocking calls, for example) for various types of completion events.
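The one-sided model and the handler infrastructure described above can be pictured with a small single-process simulation. The sketch is hypothetical: the `task` structure, the handler table, and the names `register_handler`, `am_send`, and `store_handler` are illustrative assumptions and are not the LAPI API. The target installs a handler ahead of time; the origin then names that handler, and the communication layer runs it against the target's memory without the target issuing any matching receive call.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 4
#define MEM_SIZE 64

/* Hypothetical task: a private "address space" plus a table of handlers
   that other tasks may ask the communication layer to run. */
struct task;
typedef void (*am_handler)(struct task *self, const void *payload, size_t len);

struct task {
    char memory[MEM_SIZE];
    am_handler handlers[MAX_HANDLERS];
};

/* Called by the target task ahead of time to install a handler. */
int register_handler(struct task *t, int id, am_handler h) {
    if (id < 0 || id >= MAX_HANDLERS) return -1;
    t->handlers[id] = h;
    return 0;
}

/* One-sided send on behalf of the origin: the named handler runs in the
   target's address space. The target performs no complementary action. */
int am_send(struct task *target, int id, const void *payload, size_t len) {
    if (id < 0 || id >= MAX_HANDLERS || target->handlers[id] == NULL) return -1;
    target->handlers[id](target, payload, len);
    return 0;
}

/* Example handler: copies the payload to the start of the target's memory,
   truncating it if it is larger than the simulated address space. */
void store_handler(struct task *self, const void *payload, size_t len) {
    if (len > MEM_SIZE) len = MEM_SIZE;
    memcpy(self->memory, payload, len);
}
```

A send that names a handler the target never installed is rejected, which mirrors the requirement that handlers be installed before they can be invoked on the target's behalf.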
A programmer may interact with LAPI through an object called a LAPI handle (also referred to as a LAPI instance or a LAPI context). Usually, LAPI function calls take a LAPI handle as their first argument. LAPI provides a number of methods for transferring non-contiguous data, including transfers of multiple buffers, repeating block/stride descriptions, and data gather/scatter programs (DGSPs). For LAPI communication operations, origin (or source) denotes the task that initiates a LAPI operation, and target (or destination) denotes the task whose address space is accessed during the operation. A push operation transfers data from the origin task into the address space of the target task. A pull operation transfers data from the address space of the target task into the address space of the origin task.
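The push and pull directions can be sketched as copies between two simulated address spaces. The byte-array model and the names `push`, `pull`, and `in_range` are assumptions made for illustration; LAPI exposes its transfer operations through its own subroutines with different signatures. In both functions the origin is the initiator; only the direction of the copy differs.

```c
#include <stddef.h>
#include <string.h>

#define SPACE_SIZE 32

/* Hypothetical model: each task owns a private byte array that stands in
   for its address space. */
struct address_space {
    unsigned char bytes[SPACE_SIZE];
};

/* Checks that [off, off + len) fits inside an address space without
   overflowing the arithmetic. */
static int in_range(size_t off, size_t len) {
    return off <= SPACE_SIZE && len <= SPACE_SIZE - off;
}

/* Push (initiated by the origin): origin buffer -> target address space. */
int push(const struct address_space *origin, size_t origin_off,
         struct address_space *target, size_t target_off, size_t len) {
    if (!in_range(origin_off, len) || !in_range(target_off, len)) return -1;
    memcpy(target->bytes + target_off, origin->bytes + origin_off, len);
    return 0;
}

/* Pull (also initiated by the origin): target address space -> origin. */
int pull(struct address_space *origin, size_t origin_off,
         const struct address_space *target, size_t target_off, size_t len) {
    if (!in_range(origin_off, len) || !in_range(target_off, len)) return -1;
    memcpy(origin->bytes + origin_off, target->bytes + target_off, len);
    return 0;
}
```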
LAPI can be run in either a polling mode or an interrupt mode. In the polling mode, messages are sent and received only when a program explicitly calls a LAPI function. In the interrupt mode, a receive interrupt is generated for incoming messages. In general, interrupts are handled by a control thread that LAPI creates at initialization. Using the LAPI query function, a programmer can query statistics on data transferred using a user space (US) protocol or a user datagram protocol/Internet protocol (UDP/IP), as well as on data transferred through intra-task local copy or shared memory. LAPI also includes a profiling interface that provides a wrapper for each LAPI function to facilitate collecting data about each LAPI call.
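The difference between the two modes can be illustrated with a toy receive path. In this hypothetical sketch (the `comm_ctx` structure and the names `deliver` and `progress` are assumptions, not LAPI internals), an arriving message is handled immediately in interrupt mode, standing in for the control thread, whereas in polling mode it waits in a queue until the program makes an explicit progress call.

```c
#include <stddef.h>

#define QUEUE_CAP 8

enum mode { MODE_POLLING, MODE_INTERRUPT };

/* Hypothetical communication context; not the LAPI data structure. */
struct comm_ctx {
    enum mode mode;
    int queue[QUEUE_CAP];   /* payloads waiting for a progress call */
    size_t pending;
    int received_sum;       /* work done by the receive path */
};

static void handle_message(struct comm_ctx *c, int payload) {
    c->received_sum += payload;
}

/* Called when a message arrives from the network. Interrupt mode handles
   it at once; polling mode only queues it. Returns -1 if the queue is full. */
int deliver(struct comm_ctx *c, int payload) {
    if (c->mode == MODE_INTERRUPT) {
        handle_message(c, payload);
        return 0;
    }
    if (c->pending == QUEUE_CAP) return -1;
    c->queue[c->pending++] = payload;
    return 0;
}

/* Explicit progress call made by the program: drains every queued message
   and returns how many were handled. */
size_t progress(struct comm_ctx *c) {
    size_t n = c->pending;
    for (size_t i = 0; i < n; i++) handle_message(c, c->queue[i]);
    c->pending = 0;
    return n;
}
```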
Compared to other communication interfaces, such as the message passing interface (MPI) and the Internet protocol (IP), LAPI provides a lower-level interface to a network switch. As is known, MPI is a specification for an API that allows computers to communicate with one another; MPI may be employed, for example, in computer clusters and supercomputers. Applications that are written using LAPI can be run over a cluster of processors running, for example, AIX 5L® or Linux™.
LAPI subroutines provide a wide variety of functions that can be used to obtain most behaviors required of a parallel programming API. LAPI also typically provides: C and FORTRAN subroutine bindings; extern “C” declarations for C++ programming; and profiling interfaces for C, C++, and FORTRAN programs. Complementary functions may be implemented to check completion of operations and to enforce relative ordering, if desired. Additionally, LAPI functions allow tasks to exchange addresses that will be used in LAPI operations.
LAPI functions (and related subroutines) include: functions to initialize and terminate LAPI; functions to query and set up a runtime environment; and address-related functions. LAPI uses a number of internal structures to enable LAPI to perform message-passing operations on behalf of a user. As one example, the LAPI_Init subroutine is used to allocate memory for LAPI communication structures and to initialize the communication structures. The LAPI_Init subroutine returns a unique handle that represents a single LAPI communication context. The handle is subsequently passed as a parameter to each of the other LAPI functions. The LAPI_Init subroutine reads in various environment variables and sets up various communication channels based on the values of the variables.
As another example, the LAPI_Term subroutine is used to free memory associated with LAPI communication structures. The LAPI_Term subroutine takes a LAPI handle as a parameter and uses the handle to terminate the corresponding communication context. Once the LAPI_Term subroutine is called, no further LAPI communication can be performed on the terminated handle. Typically, the LAPI_Init subroutine is called once at the beginning of a user program, and the LAPI_Term subroutine is called just before the user program terminates.
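The initialize/terminate lifecycle built around a handle can be sketched as follows. This is a hypothetical model, not the LAPI_Init or LAPI_Term signatures: `comm_init`, `comm_term`, `comm_is_valid`, the `comm_state` structure, and the environment variable `SKETCH_INTERRUPT_MODE` are all invented for illustration. Initialization allocates the structures, configures them from the environment, and returns a small integer handle; termination frees the structures, after which every call presenting that handle is rejected.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CONTEXTS 4

/* Hypothetical per-context communication structures. */
struct comm_state {
    int interrupt_mode;
    char *buffer;
};

static struct comm_state *contexts[MAX_CONTEXTS];

/* Allocates and configures a context; returns its handle, or -1. */
int comm_init(void) {
    for (int h = 0; h < MAX_CONTEXTS; h++) {
        if (contexts[h] != NULL) continue;           /* slot in use */
        struct comm_state *s = malloc(sizeof *s);
        if (s == NULL) return -1;
        /* Hypothetical environment variable read at initialization. */
        const char *v = getenv("SKETCH_INTERRUPT_MODE");
        s->interrupt_mode = (v != NULL && strcmp(v, "yes") == 0);
        s->buffer = malloc(256);
        if (s->buffer == NULL) { free(s); return -1; }
        contexts[h] = s;
        return h;
    }
    return -1;                                        /* no free slot */
}

/* Every other operation would first check that its handle is live. */
int comm_is_valid(int h) {
    return h >= 0 && h < MAX_CONTEXTS && contexts[h] != NULL;
}

/* Frees the structures behind a handle; the handle is dead afterwards. */
int comm_term(int h) {
    if (!comm_is_valid(h)) return -1;
    free(contexts[h]->buffer);
    free(contexts[h]);
    contexts[h] = NULL;
    return 0;
}
```

In this sketch a later `comm_init` call may reuse the slot of a terminated handle; a real implementation might instead issue fresh handle values so that stale handles cannot alias new contexts.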
A number of different variables constitute a LAPI runtime state. Many of the variables can be queried at runtime so that a user program can adapt its operation. For example, it is often useful to know the number of tasks in a given job and the identity of the current task, and to design a user program to take actions according to those values. Many LAPI runtime state variables can also be set to alter LAPI behavior during job execution and to tune LAPI performance. For example, it may be useful to turn off the interrupts that signal incoming packets when a user program itself makes frequent explicit calls to LAPI progress routines.
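The query-and-set pattern for runtime state can be sketched with a small enumeration of state variables. The names `query_state`, `set_state`, `items_for_this_task`, and the `STATE_*` constants are hypothetical and only loosely modeled on the behavior described above. The task count and task identity are read-only, while the interrupt setting can be changed during the job; the example divides work among tasks and turns off interrupts because the program is assumed to poll frequently.

```c
/* Hypothetical runtime-state variables. */
enum state_var { STATE_NUM_TASKS, STATE_TASK_ID, STATE_INTERRUPTS_ON };

struct runtime_state {
    int num_tasks;      /* fixed for the job */
    int task_id;        /* identity of this task, 0..num_tasks-1 */
    int interrupts_on;  /* tunable while the job runs */
};

/* Query any state variable; returns -1 for an unknown variable. */
int query_state(const struct runtime_state *rs, enum state_var v) {
    switch (v) {
    case STATE_NUM_TASKS:     return rs->num_tasks;
    case STATE_TASK_ID:       return rs->task_id;
    case STATE_INTERRUPTS_ON: return rs->interrupts_on;
    }
    return -1;
}

/* Set a tunable variable; read-only variables are rejected with -1. */
int set_state(struct runtime_state *rs, enum state_var v, int value) {
    if (v != STATE_INTERRUPTS_ON) return -1;
    rs->interrupts_on = (value != 0);
    return 0;
}

/* Uses the task count and task identity to compute this task's share of
   total_items, then disables packet interrupts since the caller polls. */
int items_for_this_task(struct runtime_state *rs, int total_items) {
    int n = query_state(rs, STATE_NUM_TASKS);
    int me = query_state(rs, STATE_TASK_ID);
    set_state(rs, STATE_INTERRUPTS_ON, 0);
    return total_items / n + (me < total_items % n ? 1 : 0);
}
```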
Various tasks implemented with a LAPI framework may require debugging. Traditionally, debugging code has been difficult without adding traces to an application and to the shared libraries that the application invokes. Moreover, debugging tasks implemented within a LAPI framework has usually required halting and restarting the tasks. A signal (a limited form of inter-process communication used in Unix, Unix-like, and other portable operating system interface for Unix (POSIX) compliant operating systems) may be used to initiate debugging; however, when a signal is sent to a process, the operating system interrupts the normal flow of execution of the process. Signal handling is also limited in that a function invoked to handle a signal cannot be passed user-supplied arguments (or parameters), and in that each signal can be associated with only a single registered handler function. Furthermore, when conventional signal handlers are employed, adding new functions to an application without re-compiling the application is complicated. The problems associated with conventional signal handlers may be addressed using environment variables or files; however, a new environment variable takes effect only after the application is restarted, and employing files can cause resource and locking issues.
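Two of the signal-handling limitations above can be observed in a few lines of standard C: a handler receives only the signal number, and registering a second function for the same signal replaces the first. The sketch uses the ISO C `signal` and `raise` functions with `SIGINT` for portability; the names `handler_a`, `handler_b`, `last_handler`, and `which_handler_ran` are illustrative.

```c
#include <signal.h>

/* A handler receives only the signal number, so any extra information must
   travel through global state. Assigning to a volatile sig_atomic_t object
   is the portable way for a handler to record what happened. */
static volatile sig_atomic_t last_handler = 0;

static void handler_a(int sig) { (void)sig; last_handler = 1; }
static void handler_b(int sig) { (void)sig; last_handler = 2; }

/* Registers handler_a and then handler_b for the same signal, raises it,
   and reports which handler ran. Because a signal has only one registered
   handler, the second registration replaces the first. */
int which_handler_ran(void) {
    last_handler = 0;
    if (signal(SIGINT, handler_a) == SIG_ERR) return -1;
    if (signal(SIGINT, handler_b) == SIG_ERR) return -1;
    /* raise() does not return until the handler has finished. */
    if (raise(SIGINT) != 0) return -1;
    signal(SIGINT, SIG_DFL);  /* restore the default disposition */
    return last_handler;
}
```

Calling `which_handler_ran()` returns 2: only `handler_b` runs, and neither handler could have been given caller-supplied arguments.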
In computer programming, a callback function is executable code that is passed as an argument to other code. A callback function allows a lower-level software layer to call a subroutine or function that is defined in a higher-level layer. Callback functions have a variety of uses. For example, a function that reads a configuration file and associates values with options identified by a hash is more flexible when the hashing function is passed to it as a callback. In this case, a user of the function can supply any desired hashing algorithm, and the function continues to work because it uses the callback to turn option names into hashes. Another use of callbacks is in error signaling. A programmer who does not want a program to terminate immediately upon receiving a signal may register a clean-up function as a callback, so that necessary clean-up is performed before the program exits. Callback functions may also be used to control whether a function acts at all; for example, a caller may supply a predicate callback that decides whether a given event should be handled.
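The configuration-file example can be made concrete with a short sketch in which the hashing algorithm is supplied by the caller as a function pointer. The names `hash_fn`, `load_options`, `find_option`, `djb2_hash`, and `sum_hash`, and the "name=value" line format, are assumptions for illustration; the point is that the parsing and lookup code never depends on which hash is used.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The callback type: turns an option name into a hash. */
typedef unsigned long (*hash_fn)(const char *name);

struct option {
    unsigned long key;   /* hash of the option name */
    int value;
};

/* Parses "name=value" lines, keying each option by hash(name). Lines
   without '=' or with overlong names are skipped. Returns the count. */
size_t load_options(const char *lines[], size_t n, hash_fn hash,
                    struct option *out) {
    size_t stored = 0;
    char name[64];
    for (size_t i = 0; i < n; i++) {
        const char *eq = strchr(lines[i], '=');
        if (eq == NULL || (size_t)(eq - lines[i]) >= sizeof name) continue;
        size_t len = (size_t)(eq - lines[i]);
        memcpy(name, lines[i], len);
        name[len] = '\0';
        out[stored].key = hash(name);
        out[stored].value = atoi(eq + 1);
        stored++;
    }
    return stored;
}

/* Looks up an option using the same callback; returns fallback if absent. */
int find_option(const struct option *opts, size_t n, hash_fn hash,
                const char *name, int fallback) {
    unsigned long key = hash(name);
    for (size_t i = 0; i < n; i++)
        if (opts[i].key == key) return opts[i].value;
    return fallback;
}

/* Two interchangeable callbacks: djb2 and a deliberately simple
   (collision-prone) byte sum. */
unsigned long djb2_hash(const char *s) {
    unsigned long h = 5381;
    for (; *s; s++) h = h * 33 + (unsigned char)*s;
    return h;
}

unsigned long sum_hash(const char *s) {
    unsigned long h = 0;
    for (; *s; s++) h += (unsigned char)*s;
    return h;
}
```

Swapping `djb2_hash` for `sum_hash` requires no change to `load_options` or `find_option`; the caller only has to pass the same callback to both.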