1. Technical Field of the Invention
The present invention generally relates to software debugging tools. More particularly, and not by way of any limitation, the present invention is directed to a synchronous breakpoint system and method in a high performance multiprocessing environment.
2. Description of Related Art
In spite of the diversity among software program debuggers, they all share a common operational model. In order to fix a bug, the developer executes the program, and then uses a debugger to examine its behavior. Normally, the developer sets a breakpoint at an address location that is of some significance to the code, the machine on which it is being executed, or both, and the program is launched. When the breakpoint is reached during runtime, control is returned to the user, who can then single-step forward to determine what happened in the execution. In other words, the user needs to activate the debugger, then run the program being debugged and reproduce the problematic behavior.
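By way of illustration only, the breakpoint model described above may be sketched as follows. The sketch assumes a toy single-processor machine whose "program" is a list of Python callables indexed by address; the names ToyDebugger, run, and step are hypothetical and do not correspond to any particular debugger.

```python
class ToyDebugger:
    """Hypothetical debugger for a toy machine; illustration only."""

    def __init__(self, program):
        self.program = program      # list of callables, one per address
        self.pc = 0                 # address of the next instruction
        self.state = {}             # machine state visible to the user
        self.breakpoints = set()    # addresses at which to stop

    def step(self):
        """Execute exactly one instruction (single-step)."""
        self.program[self.pc](self.state)
        self.pc += 1

    def run(self):
        """Execute until a breakpoint address is reached, then return
        control to the caller. Returns None if the program ends first."""
        while self.pc < len(self.program):
            if self.pc in self.breakpoints:
                return self.pc
            self.step()
        return None


def increment(state):
    # Toy instruction: add one to the counter "x".
    state["x"] = state.get("x", 0) + 1


dbg = ToyDebugger([increment, increment, increment])
dbg.breakpoints.add(2)           # stop before the instruction at address 2
stopped_at = dbg.run()           # control returns at the breakpoint
x_at_break = dbg.state["x"]      # user examines state
dbg.step()                       # user single-steps forward
x_after_step = dbg.state["x"]
```

In this sketch the breakpoint is checked before the instruction at that address executes, so the state examined at the stop reflects only the instructions preceding the breakpoint.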
One of the major problems with this model is that the execution that causes the problem to surface and the execution under the debugger are typically very different. This is analogous to the famous “Uncertainty Principle” in physics as applied to the field of software engineering: the tool that is used to analyze the run actually changes the run characteristics. Unfortunately, this troublesome aspect is particularly vexatious in distributed, client/server, and parallel systems (such as, e.g., multithreaded or multiprocessor systems).
Architecting testable software for high performance computing platforms has accordingly become a daunting task. In today's multiprocessor (MP) systems having a large number of processors in myriad architectural arrangements, the task is even more challenging. Because the teachings of the present invention will be exemplified in particular reference to MP platforms, a brief introduction thereto is immediately set forth below.
In the most general sense, multiprocessing may be defined as the use of multiple processors to perform computing tasks. The term could apply to a set of networked computers in different locations, or to a single system containing several processors. As is well known, however, the term is most often used to describe an architecture where two or more linked processors are contained in a single or partitioned enclosure. Further, multiprocessing does not occur just because multiple processors are present. For example, having a stack of personal computers in a rack is not multiprocessing. Similarly, a server with one or more “standby” processors is not multiprocessing, either. The term “multiprocessing” is typically applied, therefore, only to architectures where two or more processors are designed to work in a cooperative fashion on a task or set of tasks.
There exist numerous variations on the basic theme of multiprocessing. In general, these variations relate to how independently the processors operate and how the workload among these processors is distributed. In loosely-coupled multiprocessing architectures, the processors perform related tasks but they do so as if they were standalone processors. Each processor is typically provided with its own private memory and may have its own mass storage and input/output (I/O). Further, each loosely-coupled processor runs its own copy of an operating system (OS), and communicates with the other processor or processors through a message-passing scheme, much like devices communicating over a local area network. Loosely-coupled multiprocessing has been widely used in mainframes and minicomputers, but the software to do so is closely tied to the hardware design. For this reason, among others, it has not gained the support of software vendors and is not widely used in today's high performance server systems.
In tightly-coupled multiprocessing, on the other hand, operation of the processors is more closely integrated. They typically share main memory, and may even have a shared cache. The processors need not be identical to one another, and may or may not perform similar tasks. However, they typically share other system resources such as mass storage and I/O. Additionally, instead of a separate copy of the OS for each processor, they run a single copy, with the OS handling the coordination of tasks between the processors. The sharing of system resources makes tightly-coupled multiprocessing platforms somewhat less expensive, and it is the dominant multiprocessor architecture in the business-class servers currently deployed.
Hardware architectures for tightly-coupled MP platforms can be further divided into two broad categories. In symmetrical MP (SMP) systems, system resources such as memory, disk storage and I/O are shared by all the microprocessors in the system. The workload is distributed evenly to available processors so that one does not sit idle while another is heavily loaded with a specific task. Further, the SMP architecture is highly scalable, i.e., the performance of SMP systems increases, at least theoretically, as more processor units are added.
In asymmetrical MP systems, tasks and resources are managed by different processor units. For example, one processor unit may handle I/O and another may handle network OS (NOS)-related tasks. Thus, it should be apparent that an asymmetrical MP system may not balance the workload and, accordingly, it is possible that a processor unit handling one task can be overworked while another unit sits idle.
SMP systems are further subdivided into two types, depending on the way cache memory is implemented. “Shared-cache” platforms, where off-chip (i.e., Level 2, or L2) cache is shared among the processors, offer lower performance in general. In “dedicated-cache” systems, every processor unit is provided with a dedicated L2 cache, in addition to its on-chip (Level 1, or L1) cache memory. The dedicated L2 cache arrangement accelerates processor-memory interactions in the multiprocessing environment and, moreover, facilitates higher scalability.
As briefly alluded to hereinabove, designing software intended for reliable cross-platform execution on the numerous MP systems available nowadays has become an arduous undertaking. Further, with ever-shrinking design/debug cycle times, software developers are continuously looking for ways to streamline the debug operations necessary to architect well-tested code, be it application software, OS software, or firmware.
In addition, it should be appreciated that oftentimes the hardware development of a particular platform may not have advanced far enough to allow debug testing of the software code targeted for that platform. Typically, an architectural simulator is utilized in such instances. The simulator, which is operable to simulate a target hardware platform, can “execute” a particular piece of software intended for the target hardware as if it were run on the actual machine itself, and is provided with a debugger for debugging the software using the conventional breakpoint methodology as set forth in the foregoing.
Implementing conventional breakpoints for code debugging purposes in multiprocessing environments is beset with several deficiencies, however. In MP systems, the user sometimes desires to set a breakpoint at a particular location but wants control only after all processors have hit that breakpoint. For instance, during the early boot sequence of an MP machine it is useful to have a breakpoint set in the processor synchronization routine, whereby the user regains control only after all the processors have synchronized at the breakpoint. Furthermore, the complex task of debugging in the MP environment is much easier when the processors have attained a known common state.
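The behavior the user desires in this situation, namely regaining control only after every processor has reached the breakpoint, may be sketched as follows. This is a minimal simulation under stated assumptions and not an implementation of any particular debugger or simulator: each processor is modeled as a program counter that advances by one address per cycle, and the names Cpu and run_until_all_hit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical processor model: just an id and a program counter."""
    cpu_id: int
    pc: int
    halted: bool = False


def run_until_all_hit(cpus, breakpoint_addr, max_cycles=10_000):
    """Advance every processor one address per cycle. A processor that
    reaches the breakpoint is held there; control returns to the caller
    only once all processors are held. Returns the cycle count."""
    for cycle in range(max_cycles):
        if all(c.halted for c in cpus):
            return cycle
        for c in cpus:
            if c.halted:
                continue
            if c.pc == breakpoint_addr:
                c.halted = True     # hold this processor at the breakpoint
            else:
                c.pc += 1           # toy "execution" of one instruction
    raise RuntimeError("not all processors reached the breakpoint")


# Three processors starting at different points in the same routine.
cpus = [Cpu(0, pc=0), Cpu(1, pc=3), Cpu(2, pc=5)]
cycles = run_until_all_hit(cpus, breakpoint_addr=5)
```

When the call returns, every processor in the sketch is stopped at the same address, which corresponds to the known common state referred to above.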
In the conventional implementation for synchronizing the processors in an MP environment, run control is returned to the user after the first processor hits the breakpoint. As a consequence, the user is required to manually switch to the remaining processors, one by one, and continue with code execution until each of them reaches the same breakpoint. It should be appreciated that such manual switching is highly cumbersome and error-prone, especially where a large number of processors are included in the target hardware platform.
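To make the burden of the conventional approach concrete, the following sketch lists the commands a user would have to issue by hand. It assumes, purely for illustration, that control first returns on whichever processor is closest to the breakpoint, and that each remaining processor requires one "switch" command and one "continue" command; the command strings and the function name manual_sync_commands are hypothetical.

```python
def manual_sync_commands(start_pcs, breakpoint_addr):
    """Return (first_cpu, commands), where commands is the list of manual
    debugger commands needed to bring every processor to the breakpoint
    under the conventional model (illustrative assumptions only)."""
    # Distance of each processor from the breakpoint, in instructions.
    distances = {cpu: breakpoint_addr - pc for cpu, pc in enumerate(start_pcs)}
    # Control is returned as soon as the nearest processor hits the breakpoint.
    first_cpu = min(distances, key=distances.get)
    commands = ["run"]
    # The user then visits each remaining processor, one by one.
    for cpu in sorted(distances):
        if cpu == first_cpu:
            continue
        commands.append(f"switch cpu {cpu}")
        commands.append("continue")
    return first_cpu, commands


first_cpu, commands = manual_sync_commands([0, 3, 5], breakpoint_addr=5)
commands_for_64 = len(manual_sync_commands([0] * 64, breakpoint_addr=5)[1])
```

Under these assumptions the number of manual commands grows linearly with the processor count, at two commands per additional processor, which illustrates why the procedure becomes cumbersome on large platforms.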