An error, or “bug,” in a computer program (i.e., an executable source program) is a defect that causes the computer program to malfunction in some way. Debugging refers to the process by which the errors in the computer program are found and removed. Finding an error in a computer program running in a massively parallel processing (MPP) environment can be extremely difficult and time intensive. Before further discussing this problem, an overview of an MPP environment is provided using FIG. 1.
MPP environments are computer environments that operate using a massive number of processors. It is typical for an MPP environment to use tens of thousands of processors. Each processor in such an environment is able to execute computer instructions at the same time, which results in a very powerful system because many calculations take place simultaneously. Such an environment is useful for a wide variety of purposes. One such purpose is the software simulation of a hardware logic design.
Large logic simulations are frequently executed on parallel or massively parallel computing systems. For example, parallel computing systems may be specifically designed parallel processing systems or a collection, referred to as a “farm,” of connected general purpose processing systems. FIG. 1 shows a block diagram of a typical parallel computing system (100) used to simulate a logic design expressed in a hardware description language (HDL). Multiple processor arrays (112a, 112b, 112n) are available to simulate the HDL logic design. A host computer (116), with an associated data store (117), controls a simulation of the logic design that executes on one or more of the processor arrays (112a, 112b, 112n) through an interconnect switch (118). Each of the processor arrays (112a, 112b, 112n) may be a collection of processing elements or multiple general purpose processors. The interconnect switch (118) may be a specifically designed interconnect or a general purpose communication system, for example, an Ethernet network.
A general purpose computer (120) with a human interface (122), such as a graphical user interface (GUI) or a command line interface, together with the host computer (116), supports common functions of a simulation environment. These functions typically include an interactive display, modification of the simulation state, setting of execution breakpoints based on simulation times and states, use of test vector files and trace files, use of HDL modules that execute on the host computer and are called from the processor arrays, checkpointing and restoration of running simulations, generation of value change dump files compatible with waveform analysis tools, and execution of a single clock cycle.
The software simulation of a hardware logic design involves using a computer program to cause a computer system to behave in a manner that is analogous to the behavior of a physical hardware device. Software simulation of a hardware logic design is particularly beneficial because the actual manufacturing of a hardware device can be expensive. Software simulation allows the user to determine the efficacy of a hardware design. Software simulation of a hardware logic design is well-suited for use in an MPP environment because hardware normally performs many activities simultaneously.
When simulating a hardware logic design in an MPP environment, or executing any other type of computer program in such an environment, debugging the program may become necessary. Properly performed debugging of the computer program reduces the probability of errors that could result in a malfunction. In the case of hardware logic design simulation, such an error might result in the eventual fabrication of computer hardware that does not work as expected. Such a malfunction is expensive and wasteful, so debugging plays an important role.
One common method for debugging is to single-step the execution of the computer program. Each step represents an instruction executed on a processor of the computer. At each step, the state of the simulation system, including the variables and registers, is examined. By examining the state of the simulation system at each progressive step, the person debugging the program is able to inspect the program and determine precisely where the problem begins to manifest itself. Once this is known, the person is better able to correct the program and remove the bug.
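By way of illustration only, single-stepping on a single processor may be sketched as follows. The `ToyMachine` class, its two-operation instruction set, and the example program are hypothetical and are assumed here solely to show the state being examined after each executed instruction.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical single-processor machine: a program counter plus named registers."""
    program: list
    pc: int = 0
    regs: dict = field(default_factory=lambda: {"a": 0, "b": 0})

    def step(self):
        """Execute exactly one instruction and advance the program counter."""
        op, reg, value = self.program[self.pc]
        if op == "set":
            self.regs[reg] = value
        elif op == "add":
            self.regs[reg] += value
        else:
            raise ValueError(f"unknown op: {op}")
        self.pc += 1

    def halted(self):
        """True once every instruction in the program has been executed."""
        return self.pc >= len(self.program)


def single_step_trace(machine):
    """Run to completion, recording (pc, register snapshot) after every step."""
    trace = []
    while not machine.halted():
        machine.step()
        trace.append((machine.pc, dict(machine.regs)))
    return trace


# Hypothetical program; the last instruction drives register "b" negative.
program = [("set", "a", 2), ("add", "a", 3), ("set", "b", 7), ("add", "b", -10)]
trace = single_step_trace(ToyMachine(program))
for pc, regs in trace:
    print(pc, regs)
```

Comparing successive snapshots in `trace` shows the first step at which a register takes an unexpected value, which is where the problem begins to manifest itself.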
Another common method for debugging is to insert a breakpoint into the program so that execution of the program stops at the inserted breakpoint. This is similar to single-stepping, but the breakpoint designates a specific place to stop execution and examine the state of the simulation system. Breakpoints may normally be inserted at any instruction in a sequence of instructions. At the breakpoint, a determination may be made as to whether there is a problem with the program at that point. By moving the breakpoint, the point at which the problem manifests itself may be precisely found, and the problem can then be corrected.
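A minimal sketch of breakpoint-based debugging on a single processor is given below, for illustration only. The function `run_to_breakpoint`, the representation of instructions as callables, and the example program are all assumptions made for this sketch.

```python
def run_to_breakpoint(program, state, breakpoints, start=0):
    """Execute instructions from `start` until a breakpoint index is reached.

    Returns the index at which execution stopped, or len(program) if the
    program ran to the end. The instruction at the returned breakpoint has
    NOT been executed yet. A breakpoint at `start` itself is skipped so
    that execution can be resumed from a breakpoint.
    """
    pc = start
    while pc < len(program):
        if pc in breakpoints and pc != start:
            return pc
        program[pc](state)  # each instruction mutates the shared state
        pc += 1
    return pc


# Hypothetical program: each instruction is a callable acting on `state`.
program = [
    lambda s: s.update(total=0),
    lambda s: s.update(total=s["total"] + 10),
    lambda s: s.update(total=s["total"] * 2),
    lambda s: s.update(total=s["total"] - 50),  # suspected faulty instruction
]

state = {}
stopped_at = run_to_breakpoint(program, state, breakpoints={3})
state_at_break = dict(state)  # examine the state before instruction 3 runs

# Resume from the breakpoint and run to the end of the program.
finished_at = run_to_breakpoint(program, state, breakpoints={3}, start=stopped_at)
print(stopped_at, state_at_break, finished_at, state)
```

In this sketch the state is correct at the breakpoint and incorrect after resuming, which localizes the problem to the instruction at index 3. Moving the breakpoint earlier or later narrows the search in the same way.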
Single-stepping a program or setting a breakpoint in an environment where there are tens of thousands of parallel processors can be extremely difficult. MPP environments include a massive number of parallel processors, each executing instructions simultaneously, and there is no effective way to synchronously halt a massive number of processors executing simultaneously. Halting all of the processors requires a global signal to be sent to all of the processors. The time the global signal takes to propagate through the system to reach each of the processors differs depending on the distance the signal has to travel. As a result, some of the processors continue executing past the intended stopping point before the global signal reaches them, so the processors do not all stop at the same point, which makes debugging impossible. Consequently, clock skew and speed-of-light considerations prevent gated clocks and global control systems from being used to halt the processors synchronously.
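The effect of propagation delay on a global halt signal can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that every processor completes one instruction per clock cycle and keeps running until the halt signal arrives. The cycle time and the per-processor delays are hypothetical values, not measurements of any real system.

```python
def overshoot(delays_ps, cycle_ps):
    """Instructions each processor completes after the halt signal is issued
    but before the signal reaches that processor (integer picoseconds)."""
    return [delay // cycle_ps for delay in delays_ps]


# Hypothetical values: a 1 ns clock cycle and five processors at increasing
# distances from the source of the global halt signal.
cycle_ps = 1000
delays_ps = [200, 900, 1500, 4200, 12000]

intended_stop = 100  # instruction count at which the halt was intended
extra = overshoot(delays_ps, cycle_ps)
actual_stop = [intended_stop + e for e in extra]

# Spread between the earliest and latest stopping points, in instructions.
skew = max(actual_stop) - min(actual_stop)
print(extra, actual_stop, skew)
```

Under these assumptions the nearest processors stop at the intended instruction while the farthest processor runs twelve instructions past it, so the halted system does not represent a single consistent point in the program.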