1. Field of the Invention
The present invention generally relates to a parallel computer system. More specifically, the present invention relates to a debug method for eliminating bugs peculiar to a parallel processing operation from a user program for each processor element, and to a system for the same.
2. Description of the Related Art
To improve calculation speed, various parallel computer systems that operate plural processor elements simultaneously have been developed. Such conventional systems fall into two types. In one type of parallel computer system, all processor elements are connected to a shared memory via which data is transferred. In the other type of parallel computer system, each processor element has a local memory unit and directly transfers data to other processor elements via a network. In the former, synchronization control is performed for controlling the order of accesses to the shared memory upon transmission and reception of the data, and in the latter, data transmission and reception control is performed for controlling the order in which data is transmitted by one processor element and received by another processor element.
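The two types of system described above can be illustrated by the following minimal sketch, in which threads stand in for processor elements. The function names, the use of a lock as the synchronization control, and the `None` end-of-data marker are assumptions made for illustration only and do not come from any particular parallel computer system.

```python
import queue
import threading


def shared_memory_sum(num_elements, increments):
    """Shared-memory type: every element updates one shared cell."""
    shared = {"cell": 0}
    lock = threading.Lock()

    def element():
        for _ in range(increments):
            with lock:  # synchronization control: one accessor at a time
                shared["cell"] += 1

    threads = [threading.Thread(target=element) for _ in range(num_elements)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["cell"]


def message_passing_sum(values):
    """Distributed-memory type: data moves only through a 'network' queue."""
    network = queue.Queue()
    result = []

    def sender():
        for v in values:
            network.put(v)      # transmission
        network.put(None)       # hypothetical end-of-data marker

    def receiver():
        total = 0               # held in the receiver's local memory only
        while True:
            item = network.get()  # reception: blocks until data arrives
            if item is None:
                break
            total += item
        result.append(total)

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

In the first function the lock orders the accesses to the shared cell; in the second, the blocking `get` call orders reception after transmission.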
Conventional computer languages such as FORTRAN, or special-purpose computer languages, are provided as user interfaces in the parallel computer systems. Bugs peculiar to the parallel processing operation may be introduced into a program when the program is coded in such a computer language. These bugs are usually faults of the synchronization control, e.g., the lack of a control statement where a plurality of processor elements simultaneously define or use the same address of a shared memory; faults of the data transmission and reception control; or errors of an algorithm introduced into a program when the program is rewritten for a distributed memory system. In particular, when a parallel processing operation is performed without the synchronization control, the execution order of the respective processing sub-operations is not ensured, so that the execution order differs every time the parallel processing operation is carried out, and hence reproducibility cannot be ensured.
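The loss of reproducibility described above can be illustrated by the following minimal sketch, which enumerates every execution order of two processor elements that each read and then write the same shared address without synchronization control. The element names `PE0` and `PE1` and the read-then-increment operation are hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Each element reads the shared address, then writes back the value plus one.
STEPS = [("PE0", "read"), ("PE0", "write"), ("PE1", "read"), ("PE1", "write")]


def run(schedule):
    """Execute the steps in the given order and return the final shared value."""
    shared = 0
    local = {}
    for pe, op in schedule:
        if op == "read":
            local[pe] = shared        # copy the shared value into the element
        else:
            shared = local[pe] + 1    # write back the (possibly stale) value
    return shared


def valid(schedule):
    """Within one element, the read must come before the write."""
    return all(
        schedule.index((pe, "read")) < schedule.index((pe, "write"))
        for pe in ("PE0", "PE1")
    )


def possible_results():
    """Collect the final values over all valid execution orders."""
    return {run(s) for s in permutations(STEPS) if valid(s)}
```

Calling `possible_results()` yields both 1 and 2: the final value depends on which execution order happens to occur, so two runs of the same program need not agree.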
Although no complete debugging method exists for such parallel computer systems, several debugging methods have been proposed in order to eliminate these bugs from programs. One product, named "Pdbx Parallel Debugger", is commercially available from SEQUENT COMPUTER SYSTEMS INC. The debugging method of this debugger is described in the leaflet "Pdbx Parallel Debugger for Balance Computer Systems". This product provides a means for letting an operator recognize the state of the parallel processing operation by outputting trace data on the "post/wait" synchronization control while the parallel processing operation is performed. In addition, the product has a function for interrupting the parallel processing operation midway. In order to debug programs that include faults of synchronization control in a stable and reproducible state, the bugs are desirably detected from trace data produced by sequential execution of the programs rather than parallel execution. A method for sequentially performing the debug processing is disclosed in the reference JP-1-106234. However, in the reference, the debug processing is first performed once in parallel to output trace data, and the order of the debug processing to be subsequently performed sequentially is determined by use of the trace data. Thus, in this conventional method, the debug processing must be performed in parallel before it can be performed sequentially.
In general, execution statements for the data transmission and reception or the synchronization control are described in subprograms of a user program for a parallel computer system. Consequently, when these subprograms are sequentially executed, the parallel computer system is brought into a data waiting state in which a processor element in the system is left waiting for data that has not been transmitted, so that the processor element cannot proceed with its operation. As a result, there is a problem in that the execution of the user program cannot proceed even though the user program has no bug.
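The data waiting state described above can be illustrated by the following minimal sketch. It runs each processor element's subprogram to completion, one after another, and reports the first reception for which no data has yet been transmitted. The program representation, the element names, and the function `run_sequentially` are hypothetical and serve only as an illustration.

```python
from collections import deque


def run_sequentially(programs):
    """Run each element's subprogram to completion in turn.

    Returns None if every subprogram completes, or a tuple
    (waiting_element, expected_sender) at the first data waiting state.
    """
    mailboxes = {pe: deque() for pe in programs}
    for pe, ops in programs.items():
        for op, peer in ops:
            if op == "send":
                mailboxes[peer].append(pe)      # deliver data tagged by sender
            elif op == "recv":
                if peer not in mailboxes[pe]:
                    return (pe, peer)           # data not yet transmitted
                mailboxes[pe].remove(peer)
    return None


# A program that is correct when executed in parallel:
# PE1 sends to PE0, PE0 receives and replies, PE1 receives the reply.
PROGRAM = {
    "PE0": [("recv", "PE1"), ("send", "PE1")],
    "PE1": [("send", "PE0"), ("recv", "PE0")],
}
```

With `PROGRAM` as given, `run_sequentially` stops with `PE0` waiting for `PE1`. If the subprogram of `PE1` is placed first instead, it stops with `PE1` waiting for `PE0`. Neither sequential order completes, although the program itself contains no bug.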