1. Field of the Invention
The present invention relates to a LAN analyzer and a method of analyzing a communication procedure, applied to a computer system in which a plurality of computers connected to a transmission line of a LAN (Local Area Network) execute information processing respectively allocated to the computers, while using a message passing through the transmission line, and more particularly to a LAN analyzer and a method of analyzing a communication procedure, in which, when an operational error occurs, the information processing for the same information is reproduced and the operation program is debugged.
2. Description of the Related Art
A distributed computer system has been developed, in which a plurality of computers execute a single piece of complicated information processing in a shared manner.
FIG. 1 is a block diagram showing a distributed computer system in which three computers are connected to the transmission line of a LAN. FIG. 2 shows an example of programs respectively set in the three computers shown in FIG. 1.
Referring to FIGS. 1 and 2, first to third computers 12a to 12c are connected to a transmission line 11. Programs 3a to 3c, for executing jobs allocated to the first to third computers 12a to 12c, are stored in memories (not shown) of the first to third computers 12a to 12c, respectively. Since a single piece of information processing is divided into individual jobs and allocated to the first to third computers 12a to 12c, the jobs allocated to the computers 12a to 12c are associated with one another. Therefore, a single piece of information processing is executed by the entire computer system, while various messages are being exchanged among the computers 12a to 12c.
To execute the information processing, each of the programs 3a to 3c, stored in the first to third computers 12a to 12c, includes a send command to transmit a message from itself to another computer (e.g., SEND to N-th Computer, where N is the number of a destination computer).
An operation of the above-described computer system will be described with reference to FIG. 3, which shows a sequence of messages output to the transmission line.
When the above computer system is activated, the first to third computers 12a to 12c transmit, to the transmission line 11, messages including information on a destination computer and a sender computer, in accordance with the progress of the programs 3a to 3c. Each of the first to third computers 12a to 12c analyzes messages on the transmission line 11 and receives a message which is addressed to itself. Accordingly, messages are output to the transmission line in a sequence of (1)→(5) as shown in FIG. 3.
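The addressing scheme described above can be illustrated with a minimal sketch. The `Message` class, the `receive` function, and the sample traffic are hypothetical names and data chosen for illustration; they do not reproduce the actual messages of FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int       # number of the sender computer
    destination: int  # number of the destination computer
    payload: str      # illustrative content only


def receive(computer_number, line_traffic):
    """Return the messages on the shared line that are addressed to this computer."""
    return [m for m in line_traffic if m.destination == computer_number]


# Hypothetical traffic on the transmission line, in the order it was output.
traffic = [
    Message(sender=1, destination=2, payload="a"),
    Message(sender=2, destination=3, payload="b"),
    Message(sender=3, destination=1, payload="c"),
    Message(sender=1, destination=3, payload="d"),
]

inbox_3 = receive(3, traffic)
```

Every computer sees all of the traffic on the line, but in this sketch the third computer keeps only the two messages whose destination field is 3.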
In the information processing as described above, whether the programs (3a to 3c shown in FIG. 2) incorporated in the computers are correctly executed is confirmed prior to actual execution of the programs. If an error is detected, the programs should be debugged.
In a debugging process, in general, a program is executed again to reproduce an error, so that the error can be understood and analyzed. However, in a computer system in which the jobs in the process are allocated to a plurality of computers, even when the process is repeated in an attempt to cause the same error, it is difficult to repeat the same process, since the execution path may change due to variation in the computer loads and the network load.
For example, assuming that message passing is executed among the first to third computers 12a to 12c in the sequence of (1)→(5) of FIG. 3, if the same information processing is replayed, the steps (2) and (4) may be exchanged, i.e., the sequence may be changed to (1)→(4)→(5)→(2)→(3) due to variation in the computer loads and the network load. In this case, a final result is the data obtained by the step (3), which is different from the result obtained by the processing in the sequence of (1)→(5).
When one computer executes a single piece of information processing, the same result is always obtained from the same input (deterministic operation). In contrast, when a plurality of computers execute one piece of information processing, while using message passing among the computers (parallel-programs), the same result is not always obtained from the same input (nondeterministic operation), since the order of the messages received by the computers may be subject to change, as described above.
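The dependence of the result on message order can be sketched as follows. This is a simplified model under a stated assumption: each message simply overwrites the current result, so the last message to arrive determines the final result. The function `run` and the values `v1` to `v5` are hypothetical.

```python
def run(arrival_order):
    """Apply messages in arrival order; each message overwrites the result."""
    result = None
    for _step, value in arrival_order:
        result = value
    return result


# Steps (1) to (5), each carrying an illustrative value.
original = [(1, "v1"), (2, "v2"), (3, "v3"), (4, "v4"), (5, "v5")]

# The changed sequence (1)->(4)->(5)->(2)->(3) from the example above.
reordered = [original[i] for i in (0, 3, 4, 1, 2)]

result_original = run(original)
result_reordered = run(reordered)
```

With the same five messages as input, the original sequence ends with the data of step (5), while the changed sequence ends with the data of step (3).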
Therefore, when a plurality of computers execute one piece of information processing, even if the processing is to be replayed to debug the program, the same processing cannot necessarily be executed in the replay.
To overcome this drawback, various methods are employed to replay a program so that it will repeat an error in a debugging process. For example, a program for storing a communication history necessary for replaying the program is linked to user programs, or a module for storing a communication history is incorporated in a computer. Such a program has a function of storing a communication history when a program is first executed, so that the program can be replayed later, and a function of managing the order in which communications are received when the program is replayed.
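The two functions mentioned above, storing a history on the first execution and managing the reception order on replay, can be sketched as follows. This is one possible arrangement, not a description of any particular conventional product; the class `HistoryReceiver` and the message identifiers are hypothetical.

```python
class HistoryReceiver:
    """Records the reception order on the first run and enforces it on replay."""

    def __init__(self, history=None):
        self.recording = history is None
        self.history = [] if history is None else list(history)
        self.pending = {}    # messages that arrived too early during replay
        self.delivered = []  # payloads handed to the user program, in order

    def on_arrival(self, msg_id, payload):
        if self.recording:
            # First execution: store the order and deliver immediately.
            self.history.append(msg_id)
            self.delivered.append(payload)
            return
        # Replay: hold the message until the recorded order allows delivery.
        self.pending[msg_id] = payload
        while len(self.delivered) < len(self.history):
            expected = self.history[len(self.delivered)]
            if expected not in self.pending:
                break
            self.delivered.append(self.pending.pop(expected))


# First execution: messages arrive in the order 1, 2, 3.
first = HistoryReceiver()
for msg_id, payload in [(1, "a"), (2, "b"), (3, "c")]:
    first.on_arrival(msg_id, payload)

# Replay: the same messages arrive in a different order.
replay = HistoryReceiver(history=first.history)
for msg_id, payload in [(3, "c"), (1, "a"), (2, "b")]:
    replay.on_arrival(msg_id, payload)
```

In the replay, message 3 arrives first but is held back until messages 1 and 2 have been delivered, so the user program sees the same order as in the first execution.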
However, a system incorporating a program which includes a step of storing a communication history also has the following drawbacks.
In the method of linking a program for storing a communication history to user programs, a function of recording a communication history and a function of managing a replay process are added before and/or after a command to transmit or receive a message in the program. Thus, according to this method, since the user's original program is partially changed and a module having a function of storing a communication history is linked to the program, the computer load required for executing the program is increased.
Therefore, the load caused by the additional program for storing the communication history is added to the operation load of the computer and affects normal processing. For this reason, the sequence of the recorded processing, in which a communication history is recorded to replay the program, does not coincide with the sequence of the normal execution, in which the communication history is not recorded. In other words, the communication history recorded by this method is that of a computer system to which an additional load has been applied. Therefore, even if the program is replayed on the basis of the communication history, the message exchange in the normal execution cannot be repeated in exactly the same order.
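The linking method discussed above can be illustrated by a short sketch in which the send command of a user program is replaced by a version that records a history entry first. The functions `send` and `make_recording_send` are hypothetical and stand in for whatever send command the user program actually uses.

```python
def send(line, message):
    """Original send command: output the message to the transmission line."""
    line.append(message)


def make_recording_send(history):
    """Return a send command that records a history entry before sending."""
    def recording_send(line, message):
        history.append(("send", message))  # extra work added by the linked module
        send(line, message)
    return recording_send


line = []
history = []
recording_send = make_recording_send(history)

recording_send(line, "m1")
recording_send(line, "m2")
```

Each call now performs an additional operation on the computer that executes the user program. That extra work is the additional load which, as noted above, can shift the timing of the recorded run away from that of the normal run.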
As has been described above, according to the conventional LAN analyzer and the conventional method of analyzing a communication procedure, the same error cannot always be repeated in the replay and a program, therefore, cannot be debugged satisfactorily.