1. Field of the Invention
The present invention relates to a multiprocessor system and method in which nodes, each of which includes a main memory, a data transfer unit and a plurality of processors, are connected through interconnecting networks, and more particularly to a method and apparatus for assigning job identification codes for determining the source of errors.
2. Description of the Related Art
Conventionally, there are synchronous and asynchronous operations between nodes in a multiprocessor system. In a synchronous operation, the processor which issued an instruction is prevented from switching to another task (e.g., a different user program) until the issued instruction is completed (e.g., completely processed).
In an asynchronous operation, the processor which issued the instruction is free to execute other instructions or to switch to a different task, without waiting for completion of the issued instruction. The asynchronous operation is faster than the synchronous operation because a data transfer between nodes generally takes several hundred to several thousand times as long as a processor's main memory access, and in a synchronous operation the processor cannot execute other programs during that time.
However, the synchronous operation is advantageous in that, when an error occurs, error notification is made to the processor which issued the data transfer instruction. Therefore, the user program which caused the error can be easily identified and terminated. Conversely, with the asynchronous operation, since the processor can (and does) switch to another task, if there is an error with a program, the source of the error is not detected easily.
The asynchronous operation is divided into two main types. The first type is a so-called "indirect type", which calls a system program that in turn calls a user program under the control of the system program. The second type is a so-called "direct type", in which the processor directly issues an asynchronous instruction from a user program without the intervention of the system program. The direct type is more desirable since the overhead associated with the indirect type is relatively large. Therefore, when higher-speed processing is desired, the direct type is preferable.
Conventionally, error procedures for the direct-type asynchronous operations follow the process below.
First, the user program generates data transfer parameters such as a data transfer start address of the main memory within a given node, a data receive start address of the main memory within a destination node, a transfer data number and an end status area address of the main memory within the given node. The term "end status area address" as used herein refers to the address of an area of the main memory in which normal or abnormal completion of the data transfer is indicated.
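The parameter set described above can be pictured as a small record. The following is only an illustrative sketch: the class name, field names, example addresses and the assumption that the transfer data number counts words are all hypothetical and do not come from the conventional system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferParams:
    """Hypothetical record of the parameters a user program prepares."""

    send_start_addr: int   # data transfer start address, local main memory
    recv_start_addr: int   # data receive start address, destination main memory
    transfer_count: int    # "transfer data number" (unit assumed to be words)
    end_status_addr: int   # local address where completion status is recorded


# Example values are arbitrary and chosen only for illustration.
params = TransferParams(
    send_start_addr=0x1000,
    recv_start_addr=0x8000,
    transfer_count=256,
    end_status_addr=0x0FF0,
)
```

Making the record immutable reflects the assumption that the parameters are fixed once the start trigger has been sent.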
Next, the processor sends a start trigger to the data transfer unit. The start trigger is an asynchronous data transfer instruction issued from the user program. The processor sends the aforementioned data transfer parameters directly to the data transfer unit, or through the main memory to the data transfer unit.
The data transfer unit, upon receiving the data transfer instruction from the processor, performs data transfer according to the data transfer parameters received together with the instruction. Upon completing the data transfer, the data transfer unit writes, to the end status area of the main memory (e.g., at the end status area address), whether the data transfer was completed normally.
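As a rough model of this step, the sketch below treats each main memory as a Python list of words and has a function stand in for the data transfer unit. The status encodings, the function name and the bounds check used to produce an abnormal result are assumptions made for illustration only.

```python
# Assumed status encodings; the source does not specify actual values.
STATUS_NORMAL = 1
STATUS_ABNORMAL = 2


def run_transfer(local_mem, remote_mem, send_start, recv_start, count,
                 end_status_addr):
    """Copy `count` words between memories, then record the outcome.

    The outcome is written to local_mem[end_status_addr], mirroring the
    write to the end status area described in the text.
    """
    in_range = (
        count >= 0
        and 0 <= send_start and send_start + count <= len(local_mem)
        and 0 <= recv_start and recv_start + count <= len(remote_mem)
    )
    if in_range:
        remote_mem[recv_start:recv_start + count] = (
            local_mem[send_start:send_start + count]
        )
        status = STATUS_NORMAL
    else:
        status = STATUS_ABNORMAL
    local_mem[end_status_addr] = status
    return status


local = list(range(16))
remote = [0] * 16
run_transfer(local, remote, send_start=4, recv_start=8, count=4,
             end_status_addr=0)
run_transfer(local, remote, send_start=14, recv_start=0, count=4,
             end_status_addr=1)
```

In this toy run, the first request succeeds and the second exceeds the local memory, so the two end status words receive the normal and abnormal codes respectively.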
The processor executes subsequent instructions until it arrives at a data transfer end confirmation routine, which is part of the user program. Multiprogramming (e.g., time division processing) is performed during the confirmation routine, according to the system operating format, as is well known by those ordinarily skilled in the art. Thus, for brevity, such will not be described in detail herein.
When the confirmation routine is reached, the processor accesses the end status area of the main memory, and reads out the data transfer status therefrom. When the status indicates a normal termination, the processor performs the subsequent processing.
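A confirmation routine of this kind might look like the following minimal sketch. The status encoding, the exception type and the dictionary used to stand in for main memory are hypothetical.

```python
STATUS_NORMAL = 1  # assumed encoding of a normal termination


class TransferError(Exception):
    """Raised when the end status area does not show a normal termination."""


def confirm_transfer_end(main_memory, end_status_addr):
    """Read the end status area and allow processing to continue if normal."""
    status = main_memory[end_status_addr]
    if status != STATUS_NORMAL:
        raise TransferError(f"abnormal transfer status: {status}")
    return status


# Main memory is modeled as a mapping from address to word.
memory = {0x0FF0: STATUS_NORMAL}
result = confirm_transfer_end(memory, 0x0FF0)
```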
However, when there is an error, such as when the end status area does not show a normal termination, or when the end status area address is in a system area and therefore cannot be rewritten, the processor is unable to continue. Therefore, specifying with certainty which user program caused the error is critical. However, since the processor may be executing a user program other than the one which caused the error, identifying the responsible user program may be impossible.
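The attribution problem can be illustrated with a toy model in which the transfer request carries no record of its issuer. All class, method and program names below are hypothetical, and the model is a deliberate simplification rather than a description of any actual system.

```python
from collections import deque


class Processor:
    """Toy processor that time-shares between user programs."""

    def __init__(self):
        self.current_program = None
        # Transfers in flight; note that the issuing program is NOT recorded.
        self.pending = deque()

    def run(self, program):
        """Switch the processor to the given user program."""
        self.current_program = program

    def issue_async_transfer(self, end_status_addr_valid):
        """Queue a transfer; only its parameters travel with the request."""
        self.pending.append(end_status_addr_valid)

    def complete_next_transfer(self):
        """Return (ok, program running when the result becomes known)."""
        ok = self.pending.popleft()
        return ok, self.current_program


cpu = Processor()
cpu.run("program_A")
cpu.issue_async_transfer(end_status_addr_valid=False)  # faulty request from A
cpu.run("program_B")                                   # task switch occurs
ok, suspect = cpu.complete_next_transfer()
```

When the failure surfaces, the only program the processor can point to is the one currently running, which in this run is not the program that issued the faulty request.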
Therefore, with conventional systems performing direct-type asynchronous operations, the defective user program cannot be identified or terminated, which may cause the same error to occur repeatedly, thereby wasting processor time and decreasing overall system efficiency.