In fields of science and technology such as atomic power, meteorology and aeronautics, a parallel processor system is required for arithmetically processing a vast quantity of data far exceeding the data processing capacity of a general-purpose mainframe computer. Such a parallel processor system is generally called a supercomputer, in which an ultrahigh speed arithmetic operation is realized by parallel processing of a plurality of processor elements interconnected through an inter-processor network (such as a crossbar network unit). The parallel processor system is required to exhibit at least a predetermined level of performance even when the utilization rate of a CPU (Central Processing Unit) is high, i.e. under a heavy load. Therefore, a load testing apparatus for checking the performance under a heavy load is indispensable for the design, development and performance evaluation of the parallel processor system. The parallel processor system is also required to have means and a method for rapidly identifying a defective point in case of a fault.
FIG. 32A is a block diagram showing a configuration of the conventional parallel processor system described above. A crossbar network unit 1 and five processor elements PE0 to PE4 making up the parallel processor system are shown in FIG. 32A. The processor elements PE0 to PE4 are arithmetic elements for executing parallel computation in accordance with a parallel algorithm, and each includes a transmission unit and a receiving unit (not shown) for transmitting and receiving packets (data), respectively. The crossbar network unit 1 interconnects the processor elements PE0 to PE4 and includes a group of N×N (5×5 in the case shown) crossbar switches (not shown). The incoming line side of the crossbar network unit 1 is connected to the transmission units (not shown) of the respective processor elements PE0 to PE4, and the outgoing line side thereof is connected to the receiving units (not shown) of the respective processor elements PE0 to PE4.
For the parallel processor system described above, a load test is conducted for checking the performance under load. In the load test, packets are transmitted from a predetermined source processor element to a destination processor element, whereby a pseudo-load is generated, and the performance is evaluated based on a comparison between the packet transmission time (measurement) and a theoretically determined expected value.
Specifically, first, a plurality of sets (pairs) of the processor elements PE0 to PE4 are determined by being extracted at random as shown in FIG. 32A. In the example shown in FIG. 32A and FIG. 32B, the following sets 1A to 5A are determined.
Set     Source                   Destination
(1A)    Processor element PE0    Processor element PE1
(2A)    Processor element PE1    Processor element PE0
(3A)    Processor element PE2    Processor element PE3
(4A)    Processor element PE3    Processor element PE2
(5A)    Processor element PE4    Processor element PE4
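The random determination of source and destination sets can be sketched as follows. This is only an illustrative sketch: the text does not state how the random extraction is performed, so the policy assumed here (each source independently draws one destination) and the names `random_pairs` and `elements` are hypothetical.

```python
import random
from typing import List, Tuple


def random_pairs(elements: List[str], rng: random.Random) -> List[Tuple[str, str]]:
    """Pair every source with a destination drawn at random.

    Assumed policy, not taken from the text: each source draws its
    destination independently of the other sources.
    """
    return [(src, rng.choice(elements)) for src in elements]


elements = [f"PE{i}" for i in range(5)]  # PE0 to PE4
# The seed is fixed only so that repeated runs give the same pairing.
pairs = random_pairs(elements, random.Random(0))
for src, dst in pairs:
    print(f"{src} -> {dst}")
```

Because the destinations are drawn independently in this sketch, nothing prevents two sources from drawing the same destination.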
The next step in the load test is to transmit packets at the same time from the source processor elements PE0 to PE4 in sets 1A to 5A above to the corresponding destination processor elements. As a result, the packets are exchanged by the crossbar network unit 1 and received by the destination processor elements. In the process, the packet transmission time between each set of the processor elements is measured. In the case under consideration, a total of five measurements (transmission times) corresponding to sets 1A to 5A are obtained. These transmission times are compared with a theoretically determined expected value, and the performance of the parallel processor system is evaluated based on whether the difference between each transmission time and the expected value is within a tolerable range.
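The evaluation step described above can be sketched as follows. The measured times, the expected value, the tolerance and the name `evaluate_load_test` are all hypothetical values chosen for illustration; the text specifies only that each measurement is compared with one expected value and checked against a tolerable range.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source, destination)


def evaluate_load_test(
    measured: Dict[Pair, float], expected: float, tolerance: float
) -> Dict[Pair, bool]:
    """Return, for each pair, whether |measured - expected| is within tolerance."""
    return {pair: abs(t - expected) <= tolerance for pair, t in measured.items()}


# Hypothetical transmission times (arbitrary units) for sets 1A to 5A.
measured = {
    ("PE0", "PE1"): 10.2,
    ("PE1", "PE0"): 10.4,
    ("PE2", "PE3"): 11.0,
    ("PE3", "PE2"): 10.9,
    ("PE4", "PE4"): 13.5,
}
results = evaluate_load_test(measured, expected=10.5, tolerance=1.0)
# The system passes only if every pair is within the tolerable range.
system_ok = all(results.values())
```

With these assumed numbers, four pairs fall within the tolerable range and the pair (PE4, PE4) does not, so the overall evaluation fails.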
The expected value is a theoretical value of the time that the packets are expected to take to be transmitted between the processor elements in actual arithmetic operation. This expected value is a constant equal to the theoretical transmission time plus a margin. The theoretical transmission time is the transmission time between the processor elements at which the parallel processor system can exhibit its maximum performance, and is calculated by a technique such as simulation. The margin, on the other hand, is a value for absorbing the difference in transmission time caused by the difference in physical distance between different sets of the processor elements described above.
The load test of the parallel processor system is desirably conducted under as heavy a load as possible in order to assure proper evaluation of the performance under severe operating conditions. In the prior art, however, the source and destination processor elements PE0 to PE4 are combined at random as shown in FIG. 32A, and therefore it is sometimes impossible to conduct the load test under a heavy load, as shown in FIG. 32B, which leads to the disadvantage that the reliability of the test result is low.
Specifically, in the case shown in FIG. 32A, the processor elements of the source and the processor elements of the destination are combined in one-to-one relation, and packets are sent at the same time from all the source processor elements. Thus, the load test under heavy load can be conducted.
In the sets shown in FIG. 32B, on the other hand, a receiving interference is caused in the processor element PE3, and therefore the load is reduced. Specifically, FIG. 32B illustrates a combination for packet transmission in which two processor elements PE2 and PE4 of the source send packets to one processor element PE3 of the destination. In this combination, the two packets, which are sent from the processor elements PE2 and PE4 of the source, arrive at the single processor element PE3 through the crossbar network unit 1. In the process, the processor element PE3 of the destination which can receive only one packet at a time develops a receiving interference in which the two packets compete with each other.
Actually, however, the chance of the two packets arriving at the processor element PE3 at the same time point is very slim due to the difference in transmission time. As a result, while the first arriving one of the two packets is received by the processor element PE3, the other packet stands by. The combination causing this receiving interference, as compared with the sets shown in FIG. 32A, reduces the load and therefore a reliable test result cannot be obtained.
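Whether a given combination causes a receiving interference can be checked by looking for destinations that appear more than once. The following is a minimal sketch; the function name `receiving_interference` is hypothetical, and the FIG. 32B list contains only the two pairs that the text describes, since the remaining pairs of that figure are not stated.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source, destination)


def receiving_interference(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Map each destination addressed by more than one source to those sources."""
    counts = Counter(dst for _, dst in pairs)
    return {
        dst: [src for src, d in pairs if d == dst]
        for dst, n in counts.items()
        if n > 1
    }


# One-to-one combination of FIG. 32A (sets 1A to 5A).
fig_32a = [("PE0", "PE1"), ("PE1", "PE0"), ("PE2", "PE3"),
           ("PE3", "PE2"), ("PE4", "PE4")]
# Only the pairs of FIG. 32B that the text describes.
fig_32b = [("PE2", "PE3"), ("PE4", "PE3")]

conflicts_a = receiving_interference(fig_32a)
conflicts_b = receiving_interference(fig_32b)
```

For the one-to-one combination no destination is shared, whereas for the described part of FIG. 32B the destination PE3 is shared by the sources PE2 and PE4.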
Also, in the conventional load test, a single expected value (theoretical value) including a margin is applied uniformly to all the transmission times (measurements) between the plurality of sets of the processor elements, as described above. Actually, however, due to the difference in physical distance described above, the transmission time (measurement) varies from one processor element set to another. Since one predetermined expected value is used for these varying transmission times, the conventional load test may produce a test result that differs from reality, and therefore has the disadvantage of low reliability.
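The effect of applying one expected value to every set can be illustrated with a small numerical sketch. All numbers below are hypothetical, as are the names `THEORETICAL_TIME`, `MARGIN` and `TOLERANCE`; the sketch assumes a fault-free system in which one set simply lies physically farther apart than the other.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source, destination)

THEORETICAL_TIME = 10.0  # hypothetical, e.g. obtained by simulation
MARGIN = 1.0             # hypothetical allowance for distance differences
TOLERANCE = 0.5          # hypothetical tolerable range

# Expected value = theoretical transmission time + margin, one value for all sets.
UNIFORM_EXPECTED = THEORETICAL_TIME + MARGIN

# Hypothetical measurements on a fault-free system: the second set is
# assumed to be physically farther apart, so its packet takes longer.
measured: Dict[Pair, float] = {
    ("PE0", "PE1"): 10.6,
    ("PE0", "PE4"): 11.8,
}

deviation = {pair: t - UNIFORM_EXPECTED for pair, t in measured.items()}
verdict = {pair: abs(d) <= TOLERANCE for pair, d in deviation.items()}
```

Under these assumptions the nearer set passes while the farther set is judged out of range even though nothing is faulty, which is the kind of divergence from reality described above.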
On the other hand, the conventional parallel processor system requires identification of a defective point based on the phenomenon presented at the time of a fault in which a packet is not sent from a processor element of the source or a packet sent from a processor element of the source fails to be received by a corresponding processor element of the destination. In the conventional parallel processor system, the configuration is complicated with the increase in the number of the processor elements involved, and the number of points to be checked increases to such an extent that a vast amount of labor and time are required before successfully identifying a defective point. Especially in the case of a fault of the crossbar network unit 1, a vast number of crossbar switches are required to be checked one by one and the workload required makes the identification of a defective point very difficult.
Further, in the case where a fault occurs in a source processor element, the address of a packet may change, and the packet may therefore be sent erroneously to an entirely different destination. In such a case, the destination processor element that should have received the packet cannot receive it, and therefore detects a fault as a receive time-out. The destination processor element that has received the erroneously sent packet also detects a fault. In contrast, the source processor element that has actually developed the fault is regarded as being in normal operation, since it has sent out the packet anyway. In the case of a secondary fault such as that described above, it is even more difficult to identify the defective point.
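The symptom pattern of such a misaddressed packet can be sketched as follows. This is a simplified, hypothetical model: the function `fault_reports`, the report wording, and the choice of PE0, PE1 and PE3 are assumptions made for illustration only.

```python
from typing import Dict, List


def fault_reports(intended: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Collect the faults observed by destination elements.

    intended[src] is the destination the source meant to address;
    actual[src] is the destination the packet really reached.
    """
    reports: Dict[str, List[str]] = {}
    for src, dst in intended.items():
        got = actual[src]
        if got != dst:
            # The intended destination never receives the packet.
            reports.setdefault(dst, []).append(f"receive time-out (no packet from {src})")
            # The element that received it did not expect the packet.
            reports.setdefault(got, []).append(f"unexpected packet from {src}")
    return reports


# Hypothetical case: PE0 is defective and its packet for PE1 reaches PE3.
reports = fault_reports(intended={"PE0": "PE1"}, actual={"PE0": "PE3"})
```

In this model PE1 and PE3 both report a fault, while PE0, the element that is actually defective, reports nothing because it did send a packet.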