FIG. 1 is a block diagram illustrating an example of a parallel computer. A parallel computer 1 includes a plurality of System Boards (SBs) 12, a crossbar switch 14, and a plurality of Input Output Boards (IOBs) 15 that are connected as illustrated in FIG. 1. Each SB 12 includes a plurality of Central Processing Units (CPUs) 11, and a plurality of memories 13. Each IOB 15 includes a plurality of input parts and a plurality of output parts (or input output interfaces). The crossbar switch 14 includes an input port AI and an output port AO connected to one SB 12, an input port BI and an output port BO connected to the other SB 12, an input port CI and an output port CO connected to one IOB 15, and an input port DI and an output port DO connected to the other IOB 15.
The crossbar switch 14 includes the plurality of input ports and the plurality of output ports described above, and performs a routing (path control) in order to transfer packets as data from an arbitrary node, such as the SB 12 and the IOB 15, to another node. In order to avoid a deadlock, the crossbar switch 14 has a plurality of virtual channels for each port. In other words, the crossbar switch 14 physically includes a plurality of ports, and each port logically includes a plurality of channels (that is, virtual channels), but only one channel may be selected at one port at an arbitrary point in time.
An arbiter circuit in the crossbar switch 14, which arbitrates among the packets from the plurality of ports and the plurality of channels, ideally treats all ports and all channels equally. FIG. 2 is a diagram illustrating such an ideal arbiter circuit. In FIG. 2, an arbiter circuit 17 performs an arbitration process with respect to inputs of channels C0 and C1 from an input port AI, inputs of channels C0 and C1 from an input port BI, inputs of channels C0 and C1 from an input port CI, and inputs of channels C0 and C1 from an input port DI, and outputs a routing request (path control request) from one input port to an arbitrary output port based on a result of the arbitration process.
However, in actual circuit design, it is physically difficult to create the arbiter circuit 17 having the structure illustrated in FIG. 2, and it is also difficult to take into consideration the signal delay in the design. For this reason, an arbiter circuit having a 2-stage structure illustrated in FIG. 3 has been proposed.
FIG. 3 is a diagram illustrating an example of an arbiter circuit. In FIG. 3, queue arbiter circuits 18-1 select packets from queues AQ, BQ, CQ and DQ for each of the input ports AI, BI, CI and DI, and inter-port arbiter circuits 18-2 select one port from the plurality of input ports AI, BI, CI and DI. The queues AQ, BQ, CQ and DQ are retained in corresponding buffers (not illustrated) within the crossbar switch 14, and blocks identifying the queues AQ, BQ, CQ and DQ in FIG. 3 correspond to these buffers. Hence, an arbiter circuit 18 has the 2-stage structure formed by two kinds of arbiter circuits 18-1 and 18-2.
FIG. 4 is a diagram illustrating a structure of the queue arbiter circuit 18-1. As illustrated in FIG. 4, an arbitration algorithm equally arbitrates a set of queues for each of the channels C0 and C1. FIG. 4 illustrates a case in which the Least Recently Used (LRU) algorithm is used as the arbitration algorithm. Hence, a selector 181 selectively outputs one of the queues from the channels C0 and C1, depending on an operation result that is obtained by an LRU algorithm part 180 based on the LRU algorithm.
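The LRU selection performed for the channels C0 and C1 may be sketched in software as follows. This is only an illustrative model, not the circuit of FIG. 4: the class name `LruArbiter`, the method `grant`, and the list `order` are hypothetical names introduced for this sketch, and a hardware LRU algorithm part would be realized differently.

```python
class LruArbiter:
    """Grants the requester that was granted least recently (illustrative model)."""

    def __init__(self, names):
        # Hypothetical bookkeeping: the list runs from the least recently
        # granted name (front) to the most recently granted name (back).
        self.order = list(names)

    def grant(self, requesting):
        """Return the least recently granted name in `requesting`, or None."""
        for name in self.order:
            if name in requesting:
                # The winner becomes the most recently granted entry.
                self.order.remove(name)
                self.order.append(name)
                return name
        return None


arbiter = LruArbiter(["C0", "C1"])
# Both channels request continuously: grants alternate between them.
grants = [arbiter.grant({"C0", "C1"}) for _ in range(4)]
# Only C1 requests: it is granted even though it was granted most recently.
solo = arbiter.grant({"C1"})
```

In this model, two channels that both request continuously are granted alternately, while a channel that requests alone is granted immediately.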
The structure of the inter-port arbiter circuit 18-2 is not illustrated, because the same arbitration algorithm as that of the queue arbiter circuit 18-1, such as the LRU algorithm, may be used for the arbitration with respect to a set of queues for each of the input ports AI, BI, CI and DI, in place of the set of queues for each of the channels C0 and C1. In a case in which the path control request (routing request) from the queue arbiter circuit 18-1 is not accepted for a long time and a stall state continues, due to insufficient resources and the like, retry control is performed in which the request is cancelled once and another request is issued, in order to prevent deadlock.
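The 2-stage arrangement of FIG. 3, with the LRU algorithm used at both stages, may be modeled roughly as follows. The names `lru_pick`, `TwoStageArbiter`, `arbitrate` and `pending` are hypothetical and introduced only for this sketch. Two simplifying assumptions are made that are not taken from the figures: every queue always holds a packet, and each queue arbiter advances its LRU order whenever it issues a request, whether or not the inter-port arbiter accepts it, as a crude stand-in for cancelling a request and issuing another.

```python
def lru_pick(order, requesting):
    """Pick the least recently picked entry of `order` that is in `requesting`.

    `order` is mutated so that the winner moves to the back.
    Returns None when no entry of `order` is requesting.
    """
    for name in order:
        if name in requesting:
            order.remove(name)
            order.append(name)
            return name
    return None


class TwoStageArbiter:
    """Illustrative model: per-port queue arbiters feeding one inter-port arbiter."""

    def __init__(self, ports, channels):
        # Stage 1: one LRU order per input port, over that port's channels.
        self.channel_order = {port: list(channels) for port in ports}
        # Stage 2: a single LRU order over the input ports.
        self.port_order = list(ports)

    def arbitrate(self, pending):
        """`pending` maps port -> set of channels with a queued packet.

        Returns the winning (port, channel) pair, or None if nothing is pending.
        """
        # Stage 1: each port nominates one channel. Assumption of this sketch:
        # the order advances on every nomination, accepted or not.
        nominees = {}
        for port, order in self.channel_order.items():
            channel = lru_pick(order, pending.get(port, set()))
            if channel is not None:
                nominees[port] = channel
        # Stage 2: one of the nominating ports wins.
        winner = lru_pick(self.port_order, set(nominees))
        if winner is None:
            return None
        return winner, nominees[winner]


ports = ["AI", "BI", "CI", "DI"]
arbiter = TwoStageArbiter(ports, ["C0", "C1"])
always_full = {port: {"C0", "C1"} for port in ports}
winners = [arbiter.arbitrate(always_full) for _ in range(8)]
```

Under these assumptions, the eight rounds grant only the pairs (AI, C0), (BI, C1), (CI, C0) and (DI, C1), each twice. The remaining four port and channel combinations are never granted, even though each stage taken by itself rotates evenly among its own inputs.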
The arbiter circuit 18 described above may appear to perform the arbitration equally. However, when one focuses on a certain packet, a queue may not be output from the queue arbiter circuit 18-1 to the inter-port arbiter circuit 18-2 for a long time, which generates a so-called livelock. When a time-division algorithm is used as the arbitration algorithm in place of the LRU algorithm, the livelock may be prevented, but the arbitration time becomes long, which deteriorates the performance of the parallel computer 1.
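The trade-off of a time-division algorithm may be illustrated by the following sketch, which is one plausible reading and not a description of any particular circuit. The class `TimeDivisionArbiter` and its method `tick` are hypothetical names. The sketch assumes that each combination of input port and channel owns a fixed time slot in a repeating cycle, and that a slot is left unused when its owner has no request.

```python
class TimeDivisionArbiter:
    """Illustrative model: each requester owns one slot of a repeating cycle."""

    def __init__(self, requesters):
        self.requesters = list(requesters)
        self.slot = 0

    def tick(self, requesting):
        """Advance one slot; grant the slot's owner only if it is requesting."""
        owner = self.requesters[self.slot % len(self.requesters)]
        self.slot += 1
        return owner if owner in requesting else None


requesters = [(port, channel)
              for port in ("AI", "BI", "CI", "DI")
              for channel in ("C0", "C1")]

# Every requester is active: each one is served exactly once per 8 slots.
busy = TimeDivisionArbiter(requesters)
busy_grants = [busy.tick(set(requesters)) for _ in range(8)]

# Only (DI, C1) is active: it is still served once per 8 slots,
# and the other 7 slots of each cycle go unused.
idle = TimeDivisionArbiter(requesters)
idle_grants = [idle.tick({("DI", "C1")}) for _ in range(16)]
unused_slots = idle_grants.count(None)
```

In this model no requester can be starved, since its slot recurs every cycle. However, a requester that is alone in requesting still waits for its own slot, so 14 of the 16 slots in the second run grant nothing.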
In the conventional arbitration method, the arbitration time becomes long when an attempt is made to prevent the livelock, and as a result, the performance of the parallel computer may deteriorate.
The applicants are aware of Japanese Laid-Open Patent Publication No. 2001-22711.