In a multi-processor system functioning as an information processing apparatus (e.g., a server system), in which a plurality of processors functioning as central processing units (CPUs) share a common memory space, it may be desirable to maintain cache consistency (i.e., consistency of the content of memory stored in cache memory). That is, it may be desirable for the content of memory stored in each area of the memory space to be the same at every moment when the area is accessed from any of the CPUs. Each of the CPUs caches and stores the content of memory when necessary, and thus, in order to guarantee the cache consistency, it may be desirable for data transfers to be mutually performed among all the CPUs. Further, prior to commencement of a data transfer, a request for the data transfer, which is issued on a command packet basis, is transmitted to all the CPUs by means of a broadcast transfer. Furthermore, in order to guarantee the order of arrival of the command packets broadcast in this manner, it may be desirable for each command packet to arrive simultaneously at all of the transfer destinations, i.e., all of the target nodes. Further, crossbar apparatuses, each functioning as a data transfer apparatus that relays data transfers between CPUs, are desired to achieve highly efficient data transfer.
FIG. 1 is a block diagram illustrating an example of a configuration of a typical multi-processor system. In this example, this multi-processor system is configured to include a plurality of system boards (SBs) 1-00 to 1-15 (SB 00 to SB 15) and a plurality of crossbar (XB) apparatuses 2-00, 2-10, 2-20 and 2-30 (XB 00, XB 10, XB 20 and XB 30), which relay data transfers between any two system boards out of the plurality of system boards 1-00 to 1-15. Each of the system boards 1-00 to 1-15 is configured to include a CPU, memory chips and a system controller (SC), but, such a configuration itself is well known to those skilled in the art, and thus, is omitted from illustration in FIG. 1.
In this example, the system boards 1-00 to 1-07 and the crossbar apparatuses 2-00 and 2-10 are installed inside the same enclosure 3-0. Further, the system boards 1-08 to 1-15 and the crossbar apparatuses 2-20 and 2-30 are installed inside the same enclosure 3-1. Each of the crossbar apparatuses 2-00 and 2-10 installed inside the enclosure 3-0 is connected to the crossbar apparatuses 2-20 and 2-30 installed inside the enclosure 3-1 via a connection unit 4, such as a cable assembly.
FIG. 2 is a block diagram illustrating an example of a configuration of an existing crossbar apparatus. In FIG. 2, for convenience of explanation, only the configuration of the crossbar apparatus 2-00 is illustrated, but the configuration of each of the crossbar apparatuses 2-10, 2-20 and 2-30 illustrated in FIG. 1 may be the same as or similar to the configuration of the crossbar apparatus 2-00. The crossbar apparatus 2-00 is configured to include a buffer unit 21, output packet selection units 22 and 27, time difference adjustment units 23 and 25, and a synchronized distribution unit 26, which are mutually connected as illustrated in FIG. 2.
The buffer unit 21 is configured to include four buffers which are caused to correspond to the system boards 1-00 to 1-03 to which the crossbar apparatus 2-00 is connected, and hold broadcast (BC) commands from the system boards 1-00 to 1-03.
The output packet selection unit 22 is configured to transfer a BC command held in the buffer unit 21 to those crossbar apparatuses, from among the crossbar apparatus 2-10 inside the same enclosure 3-0 and the crossbar apparatuses 2-20 and 2-30 inside the different enclosure 3-1, to which the BC command is to be transferred, on the basis of partition configuration determination information provided by an operation management unit 11, that is, by firmware executed by the CPU of the operation management unit 11. The operation management unit 11, e.g., the firmware executed by the CPU of the operation management unit 11, is configured to determine the configurations of individual partitions on the basis of information relating to apparatuses constituting the server system, and output partition configuration determination information, as well as register setting information in accordance with the partition configuration determination information. In this example, the crossbar apparatus 2-00 is configured to identify pieces of partition configuration information, i.e., partition IDs, which correspond to the sixteen system boards 1-00 to 1-15, respectively. The crossbar apparatus 2-00 is configured to cause the output packet selection unit 22 to hold the pieces of partition configuration determination information corresponding to the partition IDs, which are set by the operation management unit 11, and transfer the BC command to crossbar apparatuses, each being connected to at least one system board having a partition ID equal to one of the partition IDs of the system boards 1-00 to 1-03 connected to the crossbar apparatus 2-00 itself.
As described below, the crossbar apparatus 2-00 is configured to determine a piece of partition configuration information corresponding to an SB, which is a BC-command transmitter, and transfer the BC command to the crossbar apparatus 2-10 if the piece of partition configuration determination information indicates a partition P2, and transfer the BC command to the crossbar apparatuses 2-10, 2-20 and 2-30 if the piece of partition configuration determination information indicates a partition P3.
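The destination selection described above can be sketched in Python as follows. This is only an illustrative sketch: the names XB_TO_SBS, PARTITION_OF and target_crossbars are hypothetical, the attachment of system boards to the crossbar apparatuses other than 2-00 is assumed, and the partition membership is invented for illustration because FIG. 3 is not reproduced here.

```python
# Hypothetical mapping: crossbar -> attached system boards.
# Only the XB00 row (SB 00 to SB 03) is stated in the text; the rest is assumed.
XB_TO_SBS = {
    "XB00": [0, 1, 2, 3],
    "XB10": [4, 5, 6, 7],
    "XB20": [8, 9, 10, 11],
    "XB30": [12, 13, 14, 15],
}

# Hypothetical partition IDs per system board (illustration only).
PARTITION_OF = {
    0: "P1", 1: "P1",                   # closed within XB00
    2: "P2", 4: "P2", 5: "P2",          # spans XB00 and XB10
    3: "P3", 6: "P3", 8: "P3", 12: "P3" # spans both enclosures
}


def target_crossbars(source_xb, sender_sb, partition_of=PARTITION_OF):
    """Return the other crossbars serving at least one board in the sender's partition."""
    pid = partition_of[sender_sb]
    return sorted(
        xb
        for xb, sbs in XB_TO_SBS.items()
        if xb != source_xb and any(partition_of.get(sb) == pid for sb in sbs)
    )


if __name__ == "__main__":
    for sb in (0, 2, 3):
        print(sb, PARTITION_OF[sb], target_crossbars("XB00", sb))
```

With this assumed membership, a BC command from a board in the partition P2 is forwarded only to the crossbar apparatus 2-10, and a BC command from a board in the partition P3 is forwarded to the crossbar apparatuses 2-10, 2-20 and 2-30, in agreement with the behavior described above.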
The time difference adjustment unit 23 is configured to include a selector 230 and a buffer 231 therein, and BC commands held by the buffer unit 21 and register setting information from the operation management unit 11, e.g., the firmware executed by the CPU of the operation management unit 11, are inputted to the buffer 231 and the selector 230, respectively. The time difference adjustment unit 23 is configured to have four time difference adjustment units which are caused to correspond to the system boards 1-00 to 1-03, respectively. The time difference adjustment unit 23 is configured to receive a BC command from the buffer unit 21. Moreover, in order to cause the BC command to arrive simultaneously at all of the target nodes, that is, all of the target system boards, the time difference adjustment unit 23 is also configured to output the BC command to the synchronized distribution unit 26 after delaying the broadcast transfer of the BC command by a predetermined delay time, by switching the selector 230 in accordance with the register setting information from the operation management unit 11, which will be described below. In the case where no connection between crossbar apparatuses inside a single enclosure exists, the buffer 231 of the time difference adjustment unit 23 is caused to be bypassed by switching the selector 230 in accordance with the register setting information from the operation management unit 11. Further, in the case where the delay time is adjusted so as to be equal to a transfer delay between the crossbar apparatuses 2-00 and 2-10, the delay time is set to 1τ (“τ” means a period of one cycle), and in the case where the delay time is adjusted so as to be equal to a transfer delay between the crossbar apparatuses 2-00 and 2-20 or between the crossbar apparatuses 2-00 and 2-30, the delay time is set to 2τ.
In the case where the buffer 231 of the time difference adjustment unit 23 is configured by using a ring buffer, in the former case, the pointer of the ring buffer is incremented at intervals of 1τ, and in the latter case, the pointer of the ring buffer is incremented at intervals of 2τ.
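The effect of the selector 230 and the buffer 231 can be pictured with a small Python sketch. It is a simplification under stated assumptions: the class name DelayLine is hypothetical, the delay is modeled as a first-in first-out queue whose depth equals the delay in cycles (0 for the bypass, 1 for 1τ, 2 for 2τ), and the pointer-increment behavior of the ring buffer is not reproduced.

```python
from collections import deque


class DelayLine:
    """Toy model of a time difference adjustment unit (hypothetical name).

    delay_cycles = 0 models the selector bypassing the buffer;
    delay_cycles = 1 or 2 models a delay of 1τ or 2τ.
    """

    def __init__(self, delay_cycles):
        self.delay_cycles = delay_cycles
        # Pre-fill with empty slots so the first outputs are "no command".
        self.slots = deque([None] * delay_cycles)

    def tick(self, command):
        """Advance one cycle: accept `command` (or None), return the delayed output."""
        if self.delay_cycles == 0:
            return command  # bypass path selected
        self.slots.append(command)
        return self.slots.popleft()


def run(delay_cycles, inputs):
    """Feed a sequence of per-cycle inputs and collect the per-cycle outputs."""
    line = DelayLine(delay_cycles)
    return [line.tick(cmd) for cmd in inputs]


if __name__ == "__main__":
    stream = ["A", "B", None, None]
    for d in (0, 1, 2):
        print(d, run(d, stream))
```

In this model a command entered in cycle n leaves the unit in cycle n plus the configured delay, which is the property the register setting information is used to select.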
The buffer unit 21, the output packet selection unit 22 and the time difference adjustment unit 23 constitute a local broadcast control (LBC) unit 28.
A global broadcast control (GBC) unit 29 is configured to output BC commands received from the LBC unit 28 and the crossbar apparatuses 2-10, 2-20 and 2-30 to target system boards. The GBC unit 29 is constituted by the time difference adjustment unit 25, the synchronized distribution unit 26 and the output packet selection unit 27.
The time difference adjustment unit 25 is configured to include a selector 250 and a buffer 251, and BC commands transferred from the crossbar apparatuses 2-10, 2-20 and 2-30, and register setting information from the operation management unit 11 are inputted to the selector 250. In order to cause a BC command to arrive simultaneously at all of the target system boards, the time difference adjustment unit 25 is configured to output the BC command from the crossbar apparatus 2-10 to the synchronized distribution unit 26 after causing the BC command to be transferred via the buffer 251 by switching the selector 250 in accordance with the register setting information. The time difference adjustment unit 25 is further configured to output the BC command from the crossbar apparatus 2-20 or the crossbar apparatus 2-30 to the synchronized distribution unit 26. Thereby, the time difference adjustment unit 25 performs adjustment so that the amounts of transfer time of BC commands transferred via paths having different transfer rates become equal to one another. Moreover, in the case of a model M1 in FIG. 1, in which no connection between crossbar apparatuses exists, and further, in the case of a model M2 in FIG. 1, in which the crossbar apparatuses 2-20 and 2-30 do not exist, the buffer 251 of the time difference adjustment unit 25 is caused to be bypassed by switching the selector 250 in accordance with the register setting information from the operation management unit 11. In the case of a model M3 in FIG. 1, one or more connections between any two crossbar apparatuses out of the crossbar apparatuses 2-00, 2-10, 2-20 and 2-30 exist.
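The way the two time difference adjustment units together equalize arrival times can be sketched as a small calculation. The sketch below assumes the link delays stated for the crossbar apparatus 2-00 (1τ to the crossbar apparatus 2-10, 2τ to the crossbar apparatuses 2-20 and 2-30); the names LINK_DELAY and padding_per_path are hypothetical, and the rule "pad every path up to the slowest one" is an interpretation of the description rather than a statement of the actual circuit.

```python
# Transfer delay, in units of τ, from XB00 to each other crossbar (values from the text).
LINK_DELAY = {"XB10": 1, "XB20": 2, "XB30": 2}


def padding_per_path(targets):
    """Return the extra delay (in τ) each path needs so all copies arrive together.

    "local" is the path that stays inside the source crossbar (unit 23);
    the other entries are the padding applied on the receiving side (unit 25).
    """
    longest = max((LINK_DELAY[xb] for xb in targets), default=0)
    pads = {"local": longest}
    for xb in targets:
        pads[xb] = longest - LINK_DELAY[xb]
    return pads


if __name__ == "__main__":
    print(padding_per_path([]))                        # no inter-crossbar transfer
    print(padding_per_path(["XB10"]))                  # same-enclosure transfer only
    print(padding_per_path(["XB10", "XB20", "XB30"]))  # transfer across enclosures
```

Under these assumptions, a command arriving from the crossbar apparatus 2-10 needs 1τ of additional buffering only when the broadcast also crosses enclosures, whereas commands arriving from the crossbar apparatuses 2-20 and 2-30 need none, which matches the buffered and direct paths described for the selector 250.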
The synchronized distribution unit 26 is configured to receive a BC command transmitted from the LBC unit 28 included in any of the crossbar apparatuses 2-00, 2-10, 2-20 and 2-30, and distribute the BC command to respective target system boards in synchronization with one another within each partition. The synchronized distribution unit 26 is configured to include four synchronized distribution units which are caused to correspond to the system boards 1-00 to 1-03, respectively, in order to distribute the BC command to the respective system boards 1-00 to 1-03 in synchronization with one another.
The BC commands outputted from the synchronized distribution unit 26 are selected by the output packet selection unit 27, and the outputted BC commands are inputted to the corresponding system boards 1-00 to 1-03. The output packet selection unit 27 is configured to include four output packet selection units which are caused to correspond to the system boards 1-00 to 1-03, respectively.
In addition, the commands processed by the crossbar apparatuses are not limited to the BC commands. Peer-to-peer (PP) packets may also be transferred through the same crossbar apparatuses. The output packet selection unit 27 has a function of selecting packets to be outputted therefrom from among the BC command packets and other kinds of packets, such as peer-to-peer packets.
As illustrated in FIGS. 1 and 2, the crossbar apparatuses 2-00 and 2-10, and the crossbar apparatuses 2-20 and 2-30 are connected to each other inside the same enclosure, respectively, that is, each of these pairs of crossbar apparatuses is in the condition of a connection inside the same enclosure. In contrast, the crossbar apparatuses 2-00 and 2-20, the crossbar apparatuses 2-00 and 2-30, the crossbar apparatuses 2-10 and 2-20, and the crossbar apparatuses 2-10 and 2-30 are connected to each other via the connection unit 4, respectively. The connection unit 4 is provided between the different enclosures 3-0 and 3-1, that is, each of these pairs of crossbar apparatuses is in the condition of a connection between different enclosures. Therefore, a transfer rate of each of buses used for the connections between different enclosures is lower than the transfer rate of each of buses used for the connections inside the same enclosure. That is, for example, with respect to three interfaces xb1, xb2 and xb3 illustrated in FIG. 3, which are provided by the crossbar apparatus 2-00, the transfer rate of the interface xb1 may be set to a higher transfer rate, but each of the transfer rates of the interfaces xb2 and xb3 may be set only to a lower transfer rate. Further, a transfer rate which may be realized in the case where one or more connections between crossbar apparatuses inside the same enclosure exist is lower than the transfer rate which may be realized in the case where no connection between crossbar apparatuses inside the same enclosure exists.
As described above, in such a server system as illustrated in FIG. 1, the transmission performance of the connections between different enclosures is lower than the transmission performance of the connection inside the same enclosure. In this example, the transfer rate of each of the buses used for the connections between different enclosures is set to half of the transfer rate of a bus used for the connection inside the same enclosure. Therefore, in order to cause a BC command to arrive simultaneously at all of the target nodes, the broadcast transfer rate for the connection inside the same enclosure has to be lowered to the same rate as the broadcast transfer rate for the connections between different enclosures.
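The arithmetic behind this constraint can be illustrated with a short Python sketch. The names ENCLOSURE_OF, link_rate and common_broadcast_rate are hypothetical, rates are expressed relative to the same-enclosure bus rate, and the assumption that the common rate equals the minimum over all outgoing links is an interpretation of the simultaneous-arrival requirement.

```python
# Enclosure of each crossbar apparatus (from the text: XB00/XB10 in 3-0, XB20/XB30 in 3-1).
ENCLOSURE_OF = {"XB00": 0, "XB10": 0, "XB20": 1, "XB30": 1}


def link_rate(a, b, intra_rate):
    """Rate of the bus between crossbars a and b.

    Buses between different enclosures run at half the same-enclosure rate,
    as stated for this example.
    """
    if ENCLOSURE_OF[a] == ENCLOSURE_OF[b]:
        return intra_rate
    return intra_rate / 2


def common_broadcast_rate(source, intra_rate):
    """Single rate all outgoing links of `source` must share for simultaneous arrival."""
    others = [xb for xb in ENCLOSURE_OF if xb != source]
    return min(link_rate(source, xb, intra_rate) for xb in others)


if __name__ == "__main__":
    print(link_rate("XB00", "XB10", 1.0))      # same enclosure
    print(link_rate("XB00", "XB20", 1.0))      # different enclosures
    print(common_broadcast_rate("XB00", 1.0))  # limited by the slowest link
```

Under these assumptions the same-enclosure interface, although capable of the full rate, is held to half of it as soon as any link between different enclosures takes part in the broadcast.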
However, in the case where a plurality of partitions are each set so as to be closed within an enclosure of a server system, partitions in which no data transfer is performed via the connections between different enclosures are likely to exist, even though such connections are present in the server system. For example, in such a partition configuration as illustrated in FIG. 3, among partitions P1, P2 and P3, which are indicated by a chain double-dashed line, a dotted line and a chain single-dashed line, respectively, each of the partitions P1 and P2 is not allowed to transfer BC commands across enclosures. However, in existing methods, regardless of the partition configuration including the partitions P1, P2 and P3, each interface between crossbar apparatuses is set to a lower transfer rate. That is, in the case where at least one partition covering a plurality of enclosures is likely to exist, the transfer rate is set taking into account the connections between different enclosures, even for transfers performed within a single enclosure. As a result, the broadcast transfer rate is reduced to half the broadcast transfer rate of the case where the broadcast transfer is performed within the single enclosure of a server system.
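Whether a given partition is closed within an enclosure can be checked mechanically, as the following Python sketch shows. The placement of the system boards 1-00 to 1-07 in the enclosure 3-0 and 1-08 to 1-15 in the enclosure 3-1 is taken from the description; the partition membership in PARTITIONS is invented for illustration because FIG. 3 is not reproduced here, and the function names are hypothetical.

```python
def enclosure_of_board(sb):
    """SB 00 to SB 07 are in enclosure 3-0 (index 0), SB 08 to SB 15 in 3-1 (index 1)."""
    if not 0 <= sb <= 15:
        raise ValueError(f"unknown system board: {sb}")
    return 0 if sb < 8 else 1


def spans_enclosures(boards):
    """True if the boards of a partition are located in more than one enclosure."""
    return len({enclosure_of_board(sb) for sb in boards}) > 1


# Hypothetical membership: P1 and P2 closed within enclosure 3-0, P3 spanning both.
PARTITIONS = {
    "P1": [0, 1],
    "P2": [2, 4, 5],
    "P3": [3, 6, 8, 12],
}

if __name__ == "__main__":
    for name, boards in PARTITIONS.items():
        kind = "spans enclosures" if spans_enclosures(boards) else "closed within an enclosure"
        print(name, kind)
```

With this assumed membership only the partition P3 needs the connections between different enclosures, yet with a fixed setting the lower transfer rate also applies to broadcasts within the partitions P1 and P2.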
In order to change the setting of the broadcast transfer rate while the server system is operating, it is necessary first to clear the packets being processed in each of the apparatuses included in the server system, then to place the server system in a condition where no process is executed, that is, in a suspended condition, and only then to change the setting of the broadcast transfer rate. Such processing requires complicated control. For this reason, to date, the transfer rate of broadcast transfers performed across different enclosures has been set to a fixed rate.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2000-259542
[Patent Document 2] Japanese Laid-open Patent Publication No. 06-314255