A computer system is known in which predetermined combinations of system boards (SB), selected from a plurality of system boards mounted on the computer system, are managed as partitions that logically divide the system, and in which data processing is performed separately for the system boards belonging to each partition (see Japanese Laid-open Patent Publication No. 2006-31199).
The configuration of such a computer system will now be described more concretely. The computer system includes a plurality of data transfer circuits called crossbar units (KB), and a plurality of system boards are connected to each crossbar unit.
The computer system includes a system controller (corresponding to, for example, SCF (System Control Facility) or MMB (Management Board)) that controls communication between system boards belonging to the same partition by managing each first control unit and each second control unit included in each crossbar unit.
Each first control unit corresponds to one of the system boards connected to the crossbar unit. It performs priority control of communication between system boards by controlling communication between the crossbar unit and the system board under its control.
Each second control unit corresponds to a crossbar unit other than the one that includes it. It performs priority control of communication between system boards by controlling communication between its own crossbar unit and that other crossbar unit.
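The relationship between crossbar units, first control units, and second control units can be pictured with a minimal sketch. The class names, the `build_system` helper, and identifiers such as `crossbar0` and `SB0` are hypothetical and chosen only for illustration. The sketch assumes one first control unit per attached system board and one second control unit per peer crossbar unit, as the description above suggests.

```python
from dataclasses import dataclass, field


@dataclass
class FirstControlUnit:
    """Controls traffic between its crossbar unit and one attached system board."""
    board_id: str
    failed: bool = False


@dataclass
class SecondControlUnit:
    """Controls traffic between its crossbar unit and one other crossbar unit."""
    peer_crossbar_id: str
    failed: bool = False


@dataclass
class CrossbarUnit:
    crossbar_id: str
    # board id -> FirstControlUnit (hypothetical layout)
    first_units: dict = field(default_factory=dict)
    # peer crossbar id -> SecondControlUnit (hypothetical layout)
    second_units: dict = field(default_factory=dict)


def build_system(boards_per_crossbar):
    """Build crossbar units from a mapping {crossbar_id: [board_id, ...]}."""
    crossbars = {}
    for crossbar_id, board_ids in boards_per_crossbar.items():
        crossbar = CrossbarUnit(crossbar_id)
        # One first control unit per system board attached to this crossbar unit.
        for board_id in board_ids:
            crossbar.first_units[board_id] = FirstControlUnit(board_id)
        # One second control unit per *other* crossbar unit in the system.
        for peer_id in boards_per_crossbar:
            if peer_id != crossbar_id:
                crossbar.second_units[peer_id] = SecondControlUnit(peer_id)
        crossbars[crossbar_id] = crossbar
    return crossbars


system = build_system({
    "crossbar0": ["SB0", "SB1"],
    "crossbar1": ["SB2", "SB3"],
})
print(sorted(system["crossbar0"].first_units))   # ['SB0', 'SB1']
print(sorted(system["crossbar0"].second_units))  # ['crossbar1']
```

In this model, a failure is localized to a single control unit object, which makes it clear that only one system board (or one crossbar-to-crossbar link) is directly affected by any one failure.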
If a control unit (a first control unit or a second control unit) included in a crossbar unit fails in such a computer system, stuck-at control is performed, which causes the system board corresponding to the failed control unit to be stuck, that is, separated from the control of the failed control unit.
A concrete example of the stuck-at control will be described. If a first control unit fails, the crossbar unit sends an error signal to the system controller.
On receiving the error signal, the system controller sends a stop command that temporarily stops the driving of all system boards. Subsequently, the system controller sends a re-drive command to re-drive every system board except the one corresponding to the failed first control unit.
In this manner, the computer system causes the system board corresponding to the failed control unit to be separated from the control of the failed control unit.
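The conventional sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `conventional_stuck_at`, the board identifiers, and the representation of commands as `(command, board_id)` tuples are hypothetical and do not come from the cited publication.

```python
def conventional_stuck_at(all_boards, failed_board):
    """Return the commands a system controller would issue, in order.

    Assumption: each command is modelled as a (command, board_id) tuple,
    and `failed_board` is the system board behind the failed first control unit.
    """
    log = []
    # Step 1: stop every system board, regardless of which control unit failed.
    for board in all_boards:
        log.append(("stop", board))
    # Step 2: re-drive every board except the one behind the failed control unit.
    for board in all_boards:
        if board != failed_board:
            log.append(("re-drive", board))
    return log


log = conventional_stuck_at(["SB0", "SB1", "SB2", "SB3"], failed_board="SB1")
for entry in log:
    print(entry)
```

With four boards and a failure affecting only `SB1`, the log contains four stop commands and three re-drive commands, so `SB0`, `SB2`, and `SB3` are each interrupted even though none of them is under the failed control unit.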
The above conventional technique has the problem that the availability ratio of the computer system falls when stuck-at control is performed. That is, when stuck-at control is performed, driving is stopped even for system boards that are not under the failed control unit, in other words, for system boards whose driving need not be stopped.