This invention relates to a multi-computer system in which a plurality of computers and a plurality of PCI devices are connected by a PCI switch, and more particularly, to enhancing the reliability of a multi-root PCI switch.
In recent years, blade servers which house a plurality of computers in a single machine have been put into use in order to facilitate server management. Further, improvement in processing performance of CPUs brought about by a multi-core technology, which gives a CPU a plurality of processor cores, has led to widespread use of a virtual server technology, which uses a CPU efficiently by running a plurality of virtual servers on a single computer.
While the CPU performance has improved, there has been a shortage of I/O devices which require connectors and ports for input and output. In order to remedy the shortage of I/O devices, there is known a method of enhancing the extensibility and flexibility of I/O devices with the use of a PCI switch technology.
In order to enhance the reliability of this type of server system, where components are connected via a PCI switch, the system needs to be built in a manner that prevents a failure in a single device or server from affecting the entire system (that is, in a manner that avoids a single point of failure (SPOF)). This includes preparing redundant paths and setting up failover to a backup system.
For instance, there is a method in which two PCI switches are connected to each other. Normally, each switch allows its own host to access the devices assigned to that host, and in the event of a host failure, the connection is switched to a cascade connection to allow one of the hosts to access all devices, as described in, for example, US 2008/0240134 A1. With this method, however, dealing with a failure requires re-assigning bus numbers and similar procedures, which makes it difficult to execute failover while the devices are running.
Non-transparent bridging may be used in configuring a PCI switch that connects a plurality of hosts, as described in, for example, an article by Jack Regula titled “Using Non-transparent Bridging in PCI Express Systems,” June 2004, pp. 24-27. A non-transparent bridge is a bridge that combines two PCI-PCI bridges to connect two PCI bus trees to each other. This bridge is recognized by each host as an end point, and when memory is accessed via a base address register (BAR) of the bridge, the address is translated and the access is forwarded to the other PCI bus tree. With this method, however, switching hosts is inevitably accompanied by resetting and re-initialization.
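The address conversion performed by a non-transparent bridge can be illustrated by the following minimal sketch. It assumes a single BAR-mapped window with a fixed offset translation. The class name BarWindow, its fields, and the example addresses are hypothetical and are not taken from the cited article or from any actual device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarWindow:
    """Hypothetical model of one BAR-mapped window of a non-transparent bridge."""
    local_base: int   # address assigned to the BAR in the local PCI bus tree
    size: int         # window size in bytes
    remote_base: int  # base address of the window in the other PCI bus tree

    def translate(self, local_addr: int) -> int:
        """Convert a local address inside the window to the remote address."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {local_addr:#x} is outside the BAR window")
        return self.remote_base + offset


# Illustrative values only: a 1 MiB window at 0x90000000 that maps to 0x40000000.
window = BarWindow(local_base=0x9000_0000, size=0x10_0000, remote_base=0x4000_0000)
translated = window.translate(0x9000_1234)
```

In this sketch, an access to an address outside the window is rejected rather than forwarded, which reflects the fact that only accesses through the BAR reach the other PCI bus tree.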
The PCI-SIG (PCI Special Interest Group) defines multi-root I/O virtualization (MR-IOV) standards, which extend a PCI switch used by a conventional single server such that a plurality of computers are connected to a plurality of peripheral component interconnect (PCI) devices (or PCI Express (PCIe) devices), which are I/O devices, as described in, for example, an online document “Multi-Root I/O Virtualization and Sharing Specification Rev1.0” published by PCI-SIG in May 2008, pp. 109-222.
A device compliant with the MR-IOV standards (MR device) has a plurality of virtual hierarchy (VH) layers, each of which is allocated to a virtual switch configured from a switch compliant with the MR-IOV standards (MR switch). A single MR device is thus shared and used concurrently by a plurality of server hosts.
According to the MR-IOV standards, management software called a PCI manager is used to manage the configuration information of MR switches and MR devices. The PCI manager itself uses a management virtual switch called VH0 (management virtual switch VH0) for settings in the MR switches and the MR devices.
Every MR switch and every MR device that is managed by the PCI manager is connected to the management virtual switch VH0, and the virtual hierarchy layer VH0 in the MR device holds a special function for management (base function: BF). A host that includes the management virtual switch VH0 (hereinafter also referred to as the manager host) can therefore become a single point of failure.
When a failure occurs in the manager host, the BF of an MR device that is managed by the manager host is reset. Resetting the BF deletes the configuration information by which the MR device is partitioned on a VH basis, and accordingly affects all hosts that have been sharing and using this MR device. Likewise, in the case where the manager host is to be shut down for planned maintenance or a firmware update and the configuration information is to be migrated to another host, the manager host cannot be shut down without affecting the hosts that are using an MR device managed by the manager host.
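The effect of a manager host failure described above can be illustrated by the following minimal sketch. It is a simplified model, not an implementation of the MR-IOV standards: the class MRDevice, the function manager_host_failure, and the host and device names are all hypothetical, and the per-VH configuration is reduced to a mapping from a VH number to the host using it.

```python
class MRDevice:
    """Hypothetical model of an MR device partitioned on a VH basis."""

    def __init__(self, name):
        self.name = name
        self.vh_to_host = {}  # VH number -> host that uses this VH

    def assign_vh(self, vh, host):
        """Record that the given VH of this device is used by the given host."""
        self.vh_to_host[vh] = host

    def reset_bf(self):
        """Model a BF reset: the per-VH partitioning is deleted.

        Returns the hosts that were sharing the device and are affected.
        """
        affected = sorted(set(self.vh_to_host.values()))
        self.vh_to_host.clear()
        return affected


def manager_host_failure(devices):
    """Model a manager host failure: the BF of every managed device is reset."""
    affected = set()
    for device in devices:
        affected.update(device.reset_bf())
    return sorted(affected)


# Illustrative setup: one MR device shared by two hosts through VH1 and VH2.
nic = MRDevice("nic0")
nic.assign_vh(1, "host-a")
nic.assign_vh(2, "host-b")
affected_hosts = manager_host_failure([nic])
```

In this model, a single failure in the manager host removes the partitioning of every managed device, so affected_hosts contains every host that was sharing the device, even though those hosts themselves have not failed.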