The invention relates to redundant access to I/O resources, which contain I/O adapters and associated support functions that provide connection to external I/O attachments (e.g., DASD, tape, LAN switches), and, in particular, to a method, system and storage medium for providing a concurrent I/O hardware infrastructure that includes redundant I/O access to and from the I/O resources.
Computer or server systems may be built from common building blocks (called nodes or books) that are interconnected via a high-speed bus or buses and have the capability to be configured as a single computer system. Each node contains processors, memory, I/O hub cards and an interconnection fabric to the I/O hardware subsystem as well as to the other nodes. A single node with I/O attachments (e.g., storage devices and network devices) connected via I/O resources (e.g., adapters and virtualization engines) through the I/O hubs can be operated as a stand-alone computer. Additional nodes can be added to the computer system for more computing power, as required by workload, without buying a separate server. These nodes collectively comprise a multiple node mainframe and, in general, are configured as a large single system image. When configured in this manner, each node may access I/O attachments via the I/O resources attached to any of the nodes, even though the accessing node has no direct connection to these resources. This capability is provided by exploiting the normal node-to-node communication path that is necessary for memory operations in this configuration.
Computer and/or server systems of this nature may also have a requirement for high availability and concurrent maintenance. When a node fails, or when maintenance operations impact a node for either an upgrade (e.g., plugging additional memory modules) or a repair (e.g., replacing a defective part), other nodes may lose access to the I/O resources attached to the impacted node unless a redundant path to those resources is provided.
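The loss of access described above can be illustrated with a minimal sketch that models nodes and I/O resources as a graph and checks reachability while a node is out of service. The component names (`node0`, `io0`, and so on), the two-node topology, and the helper functions are hypothetical and chosen only for illustration; the model also treats every link as bidirectional and does not distinguish components that forward traffic from those that do not.

```python
from collections import deque

def reachable(links, start, down=frozenset()):
    """Return the set of components reachable from `start`, skipping any in `down`."""
    if start in down:
        return set()
    # Build an undirected adjacency map from the list of links.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search that never enters a component that is out of service.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen and neighbor not in down:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def can_reach(links, source, target, down=()):
    """True if `source` can still reach `target` while the `down` components are unavailable."""
    return target in reachable(links, source, frozenset(down))

# Hypothetical two-node system: each I/O resource attaches to one node only.
single_path = [("node0", "node1"), ("node0", "io0"), ("node1", "io1")]

# Same system with an assumed redundant link from each I/O resource to the other node.
redundant_path = single_path + [("node1", "io0"), ("node0", "io1")]

print(can_reach(single_path, "node1", "io0"))                      # node0 operational
print(can_reach(single_path, "node1", "io0", down={"node0"}))      # node0 under service
print(can_reach(redundant_path, "node1", "io0", down={"node0"}))   # node0 under service
```

In this sketch, `node1` reaches `io0` only through `node0` in the single-path topology, so servicing `node0` cuts off access; with the assumed redundant link, access continues.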
Another advantage of the redundant path is that it allows continued access to the I/O resources when a failure occurs in the path that attaches the I/O resources. The server may be designed such that transparent recovery occurs without human intervention or impact to I/O operations in progress.
At least one current server design (e.g., the z990 from IBM) that may be utilized to implement concurrent upgrade, repair, and/or recovery of a node in a multiple node machine requires that the I/O resources directly attached to the affected node no longer be usable by the other nodes during the service action. This is because the connection to the I/O resources is broken when the node or intervening path is not operational.
One way to work around this limitation is to place a switch fabric between the processor nodes and the I/O resources to allow any node to connect to any I/O resource. Since a single switch fabric would be a single point of failure, a second switch fabric would be necessary to provide a redundant path. This solution is expensive because it requires physical resources (power, space, etc.) to support the additional hardware, management firmware, and an additional interface layer between the processor and I/O port. The additional switch hardware and firmware between the processor node and the I/O port may also adversely affect I/O performance.
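The single-point-of-failure argument for the switch fabric can be sketched by removing each intermediate component in turn and checking whether a path from node to I/O resource survives. The names `fabricA` and `fabricB`, the topologies, and both helper functions are hypothetical and serve only to illustrate the reasoning; they do not describe any particular product.

```python
def connected(links, source, target, failed):
    """Depth-first search for a path from source to target that avoids `failed`."""
    if source == failed or target == failed:
        return False
    stack, seen = [source], {source}
    while stack:
        current = stack.pop()
        if current == target:
            return True
        for a, b in links:
            if current in (a, b):
                other = b if current == a else a
                if other != failed and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return False

def single_points_of_failure(links, source, target):
    """List intermediate components whose individual failure disconnects source from target."""
    components = {c for link in links for c in link} - {source, target}
    return sorted(c for c in components if not connected(links, source, target, c))

# Hypothetical topology with one switch fabric between the node and the I/O resource.
one_fabric = [("node0", "fabricA"), ("fabricA", "io0")]

# Same topology with an assumed second, independent switch fabric.
two_fabrics = one_fabric + [("node0", "fabricB"), ("fabricB", "io0")]

print(single_points_of_failure(one_fabric, "node0", "io0"))
print(single_points_of_failure(two_fabrics, "node0", "io0"))
```

With one fabric, the fabric itself is reported as a single point of failure; adding the second fabric removes it from the list, at the cost of the additional hardware and firmware noted above.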
It would be desirable to have a cost-effective and simplified manner of implementing concurrent upgrade and repair of a node in a multiple node machine such that the I/O resources directly attached to the affected node remain usable by the other nodes during the upgrade, recovery, or repair activity.