1. Field
The invention disclosed and claimed herein generally pertains to a network virtualized environment, wherein a Virtual Input/Output Server (VIO Server or VIOS) has a shared ethernet adapter (SEA) for connecting client Logical Partitions (LPARs) to physical network resources. More particularly, the invention pertains to an environment of the above type having two VIO Servers, in order to provide a primary SEA and a backup SEA. Even more particularly, the invention pertains to an improved or enhanced failover mechanism, to selectively exchange the primary and backup roles or states of the two SEAs, as required.
2. Description of the Related Art
As is known by those of skill in the art, VIOS is a special purpose virtual machine that can virtualize I/O resources to other virtual machines, such as client LPARs, in a network virtualized environment comprising a central electronics complex (CEC) or other computer system environment. VIOS works by owning physical resources, e.g. storage and network resources, and mapping respective physical resources to virtual resources. Client LPARs connect to physical resources via these mappings.
In a useful arrangement or configuration, a client LPAR is connected to an internal virtual ethernet, and the SEA of a VIOS, comprising a module on the VIOS, is used to establish a bridge between the internal virtual ethernet and an external physical ethernet network. The client LPAR thus has access to network I/O resources, delivered via the VIOS. However, this arrangement could be quite undesirable if the VIOS represented a potential single point of failure for that client LPAR. To avoid this single point of failure, VIO Servers have typically been configured in pairs, along with a failover method, so that if one VIOS goes down, the other VIOS takes over. Thus, the client partition is not impacted.
In a common prior art failover arrangement, two VIO Servers are provided, wherein each one has a SEA. The SEA of one of the VIO Servers is initially selected to be the primary SEA, which is responsible for establishing a bridge as described above, to connect client LPARs to physical resources. The SEA of the other VIO Server becomes the backup SEA, and remains passive while in that role. Each of the SEAs is configured with a trunk virtual ethernet adapter and a corresponding trunk priority, and the SEA with the higher trunk priority becomes the primary SEA.
Information about the liveness of each SEA, and changes in its trunk priority, is exchanged between the two SEAs by extending a control channel between them. The control channel is a virtual ethernet on a separate virtual local area network (VLAN) for exchanging keep alive (KA) messages and other state information between the primary and backup SEAs. More particularly, the primary SEA sends KA packets to the backup SEA at prespecified intervals, such as every 300 msec, wherein the KA packets contain the priority of the primary SEA.
Upon reception of each KA, the backup SEA checks whether the priority of the primary SEA is higher than its own priority, and if so it just keeps listening to KAs. However, if the backup SEA trunk priority is found to be higher, then the backup SEA kicks off a state change, and sends a RECOVERY packet to the primary SEA to indicate the priority of the backup SEA. Upon receiving the RECOVERY packet, the primary SEA will validate that the backup SEA priority is indeed higher. The primary SEA accepts the RECOVERY packet by sending a NOTIFY packet to the backup SEA. Once the NOTIFY packet is received, the backup SEA takes over as primary SEA, and starts sending KAs to the previous primary SEA, which goes to backup state and starts listening to KAs.
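By way of illustration only, the KA, RECOVERY and NOTIFY exchange described above may be sketched as follows. The sketch is a simplified model and not an actual SEA implementation; the names Sea, keep_alive, on_keep_alive, on_recovery, on_notify and exchange are hypothetical, and the sketch assumes that a numerically larger value denotes a higher trunk priority.

```python
from dataclasses import dataclass


@dataclass
class Sea:
    """Hypothetical model of one shared ethernet adapter (SEA)."""
    name: str
    priority: int  # assumption: larger number = higher trunk priority
    state: str     # "PRIMARY" or "BACKUP"


def keep_alive(primary):
    # The KA packet carries the priority of the primary SEA.
    return {"type": "KA", "priority": primary.priority}


def on_keep_alive(backup, ka):
    # The backup keeps listening unless its own priority is higher.
    if backup.priority > ka["priority"]:
        return {"type": "RECOVERY", "priority": backup.priority}
    return None


def on_recovery(primary, recovery):
    # The primary validates the claim before accepting it with NOTIFY.
    if recovery["priority"] > primary.priority:
        return {"type": "NOTIFY"}
    return None


def on_notify(backup, primary):
    # Roles are exchanged: the old backup bridges, the old primary listens.
    backup.state, primary.state = "PRIMARY", "BACKUP"


def exchange(primary, backup):
    """Run one KA round and return the packet types sent, in order."""
    sent = ["KA"]
    recovery = on_keep_alive(backup, keep_alive(primary))
    if recovery is None:
        return sent
    sent.append("RECOVERY")
    notify = on_recovery(primary, recovery)
    if notify is None:
        return sent
    sent.append("NOTIFY")
    on_notify(backup, primary)
    return sent
```

In this model, a round between a primary SEA of priority 1 and a backup SEA of priority 2 produces all three packets and exchanges the roles, whereas a subsequent round between the new primary and the new backup produces only a KA.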
In the above arrangement, if the primary SEA goes down, the backup SEA waits for the time period of a specified number of successive KAs, such as 3 KAs or 900 msec. If no KA is received by the backup SEA during this period, the backup SEA then takes over as primary SEA, and becomes responsible for subsequent bridging tasks.
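The timeout arithmetic above may be illustrated by the following minimal sketch, which uses the example values of 300 msec and 3 KAs. The function names are hypothetical, and treating an elapsed time exactly equal to the timeout as expired is an assumption of the sketch.

```python
KA_INTERVAL_MS = 300  # example KA interval from the description
MISSED_KA_LIMIT = 3   # example number of successive KAs from the description


def takeover_timeout_ms(interval_ms=KA_INTERVAL_MS, missed=MISSED_KA_LIMIT):
    # Waiting period = KA interval multiplied by the number of KAs.
    return interval_ms * missed


def backup_should_take_over(last_ka_ms, now_ms, timeout_ms=None):
    # True once no KA has arrived for the whole waiting period.
    # Assumption: elapsed time equal to the timeout counts as expired.
    if timeout_ms is None:
        timeout_ms = takeover_timeout_ms()
    return now_ms - last_ka_ms >= timeout_ms
```

With the example values, the waiting period is 3 x 300 msec = 900 msec, so a backup SEA whose last KA arrived at time 0 would remain passive at 899 msec and would take over at 900 msec.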
The above prior art arrangement has some significant drawbacks. For example, this design depends completely on the control channel. Any issues that occur with the control channel will cause SEA failover to behave in an unpredictable manner, and are likely to result in network outages. There are instances where the primary SEA is functioning properly, but due to problems on the control channel (such as packet drops or communication failures), the backup SEA does not receive KAs. The backup SEA therefore assumes the primary SEA is dead, and takes over as primary SEA. This results in both of the SEAs bridging simultaneously, which can result in network loops unless switches have implemented Spanning Tree Protocol (STP). These network loops can bring an entire network to a standstill.
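The dual-bridging failure described above may be illustrated by the following simplified simulation, in which the primary SEA remains healthy throughout but the control channel drops every KA within a given time window. The function simulate and its parameters are hypothetical, and the model ignores everything other than KA delivery and the backup timeout.

```python
def simulate(drop_from_ms, drop_until_ms, duration_ms,
             interval_ms=300, missed_limit=3):
    """Return the time (ms) at which both SEAs are bridging, or None."""
    primary_state = "PRIMARY"  # the primary stays healthy in this scenario
    backup_state = "BACKUP"
    last_ka_ms = 0
    for now_ms in range(interval_ms, duration_ms + 1, interval_ms):
        # The primary sends a KA at every interval; the channel may drop it.
        dropped = drop_from_ms <= now_ms < drop_until_ms
        if not dropped:
            last_ka_ms = now_ms
        elif now_ms - last_ka_ms >= interval_ms * missed_limit:
            # The backup wrongly concludes that the primary is dead.
            backup_state = "PRIMARY"
        if primary_state == backup_state == "PRIMARY":
            return now_ms
    return None
```

In this model, a channel that begins dropping KAs at 600 msec and does not recover causes both SEAs to be in the primary state at 1200 msec, that is, 900 msec after the last KA received at 300 msec. If the channel recovers before the waiting period elapses, no dual-bridging condition arises.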
Further drawbacks include the complexity of failover configuration using control channels, which is faced by many customers and other users. Also, the requirement of using a VLAN as a control channel prevents LPARs from using this VLAN for any communication.
The above prior art arrangement, and additional drawbacks and disadvantages thereof, are described hereinafter in further detail, in connection with FIG. 2.