The present invention relates generally to computer processing systems, servers and network operations. More specifically, it relates to computer network communications systems such as switches and to an automated system and method for reducing downtime of a computer server in a server farm using network switches.
Computer networks are used to interconnect many computing resources, such as computers, servers, printers, modems, and storage devices. Two or more computers may be connected together through a network where each computer is located in a different facility or even in different parts of the world. Network users may share files, run applications and access remote computers over these networks. These networks are dependent upon elements such as network communication devices used to interconnect the computing elements. The elements are identified by certain unique identifiers such as an Internet protocol (IP) address, or a hardware-specific media access control (MAC) address.
Conventionally, a network communications device used to interconnect multiple computing resources is a chassis-based system designed to accommodate a number of internal cards (often called blades). The computing resources such as host computers are coupled to the internal cards of the chassis-based system, which further couples them to one or more networks. These chassis-based systems allow for additional internal cards to accommodate network growth. Often the “blades” are complete computers configured as servers for providing computing resources to other computers on the network.
A network switch is a networking device that performs transparent bridging for connecting multiple network segments with forwarding based on MAC addresses. Conventional switches can connect at 10, 100, or 1000 megabits per second, at either half or full duplex. The use of specially designed expansion also makes it possible to have large numbers of connections utilizing different mediums of networking, including but not limited to standards such as Ethernet, Fibre Channel, ATM and 802.11. Computing service providers will often use one or more network switches to route electronic signals to a plurality of different computing devices such as host computers or servers positioned in a server farm. The electronic signals (or traffic) are in the form of packets of data, each having unique identification for addressing the packet.
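The MAC-based forwarding described above can be illustrated with a minimal sketch. It is a toy model of transparent bridging, not the behavior of any particular switch; the class name `LearningSwitch`, the port numbers, and the MAC addresses are all hypothetical.

```python
from typing import Dict, List, Optional


class LearningSwitch:
    """Toy transparent bridge: learns source MACs, forwards by destination MAC."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        # MAC address -> port on which that address was last seen
        self.mac_table: Dict[str, int] = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) the location of the sender.
        self.mac_table[src_mac] = in_port
        out_port: Optional[int] = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment; nothing to forward.
            return []
        return [out_port]


switch = LearningSwitch(ports=[1, 2, 3])
# Destination not yet learned, so the frame is flooded.
first = switch.receive(1, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
# The reply goes only to the port where the first sender was learned.
second = switch.receive(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")
```

Note that the table is keyed on the MAC address alone, which is why a host that reappears on a different port is simply relearned there without any policy being carried along.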
The routing of digital traffic can be represented by two distinct operational areas in a switch, called the control plane and the data plane. The control plane comprises a processor and associated memory and communications devices for managing the operations of the communications device. The control plane operates to determine the location of each IP address in a network and to direct the traffic to the identified location. Likewise, a data plane operates to manage the physical operations required to actually forward the information packet to the next location. This entails storing the packet in memory while control plane information is determined, placing the appropriate physical addressing and encapsulation on the packet frame, and then forwarding the packet to the appropriate location.
The control plane incorporates the software features for data flow management on the data plane, and the storage of settings and configurations for operation of the switch. If a host moves from one port to another port, the event triggers the Dynamic Switch Policy of the currently described invention such that the switch detects the change in the server farm. A failed host is differentiated from one that is pulled out and then later put in another slot based on user configuration on the switch—i.e., whether the intent of the user is to keep the policy attached to the particular server slot, or whether the intent is to move the policy to a different slot.
Conventional network switches provide communications protocols for ensuring that network traffic in server farms is directed to the proper computing resources by identifying and tracking IP and MAC addresses as well as other information. The Dynamic Host Configuration Protocol (DHCP) lets a network administrator supervise and distribute IP addresses from a central point, and assigns a new address when a computer is plugged into a different place in the network. The DHCP relay agent information option (option 82) enables a DHCP relay agent to include information about itself when forwarding client-originated DHCP packets to a DHCP server. Thus, the DHCP server is a medium for communications between elements of a network.
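As an illustration of the relay agent information option, the following sketch encodes option 82 with its two standard sub-options, Agent Circuit ID (sub-option 1) and Agent Remote ID (sub-option 2). The function names and the identifier values `b"slot3"` and `b"chassis-A"` are assumptions chosen for the example; the contents of these fields are left to the relay agent and are not specified by the text above.

```python
OPTION_RELAY_AGENT_INFO = 82  # DHCP relay agent information option code
SUBOPT_CIRCUIT_ID = 1         # Agent Circuit ID sub-option
SUBOPT_REMOTE_ID = 2          # Agent Remote ID sub-option


def encode_tlv(code: int, value: bytes) -> bytes:
    """Encode one code/length/value field; the length must fit in one byte."""
    if len(value) > 255:
        raise ValueError("value too long for a single-byte length field")
    return bytes([code, len(value)]) + value


def encode_option_82(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Build option 82 carrying a circuit ID and a remote ID."""
    body = encode_tlv(SUBOPT_CIRCUIT_ID, circuit_id)
    body += encode_tlv(SUBOPT_REMOTE_ID, remote_id)
    return encode_tlv(OPTION_RELAY_AGENT_INFO, body)


# Hypothetical identifiers: a slot name as the circuit, a chassis name as the remote.
option = encode_option_82(b"slot3", b"chassis-A")
```

A DHCP server that inspects these fields can base its answer on where the request entered the network rather than only on the client's MAC address.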
Network switches are governed by switch policies that configure them to perform in a predetermined manner according to the features of a given switch. Examples of features include, but are not limited to, quality of access (QoA) level, quality of service (QoS) level, an access control list (ACL), one or more virtual local area networks (VLAN), Multi-Link Trunk and other features associated with switch operations. The switch policy may also not be configured for a port, which in effect, is a null policy.
Networks must also distinguish which physical computing resources are operating in specific networks. The physical layer may be controlled by an Element Management System (EMS) hosted on a network element such as a server or host computer, which resides in a managed network and operates with the Simple Network Management Protocol (SNMP) or another TCP/IP protocol. The protocols employed are used to indicate important network events to a management layer such as the EMS. Network elements may also provide other services to a network operator, such as ATM or Frame Relay virtual circuits. Network elements are not limited to servers, but include all physical devices attached to or connected to one or more differing network elements.
Conventional Virtual Local Area Networks (VLANs) offer a method of dividing one physical network into multiple broadcast domains. VLAN-enabled switches are used to forward traffic to different computing devices. As an example, if a first host needs to communicate with a second host, it first sends an address resolution protocol (ARP) frame with the second host's destination IP address and a broadcast MAC address. The switch forwards this broadcast to all other ports in the VLAN. The second host will send an ARP response frame with its own MAC address as the destination MAC address that the first host should use. All subsequent traffic will include frames with the second host's address, thus enabling communication at the physical layer.
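The ARP exchange within a VLAN can be sketched as follows. This is a simplified model under stated assumptions: one host per port, one VLAN per port, and hypothetical names (`VlanSwitch`, `Host`) and addresses. It only shows that the broadcast, and therefore the resolution, is confined to the sender's VLAN.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    ip: str
    mac: str


class VlanSwitch:
    """Toy VLAN-enabled switch with one host and one VLAN per port."""

    def __init__(self) -> None:
        self.port_vlan: Dict[int, int] = {}
        self.port_host: Dict[int, Host] = {}

    def attach(self, port: int, vlan: int, host: Host) -> None:
        self.port_vlan[port] = vlan
        self.port_host[port] = host

    def broadcast_ports(self, in_port: int) -> List[int]:
        """All other ports in the same VLAN as the ingress port."""
        vlan = self.port_vlan[in_port]
        return sorted(
            p for p, v in self.port_vlan.items() if v == vlan and p != in_port
        )

    def arp_resolve(self, in_port: int, target_ip: str) -> Optional[str]:
        """Broadcast an ARP request; return the MAC of the host that answers."""
        for port in self.broadcast_ports(in_port):
            if self.port_host[port].ip == target_ip:
                return self.port_host[port].mac
        return None  # no host in this VLAN owns the address


vlan_switch = VlanSwitch()
vlan_switch.attach(1, 10, Host("192.0.2.1", "aa:aa:aa:aa:aa:01"))
vlan_switch.attach(2, 10, Host("192.0.2.2", "aa:aa:aa:aa:aa:02"))
vlan_switch.attach(3, 20, Host("192.0.2.3", "aa:aa:aa:aa:aa:03"))
```

In this model the host on port 1 can resolve the host on port 2, but a request for the host on port 3 goes unanswered because port 3 belongs to a different VLAN.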
Currently, when there is a system failure or when computing resources need to be moved from one network communications physical position to another, conventional means do not provide a way to automatically change IP, MAC and other switching information. Thus network managers must run configuration software or provide other means to correctly address a computing element. This results in unacceptable downtime for the computing network. As such, what is needed is an automated system and method for configuring computer communications systems. For instance, in a server farm including switches, after the initial setup of the system as a whole, some networking related events occur in the server farm that may result in an extended period of time for the server farm to be out of service. Therefore, the stability and serviceability of the network is jeopardized. The server virtualization assistance system and method disclosed herein provides solutions to these issues.
There are four possible outcomes when moving either a physical host or a virtual machine (VM) within the system, often called a Host ID Move, wherein a host is moved from one port to another. The first possible outcome is to keep the old host network policy while migrating the switch policy from the old port to the new port. For example, in a server virtualization deployment such as VMWARE ESX, one or multiple VMs can run on top of one VMWARE ESX Server, which abstracts the physical server. The VM can dynamically move from one physical server to another physical server. Consequently, the host id of the VM is also perceived as moving from one port to another port.
In this case, host migration happens automatically in the server farm, and an administrator who wants the switch policy to stay attached to the host no matter where it moves must manually reconfigure the switch, as switch technology today does not detect the host migration or make any policy changes automatically. This manual reconfiguration results in longer downtimes. To illustrate the downtime required, consider a host that connects to a port of a switch with a VLAN. When the automatic VM migration occurs, the VM moves to Port #3, and the host is still assigned the same IP address. To handle this properly, an administrator must pre-configure the destination port with the VLAN; otherwise, the host loses access to the network due to missing or improper VLAN configuration. The Dynamic Switch Policy embodiment of the presently disclosed invention provides a solution to the problems associated with this scenario.
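The difference between the manual case and a policy that follows the host can be sketched as below. This is an illustration of the scenario only, not the disclosed mechanism; `PolicySwitch`, `SwitchPolicy`, the `follow_host` setting, and the port and VLAN numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SwitchPolicy:
    vlan: int
    qos_level: int


class PolicySwitch:
    """Toy switch whose per-port policy may or may not follow a moving host."""

    def __init__(self, follow_host: bool) -> None:
        self.follow_host = follow_host
        self.port_policy: Dict[int, Optional[SwitchPolicy]] = {}
        self.host_port: Dict[str, int] = {}  # host MAC -> current port

    def configure(self, port: int, policy: SwitchPolicy) -> None:
        self.port_policy[port] = policy

    def host_seen(self, mac: str, port: int) -> None:
        """Record where a host appears and, if configured, migrate its policy."""
        old_port = self.host_port.get(mac)
        self.host_port[mac] = port
        if old_port is None or old_port == port:
            return
        if self.follow_host:
            self.port_policy[port] = self.port_policy.get(old_port)
            self.port_policy[old_port] = None  # old port falls back to a null policy

    def has_access(self, mac: str, required_vlan: int) -> bool:
        policy = self.port_policy.get(self.host_port[mac])
        return policy is not None and policy.vlan == required_vlan


def migrate(follow_host: bool) -> bool:
    """Move one host from port 1 to port 3 and report whether it keeps access."""
    sw = PolicySwitch(follow_host)
    sw.configure(1, SwitchPolicy(vlan=10, qos_level=3))
    sw.host_seen("aa:aa:aa:aa:aa:01", 1)
    sw.host_seen("aa:aa:aa:aa:aa:01", 3)  # VM migrates to Port #3
    return sw.has_access("aa:aa:aa:aa:aa:01", required_vlan=10)


manual_result = migrate(follow_host=False)
dynamic_result = migrate(follow_host=True)
```

With `follow_host=False` the destination port has no VLAN configured, so the host loses access until an administrator intervenes; with `follow_host=True` the policy arrives with the host.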
A second possible outcome is to adopt the new host network and switch policies on the new port. For example, in a blade server system, if an administrator prefers to have a network policy persistent to the Chassis ID and Slot ID, then when a blade server moves from one slot to another, the host network policy associated with the new Slot ID, such as an IP address, should be applied to the moved server. The IP address will change from what it was on the original port to an address that depends on the host network policy associated with the new port. Note that in this scenario, a port is always associated with the same IP address; no matter what physical server connects to that port, the host will be assigned the same IP address. Current technology cannot do this: it uses DHCP with or without MAC address matching and is not able to adopt the new host network policy; it tends to keep the old host network policy or simply obtain a random one. The Host Network Policy Control embodiment of the currently disclosed invention is a solution to these problems.
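A slot-persistent address assignment can be contrasted with MAC-based assignment in a short sketch. Both allocators are hypothetical stand-ins rather than real DHCP server behavior, and the chassis name, slot numbers, and addresses are invented for the example.

```python
from typing import Dict, Tuple

SlotKey = Tuple[str, int]  # (chassis ID, slot ID)


class MacBasedAllocator:
    """Address follows the hardware: keyed on the MAC address."""

    def __init__(self, mac_to_ip: Dict[str, str]) -> None:
        self.mac_to_ip = mac_to_ip

    def lease(self, chassis_id: str, slot_id: int, mac: str) -> str:
        return self.mac_to_ip[mac]  # location is ignored


class SlotBasedAllocator:
    """Address stays with the location: keyed on chassis ID and slot ID."""

    def __init__(self, slot_to_ip: Dict[SlotKey, str]) -> None:
        self.slot_to_ip = slot_to_ip

    def lease(self, chassis_id: str, slot_id: int, mac: str) -> str:
        return self.slot_to_ip[(chassis_id, slot_id)]  # MAC is ignored


blade_mac = "aa:aa:aa:aa:aa:01"
by_mac = MacBasedAllocator({blade_mac: "192.0.2.11"})
by_slot = SlotBasedAllocator(
    {("chassis-A", 1): "192.0.2.11", ("chassis-A", 2): "192.0.2.12"}
)

# The same blade moves from slot 1 to slot 2.
mac_ip_after_move = by_mac.lease("chassis-A", 2, blade_mac)
slot_ip_after_move = by_slot.lease("chassis-A", 2, blade_mac)
```

After the move, the MAC-keyed allocator hands back the old address, while the slot-keyed allocator returns the address that belongs to slot 2, which is the behavior this outcome calls for.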
The third possible outcome of performing a Host ID Move is to adopt the new host network policy while migrating the switch policy from the old port to the new port. Two specific difficulties prevent this from becoming a common solution. First, adopting the new host network policy is difficult because today's technology usually associates the host network policy with the MAC address; the Host Network Policy Control embodiment of the current invention, described above, provides a solution to this problem. The second difficulty lies in migrating the switch policy from the old port to the new port; the Dynamic Switch Policy embodiment of the current invention, also previously described, provides a solution to this problem.
The final possible outcome of moving a physical or virtual host within a system to be discussed herein is keeping the old host network policy while adopting the switch policy on the new port. This is the most common scenario with the technology available today, as the switch has nothing to do.
There are two possible outcomes if a physical ID is replaced on a port, such as when a Host ID is changed on a switch port. The first possible outcome is to keep the old host network policy: although the MAC address changes, the IP address is kept persistent. Today, when the MAC address changes, the host network policy also changes, so the IP address cannot stay unchanged. The invention of the current disclosure solves this problem with the Host Network Policy Control embodiment. The second possible outcome is to adopt a new host network policy. Using this approach, the MAC address has changed, so the DHCP server changes the IP address accordingly.
Both physical hosts and VMs use N+1 redundancy, which also presents specific problems in handling moving resources. For example, in a blade system environment, when a server blade fails, the redundant server takes over the host network policy and switch policy of the failed blade. For physical servers, the first problem is that the redundant server has a different MAC address from the failed one, and a DHCP server usually assigns a different host network policy when the MAC address changes. The second problem is that the switch policy that used to apply to the failed server does not migrate to the port connected to the redundant server. Using the Dynamic Switch Policy embodiment in combination with the Host Network Policy Control embodiment solves the N+1 redundancy problem for physical hosts, so that both the host network policy and the switch policy of the failed server are carried over to the redundant server.
The VM handles redundancy in a slightly different manner. For instance, if the physical host for the VM is down, the redundant physical host will launch the VM. This exposes a limitation of today's technology: the VM infrastructure will simply resume the host network policy, but the switch policy of the guest host of the VM does not migrate automatically. For a solution to the issues surrounding VMs, only the Dynamic Switch Policy embodiment of the present invention is needed.