1. Field of the Invention
Embodiments of the present invention relate, in general, to software virtualization and, more particularly, to virtualization and load balancing of server application instances in a cluster environment.
2. Relevant Background
Load balancing is a computer networking methodology to distribute an individual computer's workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Load balancing is usually provided by dedicated software or hardware, such as a multilayer switch or a domain name system server. Examples of the software solutions include the Apache web server's mod_proxy_balancer extension, Varnish, or the Pound reverse proxy and load balancer.
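By way of illustration only, the following is a minimal sketch, in Python, of the round-robin request distribution that such software solutions commonly perform; the backend names are hypothetical.

from itertools import cycle

# Pool of backend servers over which the workload is spread
# (hypothetical names).
backends = cycle(["server-a:8080", "server-b:8080", "server-c:8080"])

def route(request):
    # Hand each incoming request to the next backend in rotation,
    # spreading the workload evenly across the pool.
    return next(backends), request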
An important issue when operating a load-balanced service is how to handle information that must be kept across the multiple requests in a user's session. If this information is stored locally on one backend server, then subsequent requests going to different backend servers would not be able to find it. This might be cached information that can be recomputed, in which case load-balancing a request to a different backend server introduces a performance issue, but if the information is lost, the system can crash.
One solution to the session data issue is to send all requests in a user session consistently to the same backend server. This is known as persistence or stickiness. A significant downside to this technique is its lack of automatic failover: if a backend server goes down, its per-session information becomes inaccessible, and any sessions depending on it are lost.
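By way of illustration, the following minimal sketch shows one common way such persistence may be implemented, mapping every request that carries the same session identifier to the same backend server; the server names are hypothetical.

import hashlib

BACKENDS = ["server-a", "server-b", "server-c"]

def sticky_backend(session_id: str) -> str:
    # Hash the session identifier so that a given session always
    # lands on the same backend server.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return BACKENDS[digest[0] % len(BACKENDS)]

Note that if the selected backend goes down, this mapping still points to it, and the per-session information stored there is lost, which is precisely the failover weakness noted above.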
Recall that a server is a physical computer dedicated to running one or more services that serve the needs of the users of other computers on a network. Said differently, any computerized process that shares a resource with one or more client processes is, for all intents and purposes, a server. For example, the mechanism by which an operating system shares files with clients is a file server. Thus, depending on the computing service it offers, a server may be a database server, a file server, a mail server, a print server, a web server, a game server, or some other kind of server. In the hardware sense, the word server typically designates computer models intended for hosting software applications under heavy demand in a network environment, and in some cases the hardware for these specific functions is specialized. That is, a server computer possesses different capabilities than those of a general-purpose personal computer.
A server cluster, as referred to herein, is a group of at least two independent servers connected by a network and managed as a single system to provide high availability of services for clients. FIG. 1 is a high level depiction of a server cluster environment as would be known to one of reasonable skill in the relevant art. In the illustrated depiction, four servers comprise a server cluster 100. In this case, server A 110, server B 120, server C 130, and server D 140 are linked both directly and via a load balancer/router 150. The router 150 further serves as the point of access to the Internet 170 via the firewall 160.
Server clusters are designed so that the servers in the cluster work together to protect data, keep applications and services running after the failure of one or more servers in the cluster, and maintain consistency of the cluster configuration. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server. So, when a failure occurs on one computer in a cluster, resources are redirected and the workload redistributed to another computer in the cluster.
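The detect-and-restart behavior described above may be sketched as follows, under the simplifying assumptions that the node list is fixed and that TCP reachability serves as the liveness test; node names and ports are hypothetical.

import socket

NODES = {"node-a": 9000, "node-b": 9000, "node-c": 9000}

def is_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    # Treat a successful TCP connection as evidence that the node
    # (and its cluster agent) is still running.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def failover_pass(placements: dict) -> dict:
    # placements maps an application name to the node hosting it.
    survivors = [node for node, port in NODES.items() if is_alive(node, port)]
    for app, node in placements.items():
        if node not in survivors and survivors:
            # Restart the application from the failed node on a
            # surviving node.
            placements[app] = survivors[0]
    return placements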
FIG. 2 is a further illustration of the server cluster shown in FIG. 1 in which server C 130 has failed. As would be known by one of reasonable skill in the relevant art, the removal of a server from a server cluster utilizes existing failover technology to terminate and restart the applications associated with server C 130 on another server within the server cluster. However, by doing so, the applications associated with server C must be re-instantiated on a new host with a new Internet Protocol address.
Other benefits of server clusters include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications, and the like.
Server clusters come in all shapes and sizes, but they are generally either asymmetric clusters or symmetric clusters. In an asymmetric cluster, a standby server exists only to take over for another server in the event of a failure. This type of cluster provides high availability and reliability of services, but does so at the cost of maintaining redundant and otherwise unused capability. The standby server performs no useful work and is either as capable as, or less capable than, the primary server. In a symmetric server cluster, every server in the cluster performs some useful work, and each server is the primary host for a particular set of applications. If a server fails, the remaining servers continue to process their assigned sets of applications as well as pick up new applications from the failed server. Symmetric server clusters are more cost effective but, in the event of a failure, the additional load on the working servers can cause them to fail as well.
On each server exist one or more instantiations of various applications. Underlying these applications is a database engine such as Microsoft Transact-SQL, or T-SQL. T-SQL (referred to herein as SQL) is a special-purpose programming language designed for managing data in relational database management systems. Originally built on relational algebra and tuple relational calculus, its scope includes data insert, query, update, and delete functionality; schema creation and modification; and data access control. Other relational alternatives to SQL include .QL, 4D Query Language, Datalog, the URL-based query method, IBM Business System 12, ISBL, JPQL, Object Query Language, UnQL, QBE, and the like.
SQL is a popular database engine that servers use as a building block for many larger custom applications. Each application built using SQL Server (or the like) typically communicates with a single instance of the database engine using that server's name and Internet Protocol address. Thus, servers running many applications that depend on SQL Server to access a database must normally run an equal number of instances of SQL Server. In most cases, each instance of SQL Server runs on a single node within the server cluster, each with its own name and address. If the node (server) fails, the databases are unavailable until the system is restored on a new node with a new address and name. Moreover, if the node becomes heavily loaded by one or more applications, the performance of the database and of other applications can be degraded.
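The tight coupling just described can be illustrated with the following sketch, which assumes the pyodbc library and hypothetical server and database names. Because the client addresses a single SQL Server instance through a fixed node name, a failure of that node invalidates the connection string until the instance is restored elsewhere under a new name and address.

import pyodbc

# The connection string is hard-wired to one cluster node; if
# node-c fails, this string is useless until the SQL Server
# instance is rebuilt on another node under a new name/address.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=node-c.example.local;"
    "DATABASE=Orders;"
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Invoices")
print(cursor.fetchone()[0])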
Generally, there are three types of server cluster failures. The first is an application or service failure, which occurs when application software running on a server fails to perform properly. The second is a system or hardware failure; as implied, this type of failure is tied to the hardware components of one or more servers, for example, the failure of one or more CPUs, drives, memory modules, or power supplies. Lastly, a site failure can occur when, due to a natural event such as a storm or a power failure, an entire site fails to perform as expected. The ability to handle each of these types of failures is critical to a server cluster's reliability.
Thus, the failover of an application from one server (i.e., machine) to another in the cluster may be automatic in response to a software or hardware failure on the first machine, or alternatively, may be manually initiated by an administrator. However, unless an application is “cluster-aware” (i.e., designed with the knowledge that it may be run in a clustering environment), problems arise during failover.
One problem with existing virtual applications that are not cluster-aware, i.e., legacy applications such as SQL Server, is that such applications assume that the current machine name is the only computer name. Consequently, if the application exposes the machine name to clients, or writes the machine name into its persistent configuration information, the system will not function correctly when the application fails over and runs on a different machine having a different machine name. By way of example, an electronic mail application program provides its machine name to other machines connected thereto in a network. If the application is running in a cluster and the server fails over to another machine, this other machine's name will not be the name that was provided to the other network machines, and the electronic mail application will not function correctly.
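A minimal sketch of this failure mode follows; the configuration file name and key are hypothetical. The application persists the local machine name, which becomes stale once the application fails over to a differently named machine.

import json
import socket

def write_config(path: str = "app_config.json") -> None:
    # The name captured here is the *current* machine's name. After
    # failover to a different machine, clients and this stored
    # configuration still reference the old, now-invalid name.
    config = {"advertised_host": socket.gethostname()}
    with open(path, "w") as f:
        json.dump(config, f)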
To address this deficiency, traditional virtualization platforms for applications in the prior art use failover clustering technology. For example, Microsoft Windows® uses Microsoft Failover Clustering ("MSCS")®. MSCS, and products like it, allow one or more computers to join together to form a cluster. An application can then be made to listen and provide data to clients via a cluster host name or Internet Protocol ("IP") address rather than individual computer names. If an active node (computer) fails, MSCS repositions the application onto the next available node in the cluster to maintain the application's functionality. To avoid data corruption and to ensure that only one node in the entire cluster has access to the file system, such as the New Technology File System ("NTFS"), a Small Computer System Interface ("SCSI") reservation is employed. What is lacking, however, is the ability to virtualize an application, such as SQL Server, in a cluster environment without utilizing MSCS and SCSI reservations.
One of reasonable skill in the relevant art will recognize that virtualization, broadly defined, is the simulation of the software and/or hardware upon which other software runs. This simulated environment is often called a virtual machine (VM). A virtual machine is thus a simulation of a machine (abstract or real) that is usually different from the target (real) machine on which it is simulated. Virtual machines may be based on specifications of a hypothetical computer or may emulate the computer architecture and functions of a real-world computer. There are many forms of virtualization, distinguished primarily by the computing architecture layer and the virtualized components, which may include hardware platforms, operating systems (OS), storage devices, network devices, or other resources.
Application or process virtualization can be viewed as part of an overall trend in enterprise IT that includes autonomic computing, a scenario in which the IT environment is able to manage itself based on perceived activity, and utility computing, in which computer processing power is seen as a utility that clients pay for only as needed. The usual goal of virtualization is to centralize administrative tasks while improving scalability and overall hardware-resource utilization. This type of parallelism tends to reduce overhead costs and differs from multitasking, which involves running several programs on the same OS.
Hardware virtualization or platform virtualization refers to the creation of a virtual machine that acts like a real computer with an operating system. Software executed on these virtual machines is separated from the underlying hardware resources. For example, a computer that is running Microsoft Windows may host a virtual machine that looks like a computer with the Ubuntu Linux operating system; Ubuntu-based software can be run on the virtual machine.
In hardware virtualization, the host machine is the actual machine on which the virtualization takes place, and the guest machine is the virtual machine. The words host and guest are used to distinguish the software that runs on the physical machine from the software that runs on the virtual machine. The software or firmware that creates a virtual machine on the host hardware is sometimes called a hypervisor.
A significant limitation of a server cluster environment is the inability to manage individual applications during a failover. One approach to addressing this deficiency in the prior art is to virtualize an application, such as SQL Server, in a cluster environment without the required use of MSCS or the like, and to establish a shared cluster system in which every node in the cluster possesses continuous full read and write access, thus eliminating the need for MSCS or SCSI reservations. Such an approach is described in commonly assigned U.S. patent application Ser. No. 13/743,007, filed 16 Jan. 2013, entitled "Systems and Methods for Server Cluster Application Virtualization."
In most server clusters, components are monitored continually (e.g., web servers may be monitored by fetching known pages), and when one becomes non-responsive, the load balancer is informed and no longer sends traffic to that particular server. Conversely, when a component comes back online, the load balancer begins to route traffic to it again. For this approach to work, however, there must be at least one physical component in excess of the service's capacity.
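The monitoring loop described above may be sketched as follows, assuming hypothetical health-check URLs; non-responsive servers are removed from the active rotation and restored when they respond again.

import urllib.request

POOL = ["http://server-a/health", "http://server-b/health"]
active = set(POOL)

def monitor_pass() -> None:
    # Fetch a known page from each web server; drop servers that
    # fail to answer and restore those that answer again.
    for url in POOL:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                ok = resp.getcode() == 200
        except OSError:
            ok = False
        if ok:
            active.add(url)
        else:
            active.discard(url)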
Recall that failover is the continuation of a service after the failure of one or more components (servers or applications) in a server cluster, and that a server cluster is a group of at least two independent servers connected by a network and managed as a single system. One important benefit of a server cluster is that the cluster software, which runs on each of the servers in the cluster, can detect application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server according to a failover plan.
Traditionally, failover clustering solutions are used to monitor an application and to fail the application over from one system to another when the first system becomes nonfunctional or crashes. Both failover and load balancing are critical to the efficient and effective operation of a server in a server cluster. Currently, however, the systems of the prior art provide either application failover or load balancing, but not both, and these systems are designed for individual components, not for the entire cluster.
Many database administration product companies have applications that are tightly coupled to the server on which they reside, whether physical or virtual. As a result, and because of the simplicity of a SQL Server installation, server sprawl ensues. Due to this tight coupling, applications are difficult to move or migrate as changes in workload (load balancing) occur. Moving applications between servers, whether in on-premises or cloud environments, can be time-consuming and expensive and can result in the over-provisioning of both server and storage resources.
Server sprawl problems include:
- Restrictive Deployment Model: One instance tightly coupled to one server means that instance movement between servers is labor-intensive, time-consuming, and prone to manual errors.
  Business Implication: It becomes difficult, if not impossible, to respond to the changes in workload necessary to support the business; productivity, innovation, time-to-market, and competitive advantage are threatened.
- Expensive High Availability: Requires specialized redundant hardware or clustering; the redundant hardware sits idle until a failure occurs.
  Business Implication: An overall corporate budget drain, as well as a lost opportunity to allocate resources to IT assets and/or head-count that could be dedicated to activities directly related to business core competencies and the company's bottom line.
- Disruptive Technology Refreshes/Migrations: Typically, server/storage hardware is refreshed every three to four years, requiring a forklift upgrade or rip-and-replace. Each refresh requires scheduled application downtime, after which applications must be reinstalled; if storage is refreshed, data must also be migrated.
  Business Implication: Quite simply, most business organizations find it difficult to tolerate even a short interruption in service, let alone an extended disruption. Moreover, there are typically errors during the reinstallation of applications and/or data migrations, causing additional business disruptions.
- Overall Poor Economic Value: Microsoft SQL Server sprawl leads to high licensing costs, over-provisioned servers and storage, and labor-intensive administration/management.
As a result of server sprawl and inadequate load balancing, IT time and resources must increasingly be assigned to application and server management, a missed opportunity to assign funds and expertise to areas that would directly impact the bottom line. These and other deficiencies of the prior art are addressed by one or more embodiments of the present invention. Additional advantages and novel features of this invention shall be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following specification or may be learned by the practice of the invention. The advantages of the invention may be realized and attained by means of the instrumentalities, combinations, compositions, and methods particularly pointed out in the appended claims.