1. Field of the Invention
Embodiments of the present invention relate generally to computer systems. More particularly, embodiments of the present invention relate to a rack-and-blade computing system and method for automating how computing resources (e.g. blade servers) are allocated to workloads for meeting quality of service requirements.
2. Description of the Background Art
The information technology (IT) industry is demanding higher computer density per square foot of datacenter space in order to reduce costs while at the same time IT service delivery capabilities must increase. Another key trend in blade servers has been the standardization of server architectures around Intel processors and chipsets, and Windows or Linux operating environments. Inexpensive industry-standard servers running Windows or Linux are proliferating in the datacenter, creating a manageability problem. With more servers entering the datacenter, IT organizations must hire additional system administrators to manage the servers and the applications they run. In response to requirements for higher density and lower management-cost systems, many major vendors have recently introduced products based on a new server architecture, a “rack and blade” architecture.
Broadly, a blade server is a thin, modular electronic circuit board containing at least one microprocessor (e.g., two or more microprocessors) and memory, and optionally permanent storage. More specifically, a blade server is a single, self-contained computer (motherboard, processor, memory, disk and connectivity) that screws into a slot on a standard space-saving computer rack. All blade servers in a rack typically share a single (or more commonly, a dual redundant) power supply, fans, and backbone. The connectivity of the blade servers to the backbone is either proprietary or standards-based (e.g., Compact PCI).
A blade server is typically intended for a single, dedicated application (such as serving web pages) and may easily be inserted into a slot on the space-saving rack, which may include similar servers. Some space-saving racks, by way of example only, have the capacity to hold up to 280 blade servers in a standard 42 U rack, all sharing a common high-speed bus and designed to create less heat, thus saving energy costs as well as space. Large data centers and Internet service providers (ISPs) that host Web sites are among the users of blade servers.
A blade server is sometimes referred to as a “high-density server” and is typically used in a cluster of servers that are dedicated to a single task, such as file sharing, web page serving and caching, SSL encrypting of web communication, transcoding of web page content for smaller displays, and audio and video content streaming. A blade server usually comes with an operating system and is normally dedicated to a single application or application component. The storage required by the blades could be embedded in the blade, or available externally via standard connectivity mechanisms such as Storage Area Networks (SAN) or Network Attached Storage (NAS). The operating system and applications required to operate the blades can be loaded from the storage device(s) available to the blades.
Like more traditional clustered servers, blade servers can also be managed to include load balancing and failover capabilities. Load balancing is dividing the amount of work that a blade server has to do between two or more blade servers so that more work gets done in the same amount of time and, in general, all users get served faster. Load balancing may be implemented with hardware, software, or a combination of both. Typically, load balancing is the main reason for blade server clustering. Failover is a backup operational mode in which the functions of a primary blade server are assumed by a secondary blade server when the primary blade server becomes unavailable through either failure or scheduled down time.
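By way of illustration only, the load balancing and failover behavior described above may be sketched as follows. This is a minimal sketch assuming a simple round-robin policy; the class name `BladePool`, its methods, and the blade identifiers are hypothetical and are not taken from any vendor product.

```python
class BladePool:
    """Hypothetical pool that spreads requests over blades in round-robin order."""

    def __init__(self, blades):
        self.blades = list(blades)          # fixed rotation order
        self.available = set(self.blades)   # blades currently able to serve
        self._next = 0                      # rotation cursor

    def mark_down(self, blade):
        # Failure or scheduled down time: stop selecting this blade.
        self.available.discard(blade)

    def mark_up(self, blade):
        # Return a known blade to service.
        if blade in self.blades:
            self.available.add(blade)

    def pick(self):
        # Try each blade at most once, starting at the cursor, and skip
        # unavailable blades so the remaining blades assume their work.
        for _ in range(len(self.blades)):
            blade = self.blades[self._next]
            self._next = (self._next + 1) % len(self.blades)
            if blade in self.available:
                return blade
        raise RuntimeError("no blade available")
```

In this sketch, marking a blade as down causes subsequent calls to `pick()` to pass over it, which approximates failover at the level of request routing.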
Recent developments, such as the storage area network (SAN), make any-to-any connectivity possible among blade servers and data storage systems. In general, storage networks use many paths, each consisting of a complete set of all the components involved, between the blade server and the storage system. A failed path can result from the failure of any individual component of a path. Multiple connection paths, each with redundant components, are used to help ensure that the connection is still viable even if one (or more) paths fail. The capacity for automatic failover means that normal functions can be maintained despite the inevitable interruptions caused by problems with equipment.
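The relationship between component failures and path failures described above may be illustrated, by way of example only, as follows. This is a sketch under the assumption that a path is modeled as a tuple of component names; the function names and the component names used in any example are hypothetical.

```python
def path_is_up(path, failed):
    """A path is viable only if none of its components has failed."""
    return all(component not in failed for component in path)


def first_viable_path(paths, failed):
    """Return the first path with no failed component, or None if all paths fail."""
    for path in paths:
        if path_is_up(path, failed):
            return path
    return None
```

For instance, given two paths `("hba0", "switchA", "ctrl0")` and `("hba1", "switchB", "ctrl1")`, a failure of `switchA` alone leaves the second path viable, whereas failures of both `switchA` and `ctrl1` leave no viable path.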
Most blade server offerings today provide an integrated management solution. For example, the Dell PowerEdge 1655MC includes a management card that provides chassis and blade monitoring and remote power control through a dedicated management network, as well as operating system independent keyboard, video and mouse capability. The HP ProLiant BL e-Class includes Integrated Administrator, an integrated server blade management solution for remote or local access. It also includes a rapid deployment solution that makes it possible to install the operating system and applications on one or more blade servers from a central image repository.
Blade server architectures are ideal for information technology (IT) services or applications that can “scale out” or scale horizontally; that is, that can expand in capacity by adding additional servers to the pool of servers performing a task. Some examples of services that scale horizontally are: web servers, primarily via HTTP; file servers, normally via FTP, but also including media streaming; and application servers.
Multiple web servers can be connected to a load-balancing network device to share the task of serving webpage requests. File servers are multiple servers that may combine to provide higher throughput. There is normally a traffic management device in front of these servers to virtualize access to the service over the network. Application servers are servers that execute business logic on a standard platform, such as Java 2 Enterprise Edition. Multiple application servers may operate together to deliver a higher service capacity, by sharing the load.
While most major vendors today offer integrated management solutions with blade servers, these solutions fall short of providing full provisioning automation. The operator must decide what applications or services run on what blade servers, and manage availability and performance on each blade server using tools such as Insight Manager or OpenView, both from HP. In case of spikes in demand, it might be necessary to increase the number of blade servers supporting an application, as in the case of a rapid increase in website hits. To respond quickly, significant human intervention is required, even when taking advantage of performance monitoring and alarming and rapid deployment. For example, in order to maintain a pre-specified level of HTTP service running on a group of blades, it may be necessary to perform a number of steps.
One step would be to ensure that a performance monitoring service is in operation to detect degradation in quality of service, so that appropriate action can be taken. There are many different mechanisms to detect performance degradation; most are based on system-level performance metrics such as CPU consumption thresholds, number of processes, number of concurrent active connections, and others. Performance may also be monitored at the application level, for example, by the number of pages served per unit of time, or the average response time per request.
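By way of example only, a threshold-based degradation check of the kind described above may be sketched as follows. The metric names, the `Sample` type, and the limit values are assumptions chosen for illustration; actual limits would be derived from the applicable quality of service requirements.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_utilization: float    # system level: fraction of CPU in use, 0.0 to 1.0
    active_connections: int   # system level: concurrent active connections
    avg_response_ms: float    # application level: average response time per request


# Hypothetical limits, not taken from any real service level objective.
LIMITS = Sample(cpu_utilization=0.85, active_connections=2000, avg_response_ms=500.0)


def violated_metrics(sample, limits=LIMITS):
    """Return the names of the metrics whose values exceed their limits."""
    return [
        name
        for name in ("cpu_utilization", "active_connections", "avg_response_ms")
        if getattr(sample, name) > getattr(limits, name)
    ]
```

A non-empty result would indicate degradation and could be used to raise an alarm for the operator.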
Another step would be to select a candidate blade server that can be allocated to perform the service requiring additional server resources. This process might require identifying another service provided by the rack that can withstand a decrease in the number of servers running it. Alternatively, the blade server can be obtained from a free pool of standby servers maintained by the system administrator.
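The candidate selection described above may be sketched, by way of illustration only, as follows. The sketch assumes that standby blade servers are preferred over blade servers taken from a running service, and that each service declares a minimum number of blade servers it can tolerate; the `min_blades` field and the function name are hypothetical.

```python
def select_candidate(free_pool, services):
    """Return (blade, donor_service); donor_service is None for a standby blade."""
    # Assumption: prefer a standby blade, since it needs no flushing.
    if free_pool:
        return free_pool[0], None
    # Otherwise find services holding more blades than their stated minimum.
    spare = {
        name: len(svc["blades"]) - svc["min_blades"]
        for name, svc in services.items()
        if len(svc["blades"]) > svc["min_blades"]
    }
    if not spare:
        return None, None  # no service can withstand a decrease
    # Take a blade from the service with the most spare capacity.
    donor = max(spare, key=spare.get)
    return services[donor]["blades"][-1], donor
```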
Once a candidate blade server is identified for allocation, if it is already actively performing a service, the candidate blade server needs to be “flushed” of data and processes, and reconfiguration notifications must be sent to the traffic management devices which direct traffic to this blade server, so that traffic is no longer routed to it. The flushing process may also involve reassignment of IP addresses to this blade server.
A further step that may be necessary to maintain a pre-specified level of HTTP service operating on a group of blade servers would be to preload the candidate blade server with the operating environment and the application binaries that it needs to perform the desired task. In order to execute this step, it would be necessary to have a rapid deployment system that an operator may use to select the right image from a repository and load it on the candidate blade server.
Once the operating system and the application code are loaded, it would then be necessary to configure the candidate blade server. This could involve the addition of data and agents, and any other steps that are specific to this candidate blade server and not captured by the previously indicated rapid deployment step. Once the candidate blade server is configured and running, it would then be necessary to add it to the pool of blade servers performing the same task. This normally demands a configuration change to the traffic management device which directs traffic to the blade server pool.
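The sequence of flushing, loading, configuring, and pool reconfiguration described above may be summarized, by way of example only, in the following sketch. It only records the order of the steps and updates a hypothetical mapping of service names to blade lists; it does not invoke any real deployment or traffic management interface, and it assumes that traffic is redirected away from the blade server before the blade server is flushed.

```python
def reallocate(blade, donor, target, services, free_pool):
    """Move one blade to the target service and return the ordered steps taken."""
    steps = []
    if donor is None:
        # Standby blade: nothing to flush.
        free_pool.remove(blade)
        steps.append(f"take {blade} from free pool")
    else:
        # Active blade: redirect traffic first, then flush.
        services[donor].remove(blade)
        steps.append(f"stop routing {donor} traffic to {blade}")
        steps.append(f"flush data and processes on {blade}")
    steps.append(f"load {target} image on {blade}")
    steps.append(f"configure {blade}")
    services[target].append(blade)
    steps.append(f"route {target} traffic to {blade}")
    return steps
```

Each recorded step corresponds to an action that, in the arrangement described above, an operator would carry out using the available management and rapid deployment tools.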
Finally, after all of the indicated steps have been performed, the performance monitoring loop starts again with the reconfigured pool. At regular intervals, or when alarms go off, capacity demands are examined by the monitoring tools, and blade servers are again rebalanced to meet overall service level objectives for all the services deployed on the rack.