1. Field of the Invention
The present invention relates to a hybrid compute environment, such as a cluster or a grid, in which multiple different types of operating systems exist on various nodes within the compute environment, and more particularly to a system and method of managing automated provisioning wherein one or more nodes may be automatically reprovisioned to run a different operating system based on various factors such as needs associated with pending or predicted workload.
2. Introduction
High performance computing (HPC) typically refers to the use of parallel supercomputers and computer clusters that comprise multiple processors linked together in a single system with a commercially available interconnect. While a high level of technical skill is typically needed to organize and manage such systems, they can be created with commodity components. Because of their flexibility and relatively low cost, HPC systems increasingly dominate the world of supercomputing.
HPC has traditionally been dominated by the Linux operating system. However, experts predict that Microsoft® Windows-based data centers, clusters or compute environments may become more prevalent in the near future. This may be due to a variety of factors, such as Microsoft®'s strong relationship with application vendors, many of whom have already ported their HPC applications to Windows Compute Cluster Server 2003 (CCS). Further, there is an increasing demand for work group clusters, a market segment composed primarily of Windows users who are new to the HPC concept. As a result, HPC environments that strictly ran Linux in the past are exploring the options of Windows-based clustering. A Windows/Linux hybrid cluster lowers the barriers that a Linux environment faces in adopting Windows in the HPC environment. However, it may be difficult to achieve the flexibility required to manage a hybrid environment in which some nodes run a first operating system such as Linux and other nodes run a second operating system such as a Windows-based operating system. Accordingly, what is needed in the art is an improved method of managing a hybrid clustering environment.
FIG. 2A illustrates several different compute environments and may also represent a single hybrid compute environment. A first environment 200 may represent a separate compute environment or a portion of a hybrid compute environment. While the Linux operating system and a Windows-based operating system are discussed, it is contemplated that the present invention would relate to any first operating system that is different from a second operating system. Many different types of operating systems, such as a Macintosh operating system and so forth, are contemplated as within the scope of the present invention. The terms Linux and Windows are used only because they are prevalent types of operating systems and because enabling flexibility between these two types of operating systems triggered the development of the present invention.
Nodes 202 run a first operating system, such as a Linux operating system, and are managed by a first resource manager 204. As would be known in the art, this first resource manager may be TORQUE, Platform's Load Sharing Facility (LSF), PBS Pro from Altair Engineering and so forth. These resource managers typically, as is known in the art, enable the nodes 202 to communicate with a workload manager 206 that receives jobs 208 that are submitted by users. Environment 210 represents a Windows-based environment (i.e., a second operating system), which again may be a part of the same hybrid compute environment or a separate environment, in which nodes 212 run a Windows-based operating system such as Windows Compute Cluster Server (CCS). A resource manager 214 for the second operating system enables a workload manager 216 to communicate with the Windows-based nodes. Again, jobs 208 may be submitted through the workload manager 216, which communicates with the resource manager 214 to enable the jobs to actually consume resources within the environment 210.
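By way of illustration only, the relationship among nodes, resource managers and workload managers described above may be sketched as follows. This is a minimal sketch and not an implementation of any particular resource manager or workload manager: the class names Node, ResourceManager and WorkloadManager, their methods, the node names and the first-idle-node placement rule are all hypothetical and are assumed here only to make the structure concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A compute node (e.g., one of nodes 202 or 212). Hypothetical model."""
    name: str
    os: str                      # e.g., "linux" or "windows"
    job: Optional[str] = None    # job currently consuming this node, if any


@dataclass
class ResourceManager:
    """Stands in for a per-operating-system resource manager (204 or 214)."""
    os: str
    nodes: List[Node] = field(default_factory=list)

    def start(self, job: str) -> Optional[Node]:
        # Assumed placement rule: use the first idle node, if one exists.
        for node in self.nodes:
            if node.job is None:
                node.job = job
                return node
        return None


@dataclass
class WorkloadManager:
    """Stands in for a workload manager (206 or 216) that receives jobs 208."""
    resource_manager: ResourceManager
    queue: List[str] = field(default_factory=list)

    def submit(self, job: str) -> Optional[Node]:
        # Ask the resource manager for a node; hold the job if none is idle.
        node = self.resource_manager.start(job)
        if node is None:
            self.queue.append(job)
        return node


# Environment 200: Linux nodes behind their own resource manager.
linux_env = WorkloadManager(
    ResourceManager("linux", [Node("n202a", "linux"), Node("n202b", "linux")])
)
# Environment 210: Windows-based nodes behind a second resource manager.
windows_env = WorkloadManager(
    ResourceManager("windows", [Node("n212a", "windows")])
)

linux_env.submit("job-1")
linux_env.submit("job-2")
linux_env.submit("job-3")    # no idle Linux node remains, so the job is queued
windows_env.submit("job-4")
```

In this sketch each environment has its own workload manager and resource manager, so a job queued in one environment cannot use an idle node in the other, which is the kind of inflexibility in a hybrid environment noted in the Introduction.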