1. Technical Field
The present disclosure relates to clusters and more specifically a system and method of creating a virtual private cluster.
2. Introduction
The present disclosure applies to computer clusters and computer grids. A computer cluster can be defined as a parallel computer that is constructed of commodity components and runs commodity software. FIG. 1 illustrates in a general way an example relationship between clusters and grids. A cluster 110 is made up of a plurality of nodes 108A, 108B, 108C, each containing computer processors, memory that is shared by the processors in the node, and other peripheral devices such as storage disks connected by a network. A resource manager 106A for the cluster 110 manages jobs submitted by users to be processed by the cluster. Other resource managers 106B, 106C are also illustrated that can manage other clusters (not shown). An example job would be a compute-intensive weather forecast analysis that requires a cluster of computers to be scheduled so that the job is processed in time for the evening news report.
A cluster scheduler 104A can receive job submissions and identify, using information from the resource managers 106A, 106B, 106C, which cluster has available resources. The job would then be submitted to that cluster's resource manager for processing. Other cluster schedulers 104B and 104C are shown by way of illustration. A grid scheduler 102 can also receive job submissions and identify, based on information from a plurality of cluster schedulers 104A, 104B, 104C, which clusters have available resources and then submit the job accordingly.
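By way of illustration only, the relationship among the grid scheduler 102, the cluster schedulers 104A-104C and the resource managers 106A-106C can be sketched in Python as follows. The class names, the `free_nodes` field and the selection policy (choose the cluster with the most free nodes) are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical stand-in for a resource manager of one cluster."""
    name: str
    free_nodes: int
    jobs: list = field(default_factory=list)

    def submit(self, job_name, nodes_needed):
        # Record the job and reduce the free node count.
        self.free_nodes -= nodes_needed
        self.jobs.append(job_name)


@dataclass
class ClusterScheduler:
    """Hypothetical cluster scheduler that queries its resource managers."""
    name: str
    managers: list

    def submit(self, job_name, nodes_needed):
        # Identify which clusters report enough available resources.
        candidates = [m for m in self.managers if m.free_nodes >= nodes_needed]
        if not candidates:
            return None
        # Assumed policy: prefer the cluster with the most free nodes.
        target = max(candidates, key=lambda m: m.free_nodes)
        target.submit(job_name, nodes_needed)
        return target.name


@dataclass
class GridScheduler:
    """Hypothetical grid scheduler that sits above several cluster schedulers."""
    schedulers: list

    def submit(self, job_name, nodes_needed):
        # Try each cluster scheduler in turn until one places the job.
        for scheduler in self.schedulers:
            placed_on = scheduler.submit(job_name, nodes_needed)
            if placed_on is not None:
                return scheduler.name, placed_on
        return None


rm_a = ResourceManager("106A", free_nodes=4)
rm_b = ResourceManager("106B", free_nodes=16)
rm_c = ResourceManager("106C", free_nodes=32)
grid = GridScheduler([
    ClusterScheduler("104A", [rm_a, rm_b]),
    ClusterScheduler("104B", [rm_c]),
])
placement = grid.submit("weather-forecast", nodes_needed=8)
```

In this sketch the job is routed downward through each layer, and each layer decides using only the information reported by the layer beneath it.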
Several books provide background information on how to organize and create a cluster or a grid and related technologies. See, e.g., Grid Resource Management. State of the Art and Future Trends, Jarek Nabrzyski, Jennifer M. Schopf, and Jan Weglarz, Kluwer Academic Publishers, 2004; and Beowulf Cluster Computing with Linux, edited by William Gropp, Ewing Lusk, and Thomas Sterling, Massachusetts Institute of Technology, 2003.
FIG. 2 illustrates a known arrangement 200 comprising a group of computer clusters 214, 216, 218 consisting of a number of computer nodes 202, 204, 206, each having memory, disks and swap space local to the computer itself. In addition, there can exist a number of services that are a part of that cluster. Block 218 comprises two components, a cluster 202 and a storage manager 212 providing network storage services such as LAN-type services. Block 218 illustrates that the network storage services 212 and the cluster or object 202 are organized into a single and independently administered cluster. An example of this can be a marketing department in a large company that has an information technology (“IT”) staff that administers this cluster for that department.
Storage manager 212 can also communicate with nodes or objects 204 in other clusters such as are shown in FIG. 1. Block 216 shows a computer cluster 204 and a network manager 210 that communicates with cluster 204 and can impact other clusters, shown in this case as cluster 202 and cluster 206.
Block 214 illustrates a computer cluster 206 and a software license manager 208. The license manager 208 is responsible for providing software licenses to various user applications and ensures that an entity stays within the bounds of its negotiated licenses with software vendors. The license manager 208 can also communicate with other clusters 204 as shown.
Assuming that computer clusters 214, 216 and 218 are all part of a single company's computer resources, that company would probably have a number of IT teams, each managing one of the clusters 214, 216, 218. Typically, there is little or no crossover between the clusters in terms of management and administration, other than the example storage manager 212, network manager 210 or license manager 208.
There are also many additional services that are local and internal to each cluster. Cluster scheduling, message passing, network file system auto mounter, network information services and password services are examples of local services that would be found within each cluster 214, 216, 218, and are shown as feature 220 in block 214. These local services are unique to each cluster and locally managed. All of them have to be independently managed within each cluster by the respective IT staff.
Assuming that a company owns and administers each cluster 218, 216 and 214, there are reasons for aggregating and partitioning the compute resources. Each organization in the company desires complete ownership and administration over its compute resources. Take the example of a large auto manufacturing company. Various organizations within the company include sales, engineering, marketing and research and development. The sales organization does market research, looking at sales and historical information, analyzing related data and determining how to target the next sales campaign. Design graphics and rendering of advertising can require computer processing power. The engineering department performs aerodynamics and materials science studies and analyses. Each organization within the company has its own set of goals and computer resource requirements to make certain it can generate its deliverables to its customers.
While this model provides each organization control over its resources, there are downsides to this arrangement. A large cost is the requirement for independent IT teams administering each cluster. There is also no opportunity for load balancing: if the sales organization has extra resources that are not being used, there is no way to connect the clusters to enable access by the engineering teams.
Another cause of reduced efficiency with individual clusters as shown in FIG. 1 is over- or under-restraining. Users who submit jobs to the cluster for processing desire a certain level of response time according to their desired parameters and permissions. In order to ensure the response time, cluster managers typically must significantly over-specify the cluster resources to get the results they want or to gain control over the cycle distribution. When a job is over-specified and then submitted to the cluster, the job often does not utilize all of the specified resources. This process can leave a percentage of the resources unused.
What is needed in the art is a means of maintaining cluster partitions but also sharing resources where needed to improve the efficiency of a cluster or a group of clusters.
SUMMARY
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.
Those who manage clusters or submit jobs to clusters want to be able to control the clusters' resources in an efficient manner. There was previously no mechanism to soft partition a cluster or a group of clusters to provide managers with the control they want without imposing substantial additional overhead. Most users do not care how their cluster is set up as long as the resources are available to process submitted jobs and they have the desired level of control.
The present disclosure addresses the deficiencies in the prior art by providing a system and method of establishing a virtual private cluster out of a group of compute resources. In one aspect of the disclosure, the group of compute resources can be viewed as a group of clusters. The disclosure introduces steps to create and utilize a virtual private cluster. The method includes aggregating compute resources across the group of compute resources and can be implemented by a computer processor. This step can comprise two levels: a first level of aggregating multiple resources of the same type and a second level of aggregating resources of distinct types. Aggregating multiple resources of the same type would typically indicate pulling together compute hosts that are possibly connected across multiple networks (or clusters) and aggregating those as though they were one giant cluster. The second level of aggregating involves resources of various types. For example, this second level can involve aggregating compute resources together with network resources, application or license management resources and storage management resources.
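By way of illustration only, the two levels of aggregation described above can be sketched in Python as follows. The `Resource` record, its fields and the function names are hypothetical and are introduced only for this sketch; the disclosure does not prescribe a data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record used only for this sketch."""
    name: str
    kind: str       # e.g. "compute", "network", "license", "storage"
    cluster: str    # the cluster the resource physically belongs to
    capacity: int


def aggregate_same_type(resources, kind):
    """First level: pool every resource of one kind, regardless of cluster."""
    return [r for r in resources if r.kind == kind]


def aggregate_across_types(resources):
    """Second level: combine resources of distinct kinds into one pool."""
    pool = defaultdict(list)
    for r in resources:
        pool[r.kind].append(r)
    return dict(pool)


def total_capacity(resources):
    """Sum the capacity of a list of resources."""
    return sum(r.capacity for r in resources)


resources = [
    Resource("node-1", "compute", "cluster-202", 8),
    Resource("node-2", "compute", "cluster-204", 16),
    Resource("lic-1", "license", "cluster-206", 10),
    Resource("stor-1", "storage", "cluster-202", 500),
]

# Compute hosts from two clusters are treated as one larger cluster.
compute_pool = aggregate_same_type(resources, "compute")

# Compute, license and storage resources are then combined into a single pool.
combined_pool = aggregate_across_types(resources)
```

Here the first level ignores cluster boundaries for a single resource type, and the second level gathers the distinct types so that they can later be partitioned together.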
The method next includes establishing partitions of the group of compute resources to fairly distribute available compute resources amongst a plurality of organizations and presenting to users within each organization only the partitioned resources accessible by that organization, wherein the resources presented to each organization constitute its virtual private cluster. In this manner, aggregating, partitioning and presenting to a user only his or her soft partitioned resources enables a more efficient use of the combined group of clusters and is also transparent to the user while providing the user with the desired level of control over the virtual private cluster.
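By way of illustration only, the partitioning and presenting steps described above can be sketched in Python as follows. The round-robin assignment is merely one assumed interpretation of distributing resources fairly, and the function names are hypothetical; the disclosure does not limit the partitioning policy to this approach.

```python
def partition(nodes, organizations):
    """Soft partition a pool of aggregated nodes among organizations.

    Assumed policy for this sketch: round-robin assignment, so that the
    partition sizes differ by at most one node.
    """
    if not organizations:
        raise ValueError("at least one organization is required")
    partitions = {org: [] for org in organizations}
    for index, node in enumerate(nodes):
        owner = organizations[index % len(organizations)]
        partitions[owner].append(node)
    return partitions


def visible_resources(partitions, organization):
    """Present to a user only the resources in the organization's partition."""
    return list(partitions.get(organization, []))


aggregated_nodes = ["n1", "n2", "n3", "n4", "n5"]
partitions = partition(aggregated_nodes, ["sales", "engineering"])

# A user in engineering sees only the engineering partition,
# which appears to that user as a private cluster.
engineering_view = visible_resources(partitions, "engineering")
```

Because the partitions are held in software rather than fixed in hardware, a different policy (for example, weighted shares per organization) could be substituted in `partition` without changing what each user is shown.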
Various embodiments of the disclosure include systems, methods and computer-readable media storing instructions for controlling a computing device to perform the steps of generating a virtual private cluster. A tangible computer-readable medium excludes energy, signals per se, and a wireless interface.
Applicants note that the capability for performing the steps set forth herein is contained within the source code filed with the CD in the parent provisional application. For example, a resource scheduler or cluster workload manager can establish reservations for jobs and virtual private clusters within a compute environment through a resource manager.