1. Technical Field
The present disclosure relates to reservations in a cluster or, more specifically, to a system and method of providing dynamic roll-back reservations for compute resources.
2. Introduction
The present disclosure relates to a system and method of allocating resources in the context of a grid or cluster of computers. Grid computing can be defined as coordinated resource sharing and problem solving in dynamic, multi-institutional collaborations. Many computing projects require much more computational power and resources than a single computer or single processor can provide. Networked computers with peripheral resources such as printers, scanners, I/O devices, storage disks, scientific devices and instruments, etc., may need to be coordinated and utilized to complete a task or a job.
Grid/cluster resource management generally describes the process of identifying requirements, matching resources to applications, allocating those resources, and scheduling and monitoring compute resources over time in order to run applications and workload as efficiently as possible. Each project will utilize a different set of resources and thus is typically unique. In addition to the challenge of allocating resources for a particular job, administrators also have difficulty obtaining a clear understanding of the resources available, the current status of the compute environment and real-time competing needs of various users. One aspect of this process is the ability to reserve resources for a job. A workload manager will seek to reserve a set of resources to enable the compute environment to process a job at a promised quality of service. Examples of workload management software include the compute environment management products available from Cluster Resources, Inc., such as the Moab™ Workload Manager, Moab™ Cluster Manager, the Moab™ Grid Suite and the Moab™ Cluster Suite.
General background information on clusters and grids can be found in several publications. See, e.g., Grid Resource Management, State of the Art and Future Trends, Jarek Nabrzyski, Jennifer M. Schopf, and Jan Weglarz, Kluwer Academic Publishers, 2004; and Beowulf Cluster Computing with Linux, edited by William Gropp, Ewing Lusk, and Thomas Sterling, Massachusetts Institute of Technology, 2003.
It is generally understood herein that the terms grid and cluster are interchangeable in that there is no specific definition of either. In general, a grid will include one or more clusters as will be shown in FIG. 1A. Several general challenges exist when attempting to maximize resources in a grid. First, there are typically multiple layers of grid and cluster schedulers. A grid 100 generally includes a group of clusters or a group of networked computers. The definition of a grid is very flexible and can mean a number of different configurations of computers. The definition can depend on how a compute environment is administered and controlled via local control (clusters) or global control/administration (grids). The introduction here is meant to be general given the variety of configurations that are possible.
A grid scheduler 102 communicates with one or more cluster schedulers 104A, 104B and 104C. Each of these cluster schedulers communicates with a respective resource manager 106A, 106B or 106C. Each resource manager communicates with a respective series of compute resources shown as nodes 108A, 108B, 108C in cluster 110, nodes 108D, 108E, 108F in cluster 112 and nodes 108G, 108H, 108I in cluster 114.
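The layered arrangement described above can be pictured as a small data model. The following is a minimal sketch only: the class names (GridScheduler, ClusterScheduler, ResourceManager) and the all_nodes helper are hypothetical names chosen for illustration, and the string labels simply reuse the reference numerals from the text. It does not describe the interface of any actual scheduler product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    """Hypothetical stand-in for a resource manager (106A, 106B, 106C)."""
    name: str
    nodes: List[str] = field(default_factory=list)


@dataclass
class ClusterScheduler:
    """Hypothetical stand-in for a cluster scheduler (104A, 104B, 104C)."""
    name: str
    resource_manager: ResourceManager


@dataclass
class GridScheduler:
    """Hypothetical stand-in for the grid scheduler (102)."""
    name: str
    cluster_schedulers: List[ClusterScheduler] = field(default_factory=list)

    def all_nodes(self) -> List[str]:
        # The grid scheduler reaches nodes only through each cluster
        # scheduler and its resource manager, never directly.
        return [
            node
            for cs in self.cluster_schedulers
            for node in cs.resource_manager.nodes
        ]


# Labels reuse the reference numerals from the description.
grid = GridScheduler(
    "102",
    [
        ClusterScheduler("104A", ResourceManager("106A", ["108A", "108B", "108C"])),
        ClusterScheduler("104B", ResourceManager("106B", ["108D", "108E", "108F"])),
        ClusterScheduler("104C", ResourceManager("106C", ["108G", "108H", "108I"])),
    ],
)
```

In this sketch the grid scheduler holds no node list of its own, which mirrors the point that each layer communicates only with the layer directly beneath it.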
Local schedulers (which can refer to either the cluster schedulers 104 or the resource managers 106) are closer to the specific resources 108 and do not allow grid schedulers 102 direct access to the resources. Examples of compute resources include data storage devices such as hard drives and computer processors. The grid level scheduler 102 typically does not own or control the actual resources. Therefore, jobs are submitted from the high level grid-scheduler 102 to a local set of resources with no more permissions than the user would have. This reduces efficiency and can render the reservation process more difficult. When jobs are submitted from a grid level scheduler 102, there is access information about the person, group or entity submitting the job. For example, the identity of the person who submitted the job can have associated with it a group of restrictions but also guarantees of service, such as a guarantee that 64 processors will be available within 1 hour of a job submission.
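The kind of service guarantee mentioned above can be expressed as a simple check. This is an illustrative sketch under stated assumptions: the ServiceGuarantee type and its is_met method are hypothetical, the figures (64 processors, 1 hour) come from the example in the text, and the submission date is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceGuarantee:
    """Hypothetical guarantee attached to a submitter's credentials."""
    processors: int
    max_wait: timedelta

    def is_met(self, submitted: datetime, available: datetime, granted: int) -> bool:
        # Met when enough processors become available no later than
        # max_wait after the job was submitted.
        return granted >= self.processors and (available - submitted) <= self.max_wait


# Figures from the text: 64 processors within 1 hour of submission.
guarantee = ServiceGuarantee(processors=64, max_wait=timedelta(hours=1))
submitted = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp

on_time = guarantee.is_met(submitted, submitted + timedelta(minutes=45), granted=64)
too_late = guarantee.is_met(submitted, submitted + timedelta(minutes=90), granted=64)
too_few = guarantee.is_met(submitted, submitted + timedelta(minutes=45), granted=32)
```

Here only the first case satisfies the guarantee; the other two fail on timing and on processor count respectively.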
The heterogeneous nature of the shared resources also causes a reduction in efficiency. Without dedicated access to a resource, the grid level scheduler 102 is challenged with the high degree of variance and unpredictability in the capacity of the resources available for use. Most resources are shared among users and projects and each project varies from the other. The performance goals for projects differ. Grid resources are used to improve performance of an application but the resource owners and users have different performance goals: from optimizing the performance for a single application to getting the best system throughput or minimizing response time. Local policies can also play a role in performance.
Within a given cluster, there is only a concept of resource management in space. An administrator can partition a cluster and identify a set of resources to be dedicated to a particular purpose and another set of resources can be dedicated to another purpose. In this regard, the resources are reserved in advance of processing the job. By being constrained in space, the nodes 108A, 108B, 108C, if they need maintenance or for administrators to perform work or provisioning on the nodes, have to be taken out of the system, fragmented permanently or partitioned permanently for special purposes or policies. If the administrator wants to dedicate them to particular users, organizations or groups, the prior art method of resource management in space causes too much management overhead, requiring constant adjustment to the configuration of the cluster environment, and also causes losses in efficiency from the fragmentation associated with meeting particular policies.
Reservations of compute resources were introduced above. To manage job submissions, a cluster scheduler will employ reservations to ensure that jobs will have the resources necessary for processing. FIG. 1B illustrates a cluster/node diagram for a cluster 124 with nodes 120. Time is along the X axis. An access control list (ACL) 114 to the cluster is static, meaning that the ACL is based on the credentials of the person, group, account, class or quality of service making the request or job submission to the cluster. The ACL 114 determines what jobs get assigned to the cluster 110 via a reservation 112 shown as spanning into two nodes of the cluster. Either the job can be allocated to the cluster or it cannot, and the decision is made based on who submits the job at submission time. Further, in environments where there are multiple clusters associated with a grid, and workload is transferred around the grid, there is a continual difficulty of managing restrictions and guarantees associated with each entity that can submit jobs. Each cluster will have constant alterations made to users and groups as well as modifications of the respective compute environment. Currently, there is no mechanism to ensure that up-to-date identity information is available for a particular user when workload submitted by that user is transferred to an on-demand site or to a remote cluster from the submitter's local environment.
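The static access control described above can be sketched as follows. This is a simplified illustration rather than the behavior of any particular scheduler: the Credentials and StaticACL classes, the allows method, and the user and group names are all hypothetical. Only user and group matching is shown, although the text also lists account, class and quality of service as credential types.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical credentials carried by a job submission."""
    user: str
    group: Optional[str] = None


@dataclass(frozen=True)
class StaticACL:
    """Hypothetical static ACL: fixed sets of permitted credential values."""
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()

    def allows(self, creds: Credentials) -> bool:
        # The decision depends only on who submits the job, evaluated once
        # at submission time. It does not consider current load or how
        # well performance goals are being met.
        return creds.user in self.users or creds.group in self.groups


# Hypothetical example values.
acl = StaticACL(users=frozenset({"alice"}), groups=frozenset({"physics"}))

allowed_by_user = acl.allows(Credentials(user="alice"))
allowed_by_group = acl.allows(Credentials(user="bob", group="physics"))
denied = acl.allows(Credentials(user="carol", group="chemistry"))
```

Because the outcome is a fixed yes or no per submitter, a sketch of this kind has no way to express goals such as admitting a job only while a performance target is being missed.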
One deficiency with the prior approach is that there are situations in which organizations would like to make resources available but only in such a way as to balance or meet certain performance goals. In particular, groups may want to establish a constant expansion factor and make that available to all users, or they may want to designate a subset of users who are key people in an organization and give them special services when their response time drops below a certain threshold. Given the prior art model, companies do not have this flexibility over their cluster resources.
To improve the management of cluster resources, what is needed in the art is an improved method for a scheduler, a cluster scheduler or cluster/grid workload management system to manage resources. Further, what is needed is an improved method of managing reservations such that the use of the compute environment is more efficient while maintaining policies and agreed qualities of service.