1. Technical Field
The present invention relates in general to improved performance in distributed systems and in particular to a method for minimizing complex decision making when allocating additional resources to a job submitted to a first selection of resources in a grid environment. Still more particularly, the present invention relates to storing previous decisions to allocate additional resources in a grid environment according to characteristics of the jobs for which decisions were made, such that the stored decisions can be reused for subsequent jobs with similar characteristics to minimize complex decisions when allocating additional resources in a grid environment.
2. Description of the Related Art
Ever since the first connection was made between two computer systems, new ways of transferring data, resources, and other information between two computer systems via a connection have continued to develop. In a typical network architecture, when two computer systems are exchanging data via a connection, one of the computer systems is considered a client sending requests and the other is considered a server processing the requests and returning results. In an effort to increase the speed at which requests are handled, server systems continue to expand in size and speed. Further, in an effort to handle peak periods when multiple requests are arriving every second, server systems are often joined together as a group and requests are distributed among the grouped servers. Multiple methods of grouping servers have developed, such as clustering, multi-system shared data (sysplex) environments, and enterprise systems. With a cluster of servers, one server is typically designated to manage distribution of incoming requests and outgoing responses. The other servers typically operate in parallel to handle the distributed requests from clients. Thus, one of multiple servers in a cluster may service a client request without the client detecting that a cluster of servers is processing the request.
Typically, servers or groups of servers operate on a particular network platform, such as Unix or some variation of Unix, and provide a hosting environment for running applications. Each network platform may provide functions ranging from database integration, clustering services, and security to workload management and problem determination. Each network platform typically offers different implementations, semantic behaviors, and application programming interfaces (APIs).
Merely grouping servers together to expand processing power, however, is a limited method of improving efficiency of response times in a network. Thus, increasingly, within a company network, rather than just grouping servers, servers and groups of server systems are organized as distributed resources. There is an increased effort to collaborate, share data, share cycles, and improve other modes of interaction among servers within a company network and outside the company network. Further, there is an increased effort to outsource nonessential elements from one company network to that of a service provider network. Moreover, there is a movement to coordinate resource sharing between resources that are not subject to the same management system, while still addressing issues of security, policy, payment, and membership. For example, resources on an individual's desktop are not typically subject to the same management system as resources of a company server cluster. Even different administrative groups within a company network may implement distinct management systems.
The problems with decentralizing the resources available from servers and other computing systems operating on different network platforms, located in different regions, with different security protocols, and each controlled by a different management system, have led to the development of grid technologies using open standards for operating a grid environment. Grid environments support the sharing and coordinated use of diverse resources in dynamic, distributed, virtual organizations. A virtual organization is created within a grid environment when a selection of resources from geographically distributed systems operated by different organizations with differing policies and management systems is organized to handle a job request.
In addition to decentralizing resources available in a grid environment to improve efficiency of network transactions, capacity on demand resources are gaining more presence. An on demand resource is one that is accessible to a system, but is operational only when a fee is paid or an electronic key to open the resource is provided.
An important attribute of a grid environment that distinguishes it from merely another management system is the quality of service maintained across multiple diverse sets of resources. A grid environment does more than just provide resources; a grid environment provides resources with a particular level of service including response time, throughput, availability, security, and the co-allocation of multiple resource types to meet complex user demands. A limitation of current grid technology, however, is that maintenance of the agreed-to quality of service from grid resources requires human intervention. For example, human intervention is relied on in a grid environment to decide when to allocate and deallocate resources to reach specified performance levels. Further, manual intervention is relied on in a grid environment to suspend low priority jobs or move jobs to other selections of resources within the grid environment. Manual intervention limits the efficiency and expansion of grid environments because it is by nature inefficient and prone to error.
As a result of developing grid environments and on demand resources, a single system may have access to multiple discrete sets of resources. For example, first, a system typically accesses those components within the system that provide a primary set of local resources. Next, a system may access resources from other systems within a local or enterprise network. Further, a system may access and activate capacity on demand resources either from within the system or from a system accessible via a network. Finally, a system may access grid resources accessible through participation in a grid environment.
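The progression through discrete sets of resources described above can be illustrated with a minimal sketch. The tier names, the capacity figures, and the rule of taking the first tier with enough free capacity are all hypothetical assumptions made for illustration; they are not taken from any particular grid system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceTier:
    """One discrete set of resources (hypothetical model)."""
    name: str
    free_units: int


def first_tier_with_capacity(tiers: List[ResourceTier], needed_units: int) -> Optional[str]:
    """Walk the tiers in order and return the name of the first one
    that has enough free capacity, or None if no tier qualifies."""
    for tier in tiers:
        if tier.free_units >= needed_units:
            return tier.name
    return None


# Assumed ordering: local, then enterprise network, then capacity on
# demand, then grid. The capacity numbers are illustrative only.
TIERS = [
    ResourceTier("local", 2),
    ResourceTier("enterprise_network", 4),
    ResourceTier("capacity_on_demand", 8),
    ResourceTier("grid", 64),
]

chosen = first_tier_with_capacity(TIERS, 6)
print(chosen)
```

A fixed ordering such as this one is only one possible policy; in practice the choice between sets of resources may also depend on cost, policy, and quality of service requirements.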
With the availability of multiple sets of discrete resources, an additional limitation of current grid technology is that human intervention is required to manage the flow between each of these discrete sets of resources to determine whether use of grid resources is required. Further, human intervention is required to determine whether to activate capacity on demand resources first or go directly to grid resources. Thus, a disadvantage of current grid technology is that when a job request is executing within a set of resources that becomes unable to handle performance requirements for the job, human intervention is required to decide whether to schedule the job into a grid environment or other set of resources. Given the quality of service requirements within a grid environment, it would first be advantageous to make decisions about the flow of a job through discrete sets of resources without requiring human intervention.
A common feature in network computing is that the same type of job may be requested from the same client system or multiple client systems within a short period of time. If the first time the job is received a complex decision has to be made to manage the flow of the job between discrete sets of resources, it would be advantageous to reuse that complex decision for other similar jobs. Therefore, in view of the foregoing, it would be advantageous to provide a method, system, and program for improving the efficiency of the use of a hierarchy of resources in a grid environment by storing complex decisions about the flow of a job such that the complex decisions may be reused for future jobs of the same type.
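The idea of reusing a stored decision for later jobs of the same type can be sketched as a lookup keyed by job characteristics. This is a minimal illustration under stated assumptions: the `JobProfile` fields, the `DecisionCache` class, and the `expensive_policy` function are hypothetical names introduced here, and exact matching on characteristics stands in for whatever similarity test an actual system would apply.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class JobProfile:
    """Characteristics used to recognize similar jobs (hypothetical fields)."""
    job_type: str
    client_class: str
    priority: int


class DecisionCache:
    """Stores allocation decisions keyed by job characteristics."""

    def __init__(self, decide: Callable[[JobProfile], str]) -> None:
        self._decide = decide
        self._store: Dict[JobProfile, str] = {}
        self.complex_decisions = 0  # counts how often the full decision ran

    def allocate(self, profile: JobProfile) -> str:
        cached = self._store.get(profile)
        if cached is not None:
            # A job with the same characteristics was seen before:
            # reuse the stored decision.
            return cached
        # First job of this kind: make the complex decision and store it.
        self.complex_decisions += 1
        decision = self._decide(profile)
        self._store[profile] = decision
        return decision


def expensive_policy(profile: JobProfile) -> str:
    """Placeholder for a complex decision; the threshold is an assumption."""
    return "grid" if profile.priority >= 5 else "capacity_on_demand"


cache = DecisionCache(expensive_policy)
first = cache.allocate(JobProfile("render", "batch", 7))
second = cache.allocate(JobProfile("render", "batch", 7))
```

In this sketch the second request with identical characteristics is answered from the store, so the complex decision runs only once.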