The present invention relates to collaborative workload management where an improved workload scheduler and workload manager better the overall distribution and balancing of work in a computing system, resulting in a better throughput of work, better utilization of system resources, and more consistent processing times.
A workload scheduler is a software component that submits work for execution according to a predefined schedule. Factors that affect when the work is submitted include temporal values like date, time, day-of-the-week, and dependencies such as the completion of preceding work items and resource availability.
One example of a workload scheduler is described in "Tivoli OPC General Information," IBM Pub. No. GH19-4372-02 (December 1999) and related publications. Tivoli OPC (Operations, Planning and Control) automates, monitors, and controls the flow of work through an enterprise's entire data processing operation on both local and remote systems.
A workload manager (WLM), on the other hand, is a software component that manages the system resources that are to be made available to each executing work item, based on performance criteria that define, implicitly or explicitly, relative priorities between competing work items.
One example of a workload manager is described in "OS/390 MVS Planning: Workload Management," IBM Pub. No. GC28-1761-07 (March 1999), and "OS/390 MVS Programming: Workload Management Services," IBM Pub. No. GC28-1773-06 (March 1999). This workload manager balances workload among the systems of an S/390 parallel sysplex cluster in order to achieve optimal load balancing and system performance.
The terms work, work unit, and unit of work are used interchangeably in this context to represent useful user-defined processing on a computer system. The particular term applied by users of the computer system depends on the system type; common terms include job and task.
In the example of OS/390, each work unit is associated with a service class, for example, online transaction, high priority batch, low priority batch, etc. Each service class carries with it a set of parameters which indicate to the WLM the performance criteria of the associated work units, so that if the WLM notes that the resources being allocated to work units of a given service class are repeatedly failing to enable work units of that service class to meet their performance criteria, the WLM can adjust the resources being allocated to work units of that service class. (The techniques used in this adjustment are beyond the scope of the present invention, but are nonetheless well known in the art.)
Beyond this, however, the WLM is unable to make workload management decisions which take into account either the history of an individual job of a given service class or the state of an instance of a job as it is being currently processed.
Companies are becoming increasingly reliant on workload schedulers to automate the submission of large quantities of work and to complete the workload within an increasingly small window of time, and so the above problems are becoming more and more pertinent.
An attempt to solve the problems of individual jobs repeatedly failing to meet performance criteria or instances of jobs failing to meet performance criteria has been made with the V2R3 release of Tivoli Operations and Control (OPC) in December, 1999. OPC, as a workload scheduler running on OS/390, identifies late-running, long-running, or late-starting jobs, and attempts to reduce the delay to the workload by moving the jobs to a higher performing WLM service class. However, this can produce highly erratic results, as the aid that a late job will receive is directly tied to the customer's service class definitions, so any benefits can range from negligible to dramatic overcompensation at the cost of competing work.
According to the present invention there is provided a collaborative workload management system comprising: a workload scheduler co-operable with a schedule to submit work units for processing on a computer system according to said schedule; and a workload manager adapted to monitor work units being submitted for processing on said computer system and to allocate resources for processing respective work units on said computer system according to a respective service class of said work units, said service class defining resources allowed for processing a work unit of said service class; said workload scheduler being adapted to further provide the workload manager with work unit attributes as each work unit is submitted for processing, said attributes comprising at least one indicator of the resources typically required by said work unit; and said workload manager being adapted to retrieve said work unit attributes and to tune the resources required to process said work unit according to said work unit's attributes without exceeding the resources allowed for processing work units of said work unit's service class.
Thus, using the invention, the workload scheduler aids the workload manager in achieving business goals by providing it with the attributes of work as it is being submitted for processing. In this way the workload manager can make intelligent decisions about where and how the work will be executed (e.g. which system in a clustered system environment), based on how much system resource is likely to be consumed by the work unit, and what type of system resource the work unit requires.
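By way of illustration only, the tuning described above might be sketched as follows. The sketch assumes a single resource measure (a fractional CPU share); the names ServiceClass, WorkUnitAttributes, tune_allocation, and choose_system are hypothetical and are not taken from OPC, OS/390, or any WLM interface. It shows only the principle that the scheduler-supplied attributes guide the allocation while the service class allowance acts as an upper bound.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ServiceClass:
    """Hypothetical service class: caps the resources a work unit may receive."""
    name: str
    max_cpu_share: float  # assumed unit: fraction of one system's CPU


@dataclass(frozen=True)
class WorkUnitAttributes:
    """Hypothetical attributes supplied by the scheduler at submission time."""
    typical_cpu_share: float  # indicator of resources typically required


def tune_allocation(attrs: WorkUnitAttributes, service_class: ServiceClass) -> float:
    """Allocate what the work unit typically needs, never more than its class allows."""
    return min(attrs.typical_cpu_share, service_class.max_cpu_share)


def choose_system(free_capacity: Dict[str, float], needed: float) -> Optional[str]:
    """Pick the clustered system with the most free capacity that can fit the work unit.

    Returns None when no system currently has enough free capacity.
    """
    candidates = {name: free for name, free in free_capacity.items() if free >= needed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

In this sketch a work unit whose typical requirement is below its service class allowance receives only what it typically needs, while one whose typical requirement exceeds the allowance is held to the allowance, so competing work is not displaced.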
Preferably, the workload manager further aids the workload scheduler to achieve its goal of scheduling work according to its predefined schedule. The scheduler solicits this aid from the workload manager in situations where the workload is running late with respect to the schedule. This situation happens if:
a. A unit of work runs late by not finishing by the end time defined in the schedule.
b. A unit of work begins execution late by starting after the scheduled start time.
c. A unit of work overruns by executing for longer than its scheduled duration.
This helps achieve performance criteria for a job particularly when unpredicted delays take place.
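The three late conditions (a) to (c) above could, for example, be detected along the following lines. This is a minimal sketch under stated assumptions: the ScheduledWork record and the late_conditions function are hypothetical, and the schedule is assumed to hold a planned start time, a planned end time, and a planned duration for each unit of work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class ScheduledWork:
    """Hypothetical schedule entry plus the times actually observed so far."""
    planned_start: datetime
    planned_end: datetime
    planned_duration: timedelta
    actual_start: Optional[datetime] = None  # None until the work starts
    actual_end: Optional[datetime] = None    # None until the work finishes


def late_conditions(work: ScheduledWork, now: datetime) -> Set[str]:
    """Return which of the late conditions (a), (b), (c) currently apply."""
    conditions: Set[str] = set()
    # For unfinished work, measure against the current time.
    end_or_now = work.actual_end if work.actual_end is not None else now
    start_or_now = work.actual_start if work.actual_start is not None else now

    # (a) Not finished by the end time defined in the schedule.
    if end_or_now > work.planned_end:
        conditions.add("late_end")
    # (b) Started (or still waiting to start) after the scheduled start time.
    if start_or_now > work.planned_start:
        conditions.add("late_start")
    # (c) Has executed for longer than its scheduled duration.
    if work.actual_start is not None and end_or_now - work.actual_start > work.planned_duration:
        conditions.add("overrun")
    return conditions
```

A non-empty result is the point at which, in the arrangement described above, the scheduler would solicit aid from the workload manager.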
In particular and by contrast to OPC V2R3, the invention uses the intelligence of the workload manager to aid a late work unit without ignoring the job's service class performance criteria and without the risk of negatively impacting competing work.