1. Technical Field
The present invention relates to workload management. More particularly, the present invention relates to an improvement of a push/pull workload management model with intelligent routing to effectively collect data from systems that consist of dynamic sub-systems.
2. Description of Related Art
Widely distributed, “open” network computer resources are the norm in today's network environment. These resources make up a complex grouping of applications, desktops, networks and servers, each with specific requirements and functions.
In both distributed and IBM® z/OS™ environments, workload scheduling, the orderly sequencing of batch program execution, should be flexible enough to accommodate varying resources and levels of demand securely and automatically. If timely scheduling is desired to help meet service-level agreements, an information technology (IT) department should be able to set policies that govern batch activity.
Systems administrators are in a unique position in that they are expected to understand the extraordinary batch workload demands created when key business processes are automated. With the addition of each new application, whether for enterprise resource planning, customer relationship management, financial reporting or another vital business activity, the batch workload may grow at an incredible rate. At many companies there is a potential for daily batch processing workloads to triple or even quadruple in several years. At the same time, the window for processing jobs is shrinking, with many critical jobs needing to be completed before each day's regular online work begins.
The workload management challenge may be further complicated by interdependencies between jobs and the availability of personnel who understand how to manage batch jobs in the context of important business priorities. Consider, for example, the requirements that may be triggered when a customer places an order over the Internet. Depending on the environment, the customer's request could initiate a UNIX® job to acknowledge the order, an IBM AS/400® and iSeries™ job to order parts, an OS/390® and zSeries™ job to debit the customer's bank account or payment system and a Microsoft® Windows NT® job to print a docket and address labels. If any job fails, the company's revenues and customer satisfaction might be at risk.
Because batch workloads provide the raw material for many of today's automated business processes, it is more important than ever for systems administrators to have an efficient, reliable and scalable way to manage their batch workloads seamlessly in a distributed environment. When evaluating management software options to help plan, organize and execute workload production in an environment, several key criteria should be considered, such as the ability to:
Integrate workloads from multiple applications, across multiple platforms and operating systems;
Handle rapidly increasing batch workload demands;
Automate tasks to enhance productivity of resources and personnel; and
Drive business value by integrating with other system management solutions.
In a system that consists of a number of sub-systems, there is usually an intelligent agent monitoring the performance and managing the resources. Depending on the workload and demand, the intelligent agent may dynamically expand and shrink the system, i.e., start and stop sub-systems. For example, in an on-demand system, servers may be dynamically started and stopped based on the MAPE (monitoring, analysis, plan, and execution) loop calculation to meet the service requirements. As another example, in a cluster, the workload manager may dynamically start and stop cluster members based on the performance analysis and workload distribution. As an additional example, an IBM® z/OS™ application server usually has one or multiple servant processes, and these servant processes may come and go depending on the workload.
Performance monitoring is very important for meeting the service requirements in such a dynamic system. The performance data are tracked in individual sub-systems, and the data from all the sub-systems are summed to calculate the overall performance of the system. Many of these summed performance data represent the overall status of the system, covering both the sub-systems that are running and the sub-systems that were once started but have since stopped. Such data are monotonically increasing by nature. For example, the total number of requests should include all the requests processed by the system.
Existing monitoring tools usually call application program interfaces (APIs) to collect the performance data from the running sub-systems and then calculate the overall status by adding them up. However, this calculation may not reflect the real overall status of the system. Because the performance data are tracked by individual sub-systems, the data in a sub-system are gone when the sub-system is stopped. Simply adding performance data over the running sub-systems therefore loses the data from the stopped sub-systems.
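The shortcoming described above can be illustrated with a minimal sketch. The class SubSystem, the function naive_total, the servant names and the request counts are all hypothetical and chosen only for illustration; they do not correspond to any particular monitoring tool or API. The sketch assumes that each sub-system keeps its own request counter and that the counter is discarded when the sub-system stops.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubSystem:
    """Hypothetical sub-system that tracks its own request counter."""
    name: str
    requests: int = 0
    running: bool = True

    def handle(self, n: int) -> None:
        # Record n processed requests in this sub-system's local counter.
        self.requests += n

    def stop(self) -> None:
        # Assumption: the counter lives inside the sub-system,
        # so its value is lost when the sub-system is stopped.
        self.running = False
        self.requests = 0


def naive_total(subsystems: List[SubSystem]) -> int:
    """Sum the counter over the currently running sub-systems only."""
    return sum(s.requests for s in subsystems if s.running)


a = SubSystem("servant-a")
b = SubSystem("servant-b")
a.handle(100)
b.handle(40)

total_before = naive_total([a, b])  # both servants running
b.stop()                            # the workload manager shrinks the system
total_after = naive_total([a, b])   # servant-b's requests are no longer counted
```

In this sketch the reported total drops after servant-b stops, even though the total number of requests processed by the system should never decrease.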