1. Field of the Invention
The invention relates to apparatus and accompanying methods for use preferably, though not exclusively, in a multi-system shared data (sysplex) environment, wherein each system provides one or more servers, for dynamically and adaptively assigning and balancing new work and session requests among the servers in the sysplex, in view of attendant user-defined business importance of the requests and available sysplex resource capacity, so as to meet overall business goals.
2. Description of the Prior Art
Prior to the early-1980s, large scale computing installations often relied on using a single monolithic computer system to handle an entire processing workload. If the system failed, all processing applications in the workload were suspended until the failure was remedied. While a resulting processing delay was tolerated at first, as increasingly critical applications were processed through the system, any such ensuing delays became increasingly intolerable. Furthermore, as processing needs increased, the entire system was eventually replaced with a new one of sufficient capacity. Replacing systems in that manner proved to be extremely expensive and very inefficient. However, at that time, few workable alternatives existed, to using monolithic systems, that appreciably eliminated both these outages and an eventual need to replace the entire system.
To efficiently address this need, over the past several years and continuing to the present, computer manufacturers have been providing processing architectures based on a multi-system shared data approach. Through these architectures, multiple large-scale computer systems, each of which is often referred to as a computer processing complex (CPC) or a central electronic complex (CEC), are inter-connected, through, for example, a coupling facility or other inter-processor communication mechanism, to permit each such system to gain read-write access to data residing on one or more shared input/output devices, such as a direct access storage device (DASD). The resulting inter-connected computer system is commonly referred to as a "sysplex". In a sysplex, as with a typical multi-processing environment, a processing workload is generally distributed among all of the inter-connected computer systems such that each computer system is responsible for processing a portion of the entire workload. Conventionally then, each of these systems executes its own portion of the total workload independently of that undertaken by any of the other such systems. Owing to the inherent high reliability and highly cost-efficient expansion potential of a sysplex architecture, sysplexes are particularly attractive in handling so-called critical business support applications that involve real-time transaction processing and can tolerate essentially no downtime.
Generally, within a sysplex, separate copies (so-called "instances") of an application are resident and simultaneously active on more than one of the computer systems, each henceforth referred to as a "machine" to differentiate the physical hardware therefor, and, based upon, e.g., the processing capacity required for the application, often on all or most of these machines.
Certain currently available machines that can be readily incorporated into a sysplex, such as illustratively the Enterprise System/9000 (ES/9000) Series manufactured by the International Business Machines (IBM) Corporation, can each support, if appropriately configured, multiple actively and simultaneously executing copies of various operating systems (OS) to implement separate corresponding individual and unique application processing environments. (Enterprise System/9000 is a registered trademark, and ES/9000 is a trademark, of the International Business Machines Corporation.) Each of these environments utilizes a separate copy of the operating system, such as the MVS/ESA (henceforth simply "MVS") OS, which forms a so-called OS "image", along with an instance(s) of corresponding application program(s) and a dedicated storage area (typically a logical partition--"LPAR"). (MVS/ESA is a trademark, and IBM is a registered trademark, of the International Business Machines Corporation.) As such, each such environment thus constitutes a separate "processing system" (henceforth referred to, for the sake of brevity, as simply a "system"). Each application instance that executes on any such system constitutes a separate application server (henceforth referred to as simply a "server" or "real instance") to service a portion of the total workload presented to the overall application on the sysplex. A system, based on its processing capacity and that required by the corresponding applications, can implement one or more corresponding servers.
A recurring difficulty in using multiple servers has been how to effectively balance the current processing load across the servers. Traditionally, operating systems, such as the MVS OS, relied on a totally static approach to allocating available sysplex resources, such as available servers, processing time, and processor storage, to each current work request. To accomplish this, system administrators utilized historic performance measurements of past workload processing to project just what sysplex resources would then be available as each new work request was presented to the sysplex and how these available resources should be allocated to handle that request. The overall goal of the administrator in allocating these resources to the current work requests was simply to keep each system maximally busy, i.e., to utilize as many available clock cycles thereon as possible, in effect keeping that system "pegged" and hence maximizing its throughput.
For a sysplex, historic averaged performance measurements were made over a variety of intervals and in relation to a variety of causes: e.g., on a day-by-day basis, on an hour-by-hour basis, and by each individual application, as well as in relation to other time or usage-related metrics. Based on this data, a system administrator determined, from projections made from this historic data: how current work requests should be assigned to individual servers, a dispatching priority for each one of these requests that would be queued on each server, i.e., the order in which these requests were to be executed on that server, and the amount of resources at that server to allocate to each new work request presented thereto. Once these determinations were made for an expected sysplex workload in view of the goal of maximizing throughput of each system, the administrator simply instructed the operating system at each server accordingly. Through this effort, the administrator strove to distribute the total workload, as he or she then foresaw it, across all the servers as evenly as possible consistent with maximizing the throughput of all the servers.
Unfortunately, dispatching relationships existing between different work requests queued for execution in a sysplex tend to be extremely complex. Not only were accurate predictions of workload and resource allocations across multiple servers extremely tedious and difficult to create, but also such allocations were based on static, i.e., fixed, workloads having concomitant demands for each server that were not expected to change over time. Unfortunately, in practice, workloads do change, often significantly with time. Predictions predicated on static workloads simply could not accommodate subsequent changes in sysplex workload. Hence, each time a new application or a change in arrival patterns or demand for existing workload was to occur in a sysplex, the administrator had to totally re-formulate (re-iterate) the predictions and accordingly change the work request assignments and resource allocations therefor in order to accommodate the additional work request. Doing so would, of necessity, involve determining whether any processing conflicts would arise by introduction of the new work request vis-a-vis existing requests then being processed and then resolving all such conflicts. Moreover, not only did each subsequent iteration consume substantial effort, but a static prediction assumed that future work requests, even for the same application, would behave as past work requests therefor did. Since this assumption often failed to account for sudden increases, i.e., spikes, in processing demand by an application, such as a surge in users and/or transactions therefor, these static workload predictions, coupled with fixed work request assignments and pre-determined sysplex resource allocations, simply could not efficiently accommodate dynamic changes in workload. Hence, imbalances between systems frequently arose through which one or more systems would be heavily loaded while others would be lightly loaded. 
Consequently, work requests that then had a high degree of business importance, and either could not wait or could tolerate only minimal delay, might nevertheless be queued on the former systems for a relatively long time awaiting dispatch for execution, while queued work requests of much lesser business importance would be dispatched far more quickly on the latter systems. Hence, the sysplex, due to inter-system processing imbalances resulting from static work assignment and pre-defined resource allocation, was often unable to meet its business goals, i.e., its total current processing demand was not met and accompanying processing results were not provided in a manner temporally consistent with the business importance of the underlying application(s).
While the art teaches several approaches for providing improved workload balancing in a sysplex, or generally a multi-processing environment, all these approaches suffer drawbacks that limit their attractiveness and general utility.
Specifically, an early attempt at balancing workload across multiple systems involved physically connecting a certain number of users, on a pre-defined basis often in terms of physical wiring or other such interconnections, to each system and thereafter routing all work requests, incoming over a network and originating from those users, to only that system to the exclusion of all other systems. For brevity, we will refer to this approach hereinafter as "connection based" balancing. The user assignments, specifically the interconnections, were initially established such that an approximately equal number of users would be connected, at any one time, to each system. Under this approach, once a user, through a physical connection to a given system, established a terminal session thereat, all the work requests for that session were routed solely and directly to that given system. Unfortunately, significant inter-system processing imbalances frequently occurred. In that regard and at one extreme, one or a small number of users using one common system but having a collectively large demand for processing could overwhelm that system to the detriment of all the other users executing applications thereat, while a large number of other such users, such as those having sessions with relatively little activity, connected to one or more other system(s) might collectively present relatively light processing demands and all receive quick dispatching of all their work. At another extreme and prior to networked systems, users simply chose the particular system they logged onto. Consequently, a large number of active users could utilize a given system(s), thereby causing a significant imbalance between that system(s) and the others, which were then much less loaded. 
Furthermore, since user assignments were established through pre-defined hardware connections, users could well be connected to systems that were not then available and hence receive no application processing whatsoever, thereby further exacerbating workload and session imbalances among the systems and hence once again resulting in an overall failure of the sysplex to meet its business goals.
A later attempt, commonly referred to as "session placement", provided increased flexibility in terms of balancing workload in view of system failure(s). Session placement relied on assigning and connecting each user, then seeking to establish a terminal session, on a balanced session count basis to the next available system. This user assignment and connection was generally accomplished through some type of network inter-connect facility--such as an IBM Virtual Telecommunications Access Method (VTAM). (VTAM is a registered trademark of the International Business Machines Corporation.) While this approach precluded session assignment to a failed system and thus accorded improved inter-system workload balancing, it still proved deficient. Specifically, the inter-connect facility simply had no knowledge, a priori, of the amount of work any one session entailed or, for that matter, the business importance of that work vis-a-vis other work then queued or executing on the sysplex.
Here too, as with connection based balancing, a system could be overwhelmed by a relatively small number of users with collectively heavy processing demands, thus leading once again to workload and session imbalances.
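The limitation just described can be illustrated with a minimal sketch of placement on a balanced session count basis. It is not VTAM's actual logic; the names `System`, `place_session` and `demo`, and the demand figures (arbitrary units), are assumptions made purely for illustration. The placement decision sees only session counts and availability, never the demand a session will generate.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Hypothetical stand-in for one system in the sysplex."""
    name: str
    available: bool = True
    # Processing demand of each placed session, in arbitrary units.
    session_demands: list = field(default_factory=list)

    @property
    def session_count(self):
        return len(self.session_demands)

    @property
    def load(self):
        return sum(self.session_demands)


def place_session(systems, demand):
    """Place a session on the available system with the fewest sessions.

    The demand is recorded only so the resulting load can be inspected
    afterwards; the placement decision itself never looks at it.
    """
    candidates = [s for s in systems if s.available]
    if not candidates:
        raise RuntimeError("no system is available")
    target = min(candidates, key=lambda s: s.session_count)
    target.session_demands.append(demand)
    return target


def demo():
    # SYS3 has failed, so it is skipped; the other two share the sessions.
    systems = [System("SYS1"), System("SYS2"), System("SYS3", available=False)]
    for demand in (90, 5, 80, 5):  # assumed per-session demands
        place_session(systems, demand)
    return {s.name: (s.session_count, s.load) for s in systems}
```

In this sketch, `demo()` leaves SYS1 and SYS2 with two sessions each, yet the two heavy sessions both land on SYS1: the session counts are balanced while the workload is not.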
Moreover, while VTAM maintained knowledge of which systems were available at any one time, each of these systems, simply by virtue of its own processing hardware, could well provide radically different processing capacity than the others: some of these systems might have substantially more processing power relative to others having much less. VTAM had no knowledge of these capacity differences, which could, if recognized and utilized, tend to skew the number of work assignments towards the larger capacity systems. By failing to successfully exploit these capacity differences, workload imbalances were exacerbated in sysplexes having systems of widely differing capacity. In contrast, with connection based balancing, increasingly large systems frequently accommodated correspondingly increased numbers of physical connections and hence users and thus, to a certain extent, successfully exploited these capacity differences.
Unfortunately, session count balancing, and certainly connection based balancing as well, failed to account for the business importance of the various work requests that constituted this workload. Thus, both of these approaches were often unable to meet processing demand in a manner temporally consistent with the current business importance of the underlying application(s) to be processed. For example, by concentrating on maximizing throughput of processed work, no attention was paid, during dispatching, to the relative business importance of the individual work requests, thereby often causing relatively important work to be delayed at the hands of other such work of much lesser importance, with a concomitant failure to meet overall business goals.
Given the deficiencies inherent in distributing sessions on a simple balanced session count basis, the art has attempted to remedy these deficiencies by modifying the session count balancing approach to accommodate work request transfers among systems--hereinafter referred to as the "session count balancing with transfer" approach. Specifically, once sessions are assigned to given systems by VTAM, then, in the event of a workload imbalance between systems, heavily loaded system(s) could then transfer individual work requests, on a request-by-request basis, to any other system that then had sufficient idle capacity. Accordingly, if session count balancing resulted in relatively poor session placements, i.e., "bad" choices which caused or exacerbated a current workload imbalance, then, to a certain extent, these bad choices could be alleviated by subsequent work redistribution among the systems themselves. While, at first blush, this appears to be an attractive solution, it can unfortunately result in significant cost. Specifically, the process of communicating and transferring work requests is heavily dependent on the inter-system communications fabric, incorporated into the sysplex, and available processor resources. Not only must the sysplex contain sufficient communication links, providing high-speed bandwidth to enable such a transfer at any time, but also each such transfer consumes a certain number of system instructions, expended both at a transmitting system and a receiving system, on the order of, e.g., 50K instructions/work request. If the work request is relatively large, then the resulting processor overhead needed to implement the transfer may be small or even negligible as compared to the processing demands of the work request itself, thereby readily justifying the cost, in terms of system overhead, of the transfer. 
On the other hand, a work request that consumes a relatively small number of instructions, such as on the order of, e.g., 100K or so, would simply be too expensive, again in terms of system overhead, to transfer to another system. Unfortunately, rarely, if ever, will a system have a priori knowledge, immediately upon its receipt of a work request, as to just how much processing that request entails, i.e., just how many instructions that request will ultimately consume. Once a system starts processing a request and is then able to possibly estimate its size, it is then simply too late to transfer the request. Thus, given the lack of insight as to the ultimate size of any processing request, the "session count balancing with transfer" approach can still produce "bad" choices that result in workload imbalances among the individual systems in the sysplex.
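The cost trade-off described above reduces to simple arithmetic, sketched below. The 50K instruction overhead and the 100K instruction request size are the illustrative figures given in the text; the function names, the 10M instruction "large" request and the 5% threshold `max_ratio` are assumptions chosen only for the example. The sketch also presumes the request size is known in advance, which, as noted above, is rarely if ever the case.

```python
# Illustrative per-request transfer overhead, in instructions (from the text).
TRANSFER_OVERHEAD = 50_000


def transfer_overhead_ratio(request_instructions, overhead=TRANSFER_OVERHEAD):
    """Return transfer overhead as a fraction of the request's own work."""
    if request_instructions <= 0:
        raise ValueError("request size must be positive")
    return overhead / request_instructions


def worth_transferring(request_instructions, max_ratio=0.05,
                       overhead=TRANSFER_OVERHEAD):
    """Hypothetical rule: transfer only if overhead is a small fraction.

    max_ratio is an assumed threshold, not a value taken from the text.
    """
    return transfer_overhead_ratio(request_instructions, overhead) <= max_ratio


small = transfer_overhead_ratio(100_000)      # 0.5: overhead is half the work
large = transfer_overhead_ratio(10_000_000)   # 0.005: overhead is negligible
```

For the 100K instruction request the transfer would add overhead equal to half of the useful work, whereas for the assumed 10M instruction request it adds half a percent.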
An alternate approach taught in the art for workload balancing applies in a network context where a network can route a work request from any user to any system in the sysplex. Similar to connection based balancing, this approach involves returning a list of servers, from a network type OS in the sysplex, and then routing, through the network, a current work request to one of these servers. This server is identified in a fixed manner through directories, by the network OS, such as in a round-robin fashion, as the next successive server in the directory or as simply the first server in the directory. Unfortunately, this approach relies on a customer, particularly the sysplex administrator, to define a directory, i.e., an installation table, of all the servers. This table must be changed whenever a server is installed or removed. Furthermore and similar to the other approaches described above, the network-based routing process disadvantageously has simply no knowledge of the business importance of the work requests, both those currently executing as well as those that are competing for service, or what other tasks, other than routed work requests, are being executed at each of the available servers and their respective levels of importance. Hence, work requests are frequently assigned and ultimately dispatched in a manner totally inconsistent with their actual business importance. Moreover, owing to a lack of knowledge as to actual server loading or availability, a server can be overloaded or taken out of service, but no information thereof will be immediately passed back to the network routing process to prevent the network from attempting to send a work request to any such then non-available server. 
As such, whenever a server becomes non-available--because, e.g., it is overloaded or taken out of service--the network is forced to wait for an appropriate response, more likely a lack of response after a given time interval has elapsed, to signify that a given server is not then available. Once this time interval has elapsed, the network must then re-route the work request accordingly to the next server listed in the directory. However, this delay disadvantageously postpones both the dispatching and the ultimate processing of this work request--possibly contravening the importance underlying the request and hence causing the sysplex to once again fail to meet its overall business goals.
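The delay described above can be made concrete with a small simulation of fixed, directory-driven round-robin routing. This is a sketch only: `DirectoryRouter`, its methods, and the `is_responding` callback are hypothetical names, no real network is involved, and the timeout is simply accumulated rather than actually waited out. The router has no availability feedback, so every attempt on a non-available server costs one full timeout interval.

```python
class DirectoryRouter:
    """Hypothetical round-robin router over a static server directory."""

    def __init__(self, directory, timeout_s):
        self.directory = list(directory)  # installation table of servers
        self.timeout_s = timeout_s        # wait before giving up on a server
        self._next = 0                    # round-robin cursor

    def route(self, is_responding):
        """Return (server, seconds_lost) for one work request.

        Servers are tried in directory order starting at the cursor.
        A server that does not respond is detected only by waiting out
        the timeout, after which the next directory entry is tried.
        """
        waited = 0.0
        count = len(self.directory)
        for attempt in range(count):
            server = self.directory[(self._next + attempt) % count]
            if is_responding(server):
                self._next = (self._next + attempt + 1) % count
                return server, waited
            waited += self.timeout_s
        raise RuntimeError("no server in the directory responded")


router = DirectoryRouter(["S1", "S2", "S3"], timeout_s=5.0)
out_of_service = {"S1", "S2"}
server, lost = router.route(lambda s: s not in out_of_service)
```

With two of the three servers out of service, the request in this sketch reaches S3 only after two timeout intervals. Because nothing is learned from the failure, the next request routed from the same cursor position pays the same penalty again.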
A recent attempt at allocating system resources to work requests based on attaining one or more pre-defined end-user oriented goals, such as execution velocity or response time, is described in co-pending United States patent application "Apparatus and Method for Managing a Data Processing System Workload According to Two or More Distinct Processing Goal Types", Ser. No. 08/222,755; filed Apr. 4, 1994; assigned to the present assignee hereof and incorporated by reference herein. This attempt represents a significant advance inasmuch as here OS software, rather than a system administrator, takes over the responsibility for allocating system resources in a manner that attempts to satisfy the end-user goals. However, this attempt still relies on a system administrator to assign all the work requests, based on a static workload prediction, to the individual servers in the sysplex and, only after this assignment has been made, allocates the available system resources to attain the goals. As a result of this static workload allocation among the servers, imbalances in workload and/or session placements, as discussed above, can disadvantageously still arise.
Therefore, a need currently exists in the art for a technique, including both a method and accompanying apparatus, that can be used in a multi-system environment, such as illustratively a sysplex or other multi-processing environment, for effectively balancing session placements and/or work requests, across all the servers therein, in view of attendant user-defined business importance thereof and available sysplex resource capacity. By doing so, this technique would be expected to utilize these available resources to balance workload and/or session placements in a manner that properly satisfies the overall business goals of the sysplex. This technique should not merely rely on static predictions of workload and/or session placements but rather should dynamically react and adapt to changing workloads and session requirements, as well as current server availability, and also effectively accommodate capacity differences existing among the various available systems. In addition, by not just relying on static predictions or fixed network based routing schemes, this technique should avoid making any "bad" choices as to session and/or work request placement, thereby obviating the need and cost that might otherwise be incurred to subsequently remedy such choices.