A transaction processing region of a server, such as a Customer Information Control System (CICS®) Transaction Server (TS) for IBM® z/OS®, may receive work from its clients over multiple network connections. These connections can be either short lived or long lived. Connections from a web browser, for example, are generally long lived and may carry a significant number of concurrent requests. Requests arriving over long-lived connections therefore need to be managed to ensure efficient operation, which is the subject of the present disclosure.
Request and response messages are serialised over the connection and will typically be in packet form, with a header portion and a payload portion. A message contains request data that the client is passing to the server (or vice versa), which forms the payload or body of the message. The message also contains metadata, which forms one or more headers of the message; the role of the metadata is to instruct the recipient of the message on how to handle the request data.
Connections may be established when the server component starts up, and may be added or removed while the server component remains active. The server component has a finite capacity, limited by factors such as the amount of central processing unit (CPU) time it can use on the machine it runs on and the amount of memory it has access to. The system administrator configures server components to maximise the amount of work they can support without running the risk of stalling because of resource shortages, and takes measures to prevent server components from failing altogether when they become highly loaded. In this context, a stall is a condition that occurs in the server when resource contention impacts the processing of application tasks. A limit is set on the maximum number of tasks that the server component can run concurrently. Typically, this limit is greater than the number of tasks supported over an individual connection into the server component, but less than the total for all the connections together, as there will be times when one connection is very busy while others are lightly loaded.
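The split between metadata headers and a request-data payload described above can be illustrated with a minimal Python sketch. The class name `Message` and the wire format (HTTP-like "Name: value" header lines, a blank line, then a payload delimited by a `Content-Length` header) are assumptions chosen for illustration only; they are not the framing used by CICS or any particular protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical request/response: metadata headers plus an opaque payload (body)."""
    headers: dict[str, str] = field(default_factory=dict)
    payload: bytes = b""

    def serialise(self) -> bytes:
        # Assumed wire format: "Name: value" header lines, a blank line, then
        # the payload, whose length is carried in a Content-Length header.
        lines = [f"{k}: {v}" for k, v in self.headers.items() if k != "Content-Length"]
        lines.append(f"Content-Length: {len(self.payload)}")
        head = "\r\n".join(lines) + "\r\n\r\n"
        return head.encode("ascii") + self.payload

    @classmethod
    def parse(cls, data: bytes) -> Message:
        head, _, rest = data.partition(b"\r\n\r\n")
        headers = {}
        for line in head.decode("ascii").split("\r\n"):
            name, _, value = line.partition(": ")
            headers[name] = value
        # The framing header is consumed here; it is not application metadata.
        length = int(headers.pop("Content-Length"))
        return cls(headers=headers, payload=rest[:length])


request = Message({"Content-Type": "application/json"}, b'{"account": 42}')
wire = request.serialise()
assert Message.parse(wire) == request  # the header/payload split survives a round trip
```

The recipient reads the headers first and uses them (here, only the payload length) to decide how to handle the body, which mirrors the role the metadata plays in the description.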
A server component may sometimes become highly loaded, i.e. reach its full capacity, when too much work arrives over a set of connections. Excess work requests received when the server component is already at full capacity cannot be processed immediately, so they are queued in the server component until they can be processed. However, the queue itself, and its management, consume further system resources in the already overloaded server component. There is currently no mechanism for automatically resolving these issues when a server component detects that it cannot cope with its current workload.
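The interaction between the maximum-task limit and the queue of excess requests can be modelled with a short sketch. `ServerRegion`, `receive` and `complete` are hypothetical names, and the first-in, first-out hand-over of freed task slots is an assumption; the model only shows that, once the limit is reached, every further request adds to a queue that the server must hold in memory.

```python
from __future__ import annotations

from collections import deque


class ServerRegion:
    """Toy model: at most max_tasks run concurrently; excess requests wait in a queue."""

    def __init__(self, max_tasks: int):
        self.max_tasks = max_tasks
        self.running: set[str] = set()
        self.queue: deque[str] = deque()

    def receive(self, request_id: str) -> str:
        if len(self.running) < self.max_tasks:
            self.running.add(request_id)
            return "running"
        # At full capacity the request cannot run yet, so it is queued;
        # the queue itself consumes resources in the already loaded server.
        self.queue.append(request_id)
        return "queued"

    def complete(self, request_id: str) -> None:
        self.running.remove(request_id)
        # A freed task slot goes to the oldest queued request, if there is one.
        if self.queue:
            self.running.add(self.queue.popleft())


server = ServerRegion(max_tasks=2)
states = [server.receive(f"r{i}") for i in range(5)]
server.complete("r0")
print(states, sorted(server.running), list(server.queue))
# ['running', 'running', 'queued', 'queued', 'queued'] ['r1', 'r2'] ['r3', 'r4']
```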
Controls are known which prevent one network node from flooding a partner network node with requests when the partner is unable to process its current workload. For example, it is known to configure the request sender (referred to in this document as the client) so that it has only a fixed number of request slots: a “Number of Send Sessions” parameter can be used to set the maximum number of concurrent requests that a client can route over a connection. This number is set when the connection is first established and persists for the lifetime of the connection. In another known example, the service provider (referred to in this document as the server) maintains a queue of requests that have been received but not yet processed and, when the queue is full, rejects any additional requests that it receives. These approaches work well for paired systems that have only a single connection between them, as their overall capacity can be calculated in advance and the capacity of the connection set to match. However, large-scale systems often have multiple points of entry, so it is not a simple task to configure their connections in a way that manages these requests efficiently.
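The two known controls can be sketched side by side. The class names and the `send_sessions` and `max_depth` parameters are hypothetical; `send_sessions` only plays the role of a “Number of Send Sessions” setting and does not reproduce any product's actual implementation. The client refuses to send once its fixed slots are used, while the server rejects requests once its bounded queue is full.

```python
from __future__ import annotations

from collections import deque


class ClientConnection:
    """Client side: a fixed number of send sessions (request slots)."""

    def __init__(self, send_sessions: int):
        # Fixed when the connection is established; never changed afterwards.
        self.free_slots = send_sessions

    def try_send(self) -> bool:
        if self.free_slots == 0:
            return False  # caller must wait until a response frees a slot
        self.free_slots -= 1
        return True

    def on_response(self) -> None:
        self.free_slots += 1


class BoundedRequestQueue:
    """Server side: received-but-unprocessed requests, up to a fixed depth."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.pending: deque[str] = deque()

    def offer(self, request_id: str) -> bool:
        if len(self.pending) >= self.max_depth:
            return False  # queue full: the request is rejected
        self.pending.append(request_id)
        return True


client = ClientConnection(send_sessions=2)
print([client.try_send() for _ in range(3)])  # [True, True, False]
queue = BoundedRequestQueue(max_depth=1)
print([queue.offer("a"), queue.offer("b")])   # [True, False]
```

Both limits are static numbers chosen in advance, which is why they suit a single, predictable connection but are hard to size when many connections share one server.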
For example, an IBM CICS® TS for z/OS production server component is likely to have multiple connections over which request messages may arrive. The request traffic rate over any single connection is likely to vary considerably over time, as are the relative traffic rates of different connections. It is not practical to configure a server to match the combined maximum request capacity of all of its clients, as this would require large amounts of redundant capacity that would remain unused unless demand is near its peak. Instead, each connection is configured to support more than its fair share of the server's overall capacity, so that when the server is less busy, a busy client can route a higher rate of requests to it. Consequently, there may be prolonged periods during which requests are queued before the server can service them, or during which the server rejects requests.
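A little arithmetic shows why configuring each connection above its fair share leads to queuing. The numbers below are invented for illustration: a server limited to 100 concurrent tasks with four connections has a fair share of 25 per connection, but each connection is assumed to be configured for 50. The function `excess_requests` is a hypothetical helper that counts how many admitted requests exceed the server's task limit.

```python
from __future__ import annotations


def excess_requests(server_max_tasks: int, per_connection_limits: list[int],
                    offered: list[int]) -> int:
    """Requests that must be queued or rejected for a given offered load."""
    # Each connection can route at most its own configured limit.
    admitted = sum(min(load, limit) for load, limit in zip(offered, per_connection_limits))
    return max(0, admitted - server_max_tasks)


limits = [50, 50, 50, 50]        # each connection allowed twice its fair share of 25
print(sum(limits) / 100)         # 2.0: connections are overcommitted two to one
# Quiet period: one busy client uses 50 slots, double its fair share, with no queuing.
print(excess_requests(100, limits, [50, 10, 5, 5]))   # 0
# Busy period: several clients peak together and 70 requests cannot run.
print(excess_requests(100, limits, [50, 50, 40, 30])) # 70
```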
To address this issue, it is known for clients to use additional software to discover whether they are using a particular connection to its full capacity; IBM z/OS Workload Manager (WLM) and IBM Tivoli® NetView® are examples of such workload balancing software. Workload balancing is a technique for distributing TCP/IP-based workload requests (connections) across similar server applications to achieve optimal resource utilisation, maximise throughput, minimise response time, and avoid overloading server applications or systems. Using multiple server applications with load balancing, instead of a single server application, also increases reliability through redundancy. The load balancing service is usually provided by a dedicated software program or hardware device. A workload manager provides distribution recommendations to the load balancing service (i.e. a load balancer appliance).
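One way a load balancing service might act on distribution recommendations is to treat them as weights in a smooth weighted round-robin, a well-known scheduling technique swapped in here for illustration. This is a sketch under stated assumptions: the integer weights stand in for recommendations from a workload manager, and `WeightedBalancer`, `update_recommendations` and `pick` are hypothetical names, not part of the WLM or NetView interfaces.

```python
from __future__ import annotations


class WeightedBalancer:
    """Smooth weighted round-robin over similar server applications.

    The weights stand in for distribution recommendations that a workload
    manager might supply; how they are obtained is outside this sketch.
    """

    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}

    def update_recommendations(self, weights: dict[str, int]) -> None:
        # New recommendations replace the weights; accumulated credit is kept.
        self.weights = dict(weights)
        self.current = {name: self.current.get(name, 0) for name in weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen


balancer = WeightedBalancer({"A": 3, "B": 1})
print([balancer.pick() for _ in range(4)])  # ['A', 'A', 'B', 'A']
```

Over each cycle the servers receive connections in proportion to their weights, while the smoothing interleaves them rather than sending bursts to one server.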
In workload management environments of this type, server applications that access the same data sources are typically included in the same workload. This allows monitoring agents to verify that all of these applications are available and able to handle additional connections on behalf of the workload. It also gives a workload manager a consolidated view of the applications in the workload, making it easy to switch workloads between geographically separated sites (i.e. different clusters of systems). In particular, the workload manager can signal a load balancer appliance that all connections to the server applications making up the workload are to be switched to the alternate site at the same time. This ensures that the applications access the data sources from only one site at any point in time. The workload balancing software runs alongside the systems that are using the connection. However, such additional software has the disadvantage that it has to be configured separately from the connection it monitors.
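The effect of grouping server applications into one workload and switching the whole workload between sites can be sketched as follows. `ServerApp`, `WorkloadRouter`, the site names and the application names are all hypothetical. The sketch only decides where new connections are routed; closing or migrating existing connections, which a real load balancer appliance would also have to handle, is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerApp:
    name: str
    site: str


class WorkloadRouter:
    """Routes new connections for one workload to the apps at its active site."""

    def __init__(self, apps: list[ServerApp], active_site: str):
        self.apps = apps
        self.active_site = active_site

    def targets(self) -> list[str]:
        # Only apps at the active site receive connections, so the workload's
        # data sources are accessed from one site at a time.
        return [app.name for app in self.apps if app.site == self.active_site]

    def switch_site(self, new_site: str) -> None:
        # One change moves every app in the workload together, rather than
        # switching connections one application at a time.
        self.active_site = new_site


apps = [ServerApp("app-1", "east"), ServerApp("app-2", "east"),
        ServerApp("app-3", "west"), ServerApp("app-4", "west")]
router = WorkloadRouter(apps, active_site="east")
print(router.targets())  # ['app-1', 'app-2']
router.switch_site("west")
print(router.targets())  # ['app-3', 'app-4']
```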
Another known approach to dealing with high server loading is for the server to send its clients data relating to the server's load state, referred to in this document as health data. Using this health data, the client can decide whether to send requests to the server or, if the server is already busy, to delay sending them.
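A client-side decision based on health data might look like the sketch below. The contents of the health report (`active_tasks` and `max_tasks`) and the 90% busy threshold are assumptions made for illustration; the description does not specify what the health data contains or how the client weighs it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HealthData:
    """Hypothetical load-state report sent by the server to its clients."""
    active_tasks: int
    max_tasks: int

    @property
    def utilisation(self) -> float:
        return self.active_tasks / self.max_tasks


def decide(health: HealthData, busy_threshold: float = 0.9) -> str:
    """Return 'send' if the server reports headroom, otherwise 'delay'."""
    # busy_threshold is an assumed tuning parameter, not part of any protocol.
    return "delay" if health.utilisation >= busy_threshold else "send"


print(decide(HealthData(active_tasks=50, max_tasks=100)))  # send
print(decide(HealthData(active_tasks=95, max_tasks=100)))  # delay
```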