In a message processing system comprising a plurality of stages, one stage interacts with another by passing work, such as messages, in a work flow to the subsequent stage. The effectiveness of this work flow largely determines the overall operability of the system. Each stage, or component, which may also be thought of as a subsystem of the overall system, has a capacity to perform work and is responsible for receiving work from the previous stage, processing the work, and transmitting work to the next stage. Completing these tasks consumes various resources at each stage, and the consumption of those resources directly affects the functionality of the system, because the processing capacity of each stage is a function of its resources. Often the present capacity of a subsystem is not easily measurable, and the exact resource requirement for performing an individual task is not predictable. In addition, a subsystem may concurrently process a heterogeneous workload, which makes quantifying the subsystem capacity and predicting the resource consumption difficult. In order for the system to function at an optimal level, each stage of the system must use its resources optimally, because the output of each stage is the input of the next. For example, if all stages of a message processing system are using their resources at an optimal level, the overall system will achieve maximum throughput, that is, the maximum amount of work completed by the system in a given time. If, however, just one stage of the system is not appropriately utilizing its resources, either work will accrue at the over-utilized stage, or the system will be under-worked, in that more work could have been handled or processed by the system.
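The effect of a single over-utilized stage can be illustrated with a minimal sketch. It assumes a strictly serial pipeline advanced in discrete time steps, with a fixed arrival rate and a fixed per-step capacity for each stage. The function name `simulate_pipeline` and all of its parameters are hypothetical and chosen only for illustration; real stages, as noted above, rarely have capacities that are this easy to measure.

```python
def simulate_pipeline(arrival_rate, capacities, ticks):
    """Simulate a serial pipeline of stages in discrete time steps.

    Hypothetical model: `arrival_rate` units of work arrive at the first
    stage every tick, and stage i can process at most `capacities[i]`
    units per tick. Returns (queue backlog per stage, units completed).
    """
    queues = [0] * len(capacities)
    completed = 0
    for _ in range(ticks):
        queues[0] += arrival_rate
        # Walk the stages from last to first so that work handed to a
        # stage during this tick is not processed by it until the next.
        for i in reversed(range(len(capacities))):
            done = min(queues[i], capacities[i])
            queues[i] -= done
            if i + 1 < len(capacities):
                queues[i + 1] += done
            else:
                completed += done
    return queues, completed


# The middle stage can handle only 4 units per tick, so work accrues in
# front of it while the last stage uses only part of its capacity of 10.
backlog, completed = simulate_pipeline(arrival_rate=10,
                                       capacities=[10, 4, 10],
                                       ticks=100)
print("backlog per stage:", backlog)
print("units completed:", completed)
```

Under these assumptions the backlog in front of the slow stage grows without bound, and the completed work never exceeds what the slowest stage can pass along, regardless of the capacity of the other stages.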
Various factors contribute to the utilization, and hence to the overall operation, of a message processing system. One such factor is the arrival rate of work to the system. When the work arrival rate is very high, the system, or one or more of its stages, expends its resources on accepting the arriving work and cannot attend to processing the work itself. One particular stage may be overburdened with incoming messages or work while the next stage is idle, waiting to receive its work load. The utilization of one stage thus has a ripple, or cascading, effect on all of the succeeding stages. When the system is in this state, i.e., when the incoming rate of work is more than it can handle, the system is said to be thrashing. When the system receives too much work, a number of problems ensue: errors may result; the work the system is handling, such as messages or data, may be lost; or the system may become unresponsive or slow.
Throttling techniques are employed in message processing systems to prevent the system from thrashing. A throttled system controls its work flow so that it handles only the amount of work it can properly process and is therefore neither under-utilized nor over-utilized. A message processing system that utilizes throttling techniques examines itself throughout operation to determine when it is not operating in an optimal state. Such a situation is known as a stress situation. When a stress situation is discovered, throttling actions are taken to mitigate the problem.
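The general pattern of examining a stage, detecting a stress situation, and taking a throttling action can be sketched as follows. This is a minimal illustration rather than any particular system's implementation: it assumes that stress is detected by comparing queue depth against a fixed threshold and that the throttling action is to refuse new work. The class `ThrottledStage` and its parameters are hypothetical.

```python
from collections import deque


class ThrottledStage:
    """One stage that refuses new work while it is under stress.

    Hypothetical model: stress is declared when the number of queued
    messages reaches `stress_threshold` (an assumed, fixed value).
    """

    def __init__(self, capacity_per_tick, stress_threshold):
        self.capacity_per_tick = capacity_per_tick
        self.stress_threshold = stress_threshold
        self.queue = deque()
        self.rejected = 0

    def under_stress(self):
        # Examination step: is the stage outside its normal operating range?
        return len(self.queue) >= self.stress_threshold

    def offer(self, message):
        # Throttling action: turn work away instead of letting it pile up.
        if self.under_stress():
            self.rejected += 1
            return False
        self.queue.append(message)
        return True

    def tick(self):
        # Process up to `capacity_per_tick` queued messages, oldest first.
        n = min(self.capacity_per_tick, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


stage = ThrottledStage(capacity_per_tick=2, stress_threshold=5)
accepted = sum(stage.offer(m) for m in range(8))
print("accepted:", accepted, "rejected:", stage.rejected)
print("processed this tick:", stage.tick())
print("still under stress:", stage.under_stress())
```

Refusing work is only one possible throttling action; delaying the sender or reducing the rate at which the previous stage transmits would fit the same structure.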
Throttling techniques are presently utilized in many types of message processing systems and in similar systems in which work is processed and transferred between system components and in which resources are consumed in performing such operations. However, various problems exist with such presently-used techniques. One drawback is that the detection of stress situations depends upon the user of the system. The user is responsible for determining the threshold values for the various resources, but often the user does not have the data necessary to do so, and that data changes dynamically. This is particularly so when the message processing system comprises a series of stages, where effective throttling would require information about the capacity and resource utilization of the individual stages. Furthermore, once a stress situation is detected and measures are employed to alleviate the problem, the system stays in this mode and does not re-evaluate whether, at a later time, it can handle an increased workload.
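The last drawback, a throttle that never re-evaluates, can be made concrete with a small sketch. It assumes a single user-supplied utilization threshold and two admission limits; the class `LatchingThrottle`, its parameters, and the numeric values are hypothetical and serve only to illustrate the behavior described above.

```python
class LatchingThrottle:
    """Throttle that reduces its admission limit once and never restores it.

    Hypothetical model of the drawback: `threshold` is a static,
    user-chosen utilization level, and no code path returns the limit
    to `normal_limit` after stress has been detected.
    """

    def __init__(self, threshold, normal_limit, reduced_limit):
        self.threshold = threshold
        self.reduced_limit = reduced_limit
        self.limit = normal_limit

    def observe(self, utilization):
        # Stress detection against the static, user-supplied threshold.
        if utilization > self.threshold:
            self.limit = self.reduced_limit
        # Nothing ever raises the limit again, even if utilization drops.
        return self.limit


throttle = LatchingThrottle(threshold=0.8, normal_limit=100, reduced_limit=20)
for utilization in (0.50, 0.95, 0.10):
    print(utilization, "->", throttle.observe(utilization))
```

In this sketch the limit remains reduced after utilization falls well below the threshold, so the system is left under-worked even though it could handle an increased workload.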
Accordingly, in light of the above considerations and limitations, improved systems and methods for controlling the flow of work in a distributed message processing system comprising a plurality of stages are highly desired. The present invention satisfies this need.