1. Field of the Invention
The present invention relates to workload distribution in network computing and more particularly to backend workload management by controlling the request rate of clients.
2. Description of the Related Art
In most, if not all, new and existing web applications, performance and capacity are difficult to predict because the source request rate is either unknown or poorly estimated. In large scale public web applications, the request rate can be high, variable and unceasing. Traditionally, load testing attempts to prove that the application can cope with a certain predicted load; however, such a synthetic load is generated in a conventional manner from a bell curve, the peak of which equates to a best-guess maximum rate of Transactions Per Second (TPS), and is served by a perfectly operating application system. In the field, by contrast, primary and dependent systems degrade or fail in subtle ways, and transaction rates vary wildly. As such, a fundamental difference exists between the synthetic TPS rating of a system under test and the real load the system will face in the field.
Many sophisticated web applications are queue based, with synchronous calls from the application server to multiple backend systems, each backend system having its own varying response times and performance profile. This architecture, coupled with high and variable client request rates, can lead to very fast failures when one of the components degrades, or “browns out”. Additionally, these situations are difficult to diagnose and can give end users nothing more than a blank screen as feedback. In other words, the client receives no feedback about the health of the backend system.
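The failure mode described above can be illustrated with a simple queueing calculation; the following sketch is purely illustrative, and its function name and example figures are assumptions rather than part of any described system. When client requests arrive faster than a browned-out backend can service them, a bounded request queue fills at the rate of the surplus, and once it saturates, further requests fail immediately with no meaningful feedback to the client.

```python
def time_to_saturation(queue_depth, arrival_tps, backend_tps):
    """Seconds until a bounded request queue overflows when a synchronous
    backend services fewer transactions per second than clients submit.

    Returns None if the backend keeps up, i.e. the queue never saturates.
    """
    surplus = arrival_tps - backend_tps  # net growth of the queue per second
    if surplus <= 0:
        return None
    return queue_depth / surplus

# Healthy backend: 500 TPS capacity against 400 TPS offered load; the
# queue drains as fast as it fills, so it never saturates.
print(time_to_saturation(1000, 400, 500))   # None

# "Browned out" backend: capacity drops to 100 TPS while clients continue
# to submit 400 TPS; a 1000-slot queue overflows in seconds.
print(time_to_saturation(1000, 400, 100))   # ~3.33 seconds
```

The point of the sketch is how quickly saturation arrives: with the hypothetical figures above, the application fails roughly three seconds after the backend degrades, which is consistent with the "very fast failures" noted in the passage.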
In addition, traditional load balancing does not address the fundamental rate of request transactions per second (TPS) that the “system” as a whole must handle. Accordingly, modern load balancers cannot properly manage the flow of requests generated by the clients.