Large-scale computing systems, such as those associated with network-based production services, have become widely available in recent years. Examples of such systems include online merchants, internet service providers, online businesses such as photo processing services, corporate networks, cloud computing services, web-based hosting services, etc. These entities may maintain large numbers of computing devices (e.g., thousands of hosts) which are hosted in geographically separate locations and which are configured to process large quantities (e.g., millions) of client requests daily or even hourly. Ensuring that these services can scale to handle abnormally high loads is a non-trivial problem. For example, instead of testing an actual production system (i.e., a system that is currently open to real-world clients), software engineers often create a scaled-down copy of a production system with a smaller number of hosts and test the scaled-down system with a smaller, proportional load. However, such an approach may fail to detect problems that would only surface on the larger scale of the production system.
When testing whether a production system can scale to handle a large load, software engineers often find that a single-host load generator cannot generate sufficient load to provide an adequate test. In such a scenario, the engineers may resort to running the single-host load generator concurrently on multiple hosts. One drawback to this approach is that the individual load generators must be managed to ensure that they are jointly applying the desired amount of load. Such a task may require nearly constant manual oversight and re-adjustment. There is no guarantee that every individual load generator will provide the same maximum load, even if all the load generators have the same hardware and software configuration. Additionally, if individual load generators experience hardware failures, the overall system may fail to reach its target load.
The task of providing a synchronized, controlled load using multiple hosts is often tackled with complex architectures involving a centralized database and a master/slave paradigm. In such an approach, the master typically knows which slaves are able to generate load, and the master distributes the load accordingly. During execution of the load test, the master must be in periodic contact with the slaves to determine whether they are working, to calculate metrics such as overall TPS (transactions per second), and to determine whether it needs to ask individual slaves to increase their load to reach the target load. Accordingly, such master/slave systems require a high degree of coupling between the components. Achieving such a high degree of coupling often proves expensive for the various components and for the network interconnections. Additionally, if a slave temporarily loses its network connection to the master, the master may conclude that the slave is dead and reassign the slave's load to another component. However, if it turns out the slave was alive and still applying load, a greater amount of load than expected may be applied, potentially to the detriment of the service under test.
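The over-application scenario described above can be illustrated with a minimal sketch. It is not an implementation of any particular load-testing system; the class names (`Master`, `Slave`), the even-split distribution policy, and the figures (four slaves, a 1200 TPS target) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical load-generating host."""
    name: str
    assigned_tps: float = 0.0   # load the master believes this slave applies
    applying_tps: float = 0.0   # load the slave is actually applying
    reachable: bool = True      # whether the master can currently contact it


class Master:
    """Hypothetical coordinator that splits a target load evenly."""

    def __init__(self, slaves, target_tps):
        self.slaves = slaves
        self.target_tps = target_tps

    def distribute(self):
        # Split the target across the slaves the master believes are alive.
        alive = [s for s in self.slaves if s.reachable]
        if not alive:
            return
        share = self.target_tps / len(alive)
        for s in self.slaves:
            if s.reachable:
                s.assigned_tps = share
                s.applying_tps = share  # reachable slaves adopt the new share
            else:
                # The master writes the slave off as dead, but the slave never
                # receives the update and keeps applying its previous load.
                s.assigned_tps = 0.0

    def believed_tps(self):
        return sum(s.assigned_tps for s in self.slaves)


def actual_tps(slaves):
    return sum(s.applying_tps for s in slaves)


slaves = [Slave("a"), Slave("b"), Slave("c"), Slave("d")]
master = Master(slaves, target_tps=1200.0)
master.distribute()            # each slave applies 300 TPS

slaves[3].reachable = False    # slave "d" loses its connection but keeps running
master.distribute()            # master reassigns d's share to a, b, and c

print(master.believed_tps())   # 1200.0: what the master thinks is applied
print(actual_tps(slaves))      # 1500.0: what the service under test receives
```

In this sketch the master's view stays at the target while the service under test receives 25% more load than intended, because the disconnected slave continues at its last assigned rate.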
While embodiments are described herein by way of example with reference to several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning “having the potential to”), rather than the mandatory sense (i.e., meaning “must”). Similarly, the words “include,” “including,” and “includes” mean “including, but not limited to.”