Large-scale computing systems, such as those associated with network-based production services, have become widely available in recent years. Examples of such systems include online merchants, internet service providers, online businesses such as photo processing services, corporate networks, cloud computing services, web-based hosting services, etc. These entities may maintain large numbers of computing devices (e.g., thousands of hosts) which are hosted in geographically separate locations and which are configured to process large quantities (e.g., millions) of client requests daily or even hourly. Complex systems may include many services that interact with one another in varied ways.
Tuning the configuration of a service on a computer system is typically a manual process. In other words, a tuning process typically involves a user manually tweaking different attributes of the service or the underlying system in the hopes of improving the performance of the system running the service. Multiple attributes may need to be manually modified many times in an effort to improve the performance significantly, especially if performance is dependent on multiple interrelated attributes. For heterogeneous multi-host web services that have specific targets in terms of throughput, latency, or stability, the tuning process may be especially complex and time-consuming.
A typical approach to this manual tuning process involves trial and error. A user may make some initial guesses at optimal values, test the service based on the guesses, and manually analyze the resulting performance. The user may then tweak the values even further, again based on guesswork. In some circumstances, parts of the system will change dramatically over time, thus making the original estimates outdated. However, because this approach to tuning is manual and time-consuming, the tuning may not be performed on a regular basis. As a result, outdated and inefficient settings may remain in place until they have significantly adverse effects on performance. When performance estimates are outdated or entirely absent, hardware resources may be wasted on systems that are not operating optimally.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning “having the potential to”), rather than the mandatory sense (i.e., meaning “must”). Similarly, the words “include,” “including,” and “includes” mean “including, but not limited to.”