Computing clouds allow many applications to share the same underlying set of physical computing resources. Many computing clouds have resources in locations across the globe. Furthermore, many applications hosted in computing clouds now have global user bases. Increasing geographic dispersion of a cloud's resources and of the persons or clients that access a distributed application hosted by a cloud has led to sub-optimal provisioning of cloud resources to applications.
Consider that most clouds allow instances of an application to execute at many different cloud datacenters at any given time. Moreover, requests from clients to use the distributed application may originate at any time and from any location having network connectivity. Which application instances will service which client requests can vary. However, performance generally benefits from maximizing the proximity between clients and cloud-hosted instances of the application. Although network proximity and physical proximity are not necessarily equivalent, in most cases, the geographically closer a client is to an application instance, the better the performance. The geographic boundaries that a client-application session or exchange must traverse are often proxies for performance-impacting network boundaries.
As only appreciated by the inventors, merely dynamically adjusting the number of instances of an application to respond to current load is not an ideal solution. For one, responsive scaling inevitably involves lag between a change in computational load (e.g., the number of client requests being serviced) and the resulting improvement in responsiveness. By the time an instance has been added to meet current demand, many clients may have already experienced degraded performance. Conversely, when load is dropping, responsive scaling may leave a cloud customer with unneeded instances, thus increasing the customer's cost with no commensurate benefit. Furthermore, merely scaling based on proximity may be insufficient and may also require substantial resources to implement a continuous feedback-control loop. Geographic boundaries can coincide with legal constraints such as data privacy requirements; shifting an instance of a distributed application closer to a cluster of clients might increase overall client-application proximity but at the same time violate data privacy or location rules. Moreover, some prior solutions have scaled instances according to frequencies or rates of client requests. This approach may fail to align with performance as seen by requesting clients.
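By way of illustration only, the lag inherent in responsive scaling can be sketched with a toy simulation. The sketch below is not any particular cloud's autoscaler; it assumes a simplified model in which an instance count chosen from the load observed at one time step takes effect only after a fixed number of steps. The function name, parameters, and sample numbers are all hypothetical.

```python
from math import ceil

def simulate_reactive_scaling(load, capacity, delay, initial_instances=1):
    """Toy model of responsive (reactive) scaling.

    Assumption: the instance count in effect at step t is the count that
    would have been sufficient for the load observed at step t - delay.
    Returns one tuple per step:
    (step, requests, instances, unserved_requests, idle_instances).
    """
    rows = []
    for t, requests in enumerate(load):
        if t >= delay:
            # Scaling decision made `delay` steps ago finally takes effect.
            instances = max(1, ceil(load[t - delay] / capacity))
        else:
            instances = initial_instances
        # Requests exceeding what the current instances can serve.
        unserved = max(0, requests - instances * capacity)
        # Instances beyond what the current load actually needs.
        idle = max(0, instances - ceil(requests / capacity))
        rows.append((t, requests, instances, unserved, idle))
    return rows

# Hypothetical load trace: a spike from 100 to 400 requests per step.
trace = [100, 100, 400, 400, 400, 100, 100]
for row in simulate_reactive_scaling(trace, capacity=100, delay=2):
    print(row)
```

Under these assumed numbers, the instance count stays at one for the first two steps of the spike, so requests go unserved, and it stays at four for two steps after the spike ends, so instances sit idle. Both effects grow with the assumed delay.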
Techniques related to predictive scaling of instances of cloud applications based on telemetry observations of time, location, and performance of client/user requests are discussed below.