Technical Field
Embodiments generally relate to workload management, and more specifically to predictively scaling a number of instances of an application executing within a cloud computing environment in anticipation of the application's future workload.
Description of Related Art
Efficient resource allocation is a constant challenge for modern cloud computing environments. For example, a particular application may require 100 application instances in order to process its peak workload, but could require only 40 application instances for processing its average workload. In this example, the application could be configured to operate using 40 application instances, but such a configuration would fail to accommodate the application's peak workload. Likewise, the application could be configured to operate using 100 application instances all of the time, but such a configuration would lead to an inefficient use of resources, as application instances may sit idle or be underutilized during times of non-peak workload. As such, cloud solutions may attempt to reactively scale the number of application instances in order to meet the fluctuating workload. That is, logic could determine when the application resources are sitting idle or when the application is unable to keep up with its current workload, and could scale the number of application instances down or up accordingly. However, as the application's workload is almost certainly dynamic in practice, it can be challenging to scale the number of application instances in order to accommodate the increasing or decreasing workload.
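By way of illustration only, the reactive scaling logic described above could be sketched as follows. The function name `reactive_scale`, the utilization thresholds, and the instance bounds are hypothetical and are chosen solely for this example; they are not taken from any particular cloud platform.

```python
def reactive_scale(current_instances, utilization,
                   low=0.3, high=0.8,
                   min_instances=1, max_instances=100):
    """Return a new instance count from the currently observed utilization.

    `utilization` is an assumed average load across instances in [0.0, 1.0].
    All thresholds and bounds are illustrative assumptions.
    """
    if utilization > high:
        # The application is not keeping up: scale up by one instance.
        return min(current_instances + 1, max_instances)
    if utilization < low:
        # Instances are sitting idle: scale down by one instance.
        return max(current_instances - 1, min_instances)
    # Utilization is within the acceptable band: leave the count unchanged.
    return current_instances
```

Such logic reacts only to the workload already observed, so the instance count necessarily lags behind any change in the workload.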
As an additional challenge, the start-up process of many applications may require a substantial amount of time. Thus, if the additional application instances are not created until the system detects the application is underperforming relative to the current workload, and the application instances take a substantial amount of time to initialize, the application may fall further behind relative to its current workload while waiting on the application instances to initialize. Such a scenario can lead to further underperformance by the application, and in some circumstances can lead to the system reactively spawning even more additional application instances in order to address the underperformance, resulting in an excessive number of application instances. Moreover, in the event such reactive scaling techniques are not fast enough to meet the increasing workload, the system may fail to catch up altogether, potentially resulting in interruptions, errors or even a complete system failure.
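The effect of start-up time on reactive scaling can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: time advances in discrete ticks, each ready instance processes a fixed amount of work per tick, and the system naively spawns one additional instance on every tick in which a backlog exists, without accounting for instances that are still initializing. The function `simulate_reactive` and all of its parameters are hypothetical.

```python
def simulate_reactive(workload, capacity_per_instance,
                      startup_delay, initial_instances):
    """Simulate naive reactive scaling with slow-starting instances.

    workload: units of work arriving on each tick (assumed model).
    Returns a list of (ready_instances, backlog) tuples, one per tick.
    """
    ready = initial_instances
    pending = []   # remaining start-up ticks for each initializing instance
    backlog = 0
    history = []
    for arriving in workload:
        # Advance start-up timers; finished instances become ready.
        pending = [t - 1 for t in pending]
        ready += sum(1 for t in pending if t <= 0)
        pending = [t for t in pending if t > 0]
        # Ready instances process as much of the queued work as they can.
        backlog = max(0, backlog + arriving - ready * capacity_per_instance)
        # Reactive rule: if behind, spawn one more instance.
        # It ignores instances that are already initializing.
        if backlog > 0:
            pending.append(startup_delay)
        history.append((ready, backlog))
    return history


# Workload triples at the third tick; three instances would suffice to match it.
load = [10, 10, 30, 30, 30, 30, 30, 30]
slow = simulate_reactive(load, capacity_per_instance=10,
                         startup_delay=3, initial_instances=1)
fast = simulate_reactive(load, capacity_per_instance=10,
                         startup_delay=1, initial_instances=1)
```

In this toy model the backlog under the three-tick start-up delay peaks at more than twice the backlog under the one-tick delay, and the system continues to spawn instances beyond the three needed to match the arrival rate, which mirrors the overshoot described above.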