In general, increases in an application's popularity could present a variety of scalability problems that negatively impact a user's experience. For example, users could experience slower response times, slower page loading, and increased time outs on page requests. These scalability problems are typically alleviated by allocating additional capacity to the application such as more storage, more memory, more CPUs, and more machines in general.
Allocating or installing more computing capacity may be a reasonable solution when increases in an application's popularity are experienced over a prolonged period of time, or when usage of the application is predictable. Similarly, when an application experiences a decrease in usage, removing computing capacity previously allocated to the application may be a reasonable solution, especially when this is experienced over a prolonged period of time, or when the decrease is predictable. However, the popularity of an application is often unpredictable, due to a variety of factors (e.g., time of day, current events, advertising, trends, etc.), and fluctuates to a large extent, which creates load spikes and dips in the application execution or hosting system.
Predefined allocations of computing resources are inefficient solutions for handling temporary load spikes and dips. Increasing or installing more computing resources to handle a load spike is inefficient, since the additional pre-allocated resources go unused when the spike disappears (e.g., when the spike in demand subsides, or the application's popularity dips). Similarly, decreasing computing resources allocated to an application when its popularity declines is also inefficient, since future usage spikes will require the re-allocation of previously removed resources back to the application.
To complicate matters further, application systems may host a large number of heterogeneous applications, each with its own set of fluctuating resource requirements. Pre-allocation of resources, for the reasons discussed above, is often an inefficient solution for ensuring consistent positive user experiences among heterogeneous applications hosted on an application system.
Furthermore, long running applications may need to be moved from one server to another due to fluctuating demand, machine fatigue, and/or other factors. Moving an application from one server to another can cause errors if the application is moved at an inopportune time. For example, it may not be advantageous to move an application when an answer has not yet been returned for an outstanding API call. Determining a good time to move an application from a first server to a second server in order to resume the application in the same state in which it was checkpointed is thus desirable.