A virtual data center is implemented using entities such as physical hosts and storage devices that are connected to each other via one or more networks. Running on the physical hosts are virtual machines, which execute one or more applications. The performance of these applications depends on the number and configuration of the entities supporting the virtual data center, as well as on the workload on the applications. To meet the service level objectives (SLOs) of the applications, the virtual machines executing the applications must be able to handle the changing workload on the applications.
Elasticity is an important feature of virtual data centers. Elasticity refers to the ability to scale the virtual machines executing the applications in or out to handle changing workload conditions. Typically, a virtual data center provides scaling based on resource usage thresholds set by a user. These thresholds are static values, generally determined at the initialization of the virtual data center, and are commonly based on specific resource usage conditions on the virtual machines (e.g., average processor usage > 70%). However, scaling based on virtual machine resource usage alone is not sufficient to scale applications with multiple tiers. Additionally, these multi-tier applications often have complicated dependencies, which further complicate the determination of which tier and which resource (CPU, memory, storage, etc.) to scale.
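The static threshold-based scaling described above can be illustrated with a minimal sketch. The function name, the 30% scale-in threshold, and the virtual machine limits are assumptions introduced for illustration only; the 70% scale-out threshold is taken from the example above.

```python
from statistics import mean


def threshold_scaling_decision(cpu_usages, current_vms,
                               scale_out_above=70.0, scale_in_below=30.0,
                               min_vms=1, max_vms=10):
    """Return the new VM count under a static, user-set threshold rule.

    cpu_usages: per-VM processor usage percentages (hypothetical input).
    The thresholds are fixed values and do not adapt to the workload.
    """
    average_usage = mean(cpu_usages)
    # Scale out by one VM when average usage exceeds the upper threshold.
    if average_usage > scale_out_above and current_vms < max_vms:
        return current_vms + 1
    # Scale in by one VM when average usage falls below the lower threshold.
    if average_usage < scale_in_below and current_vms > min_vms:
        return current_vms - 1
    # Otherwise leave the VM count unchanged.
    return current_vms
```

A rule of this kind looks only at the resource usage of one group of virtual machines, so it cannot by itself indicate which tier or which resource of a multi-tier application should be scaled.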
One advanced technique to automatically scale multi-tier applications involves using reinforcement learning, e.g., Q-learning, to make appropriate scaling recommendations for a multi-tier application. Reinforcement learning addresses the problem of how to ensure that a multi-tier application operates at a desired performance level, e.g., satisfies an SLO, by assigning positive and negative rewards to actions taken from one state to another state, where each state represents a resource configuration and an application performance level. The rewards for the different states are typically stored in a table. However, since even a very small change in application performance or resource configuration may create a new state, the number of states to consider may quickly become too large to manage, and thus the rewards table may become intractable.
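As a rough illustration of the table-based approach described above, the following sketch shows a standard tabular Q-learning update together with a simple count of how many states a discretized state space contains. The action names, the state encoding as a (resource configuration, performance) tuple, the learning rate, and the discount factor are all assumptions chosen for illustration, not details of any particular system.

```python
from collections import defaultdict

# Hypothetical scaling actions available in every state.
ACTIONS = ("scale_in", "no_op", "scale_out")


def make_q_table():
    """Create a rewards (Q) table keyed by (state, action); entries default to 0."""
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update and return the new table entry.

    state / next_state: hashable tuples such as (vm_count, performance_level).
    reward: positive when the SLO is met, negative otherwise (assumed scheme).
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def count_states(levels_per_dimension):
    """Number of distinct states when each dimension has the given number of levels."""
    total = 1
    for levels in levels_per_dimension:
        total *= levels
    return total
```

For example, under these assumptions, three state dimensions with 10 discrete levels each yield `count_states([10, 10, 10])`, or 1,000 states, and four dimensions with 100 levels each yield 100,000,000 states, each needing one table entry per action. This is the growth that makes the rewards table difficult to manage.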