A key requirement of modern software-defined networking (SDN) is the ability to scale system resources on demand. Applications in a data center should be scaled out (i.e., new application instances are added) as load approaches operating capacity, and scaled in (i.e., running application instances are removed or terminated) when load falls well below capacity. Generally speaking, automatic scaling of resources (also referred to as autoscaling) involves using the optimal number of resources to handle the load while meeting the Service Level Agreements (SLAs). Traditional autoscaling techniques typically measure server capacity and scale out application instances when a server reaches capacity. Capacity is typically measured in terms of metrics such as central processing unit (CPU) usage, memory usage, or response time. The system administrator can specify autoscaling policies, e.g., response time must be less than 500 ms (the policy limit), and application instances are scaled out if the response time exceeds the policy limit. However, a server may already be saturated even though its response time has not yet reached 500 ms, and the system may begin to experience errors before the response time exceeds the policy limit. In a different scenario, server response time may have been increasing linearly to 600 ms without being flagged on the saturation curve. In either case, unless the number of servers is increased, SLA compliance will eventually degrade, because new requests may be dropped or improperly serviced due to errors. Thus, existing autoscaling techniques have certain limitations, including inaccurate metrics, a lack of dynamic scaling during operation, overly optimistic assessments of servers even as they reach saturation, and lag in responding to changing load conditions. A more accurate autoscaling technique that can dynamically take load conditions into account is needed.
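The fixed-threshold policy described above can be illustrated with a minimal sketch. Only the 500 ms policy limit comes from the text. The `Sample` type, the `threshold_scale_out` function, and the sample values are hypothetical, chosen to show how a response-time rule can miss a server that is already producing errors.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical monitoring sample for a server."""
    response_ms: float  # observed average response time in milliseconds
    error_rate: float   # fraction of requests that failed (0.0 to 1.0)


POLICY_LIMIT_MS = 500.0  # policy limit used as the example in the text


def threshold_scale_out(sample: Sample, limit_ms: float = POLICY_LIMIT_MS) -> bool:
    """Traditional rule: scale out only when response time exceeds the limit."""
    return sample.response_ms > limit_ms


# Illustrative data (not measurements): errors begin at 420 ms,
# well before the response time crosses the 500 ms policy limit.
samples = [
    Sample(200, 0.00),
    Sample(350, 0.00),
    Sample(420, 0.08),
    Sample(480, 0.15),
    Sample(520, 0.30),
]

decisions = [threshold_scale_out(s) for s in samples]

# Samples where the server was already failing requests but the
# threshold rule did not request a scale-out.
missed = [s for s, d in zip(samples, decisions) if s.error_rate > 0 and not d]

for s, d in zip(samples, decisions):
    print(f"{s.response_ms:.0f} ms, errors {s.error_rate:.0%}: scale out = {d}")
print(f"saturated samples missed by the threshold rule: {len(missed)}")
```

In this sketch the rule fires only on the last sample, so the two earlier samples that already show errors go unaddressed. This is the lag the paragraph attributes to policies based on a single static limit.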