Performance management of a computing system is one example of a feedback control system. Most performance management systems make resource management and allocation decisions based on feedback of performance metrics. More particularly, a resource manager component of the performance management system submits a resource action to the computing system being managed (the managed system), in response to the performance metric feedback information. The resource action is typically intended to affect the performance of the managed system.
Herein, we consider resource actions for which there is an a priori unknown delay between the time that the resource action is submitted by the resource manager and the time that it takes effect in the managed system. Such a delay is typically caused by queuing or buffering in the managed system. As a result, the effect of the resource action is not visible in the feedback metrics for an unknown period of time.
Such unknown or unmodeled delays can be dangerous from the point of view of a feedback-based control system. In particular, well-known linear control theory suggests that adding delays to a feedback loop can make the system unstable. Intuitively, this happens because the effects of the controller's actions do not show up in the feedback metric in a timely fashion, so the controller (here, the resource manager) keeps correcting for an error it has already acted upon and thereby over-reacts. Such over-reactions can, in the best case, lead to a poorly performing controller that is not able to meet its performance objectives effectively. For example, the performance metric (e.g., response time) may oscillate over a wide range around the desired value rather than being stable at the desired value. In the worst case, the closed-loop system may be unstable, which typically results in the well-known limit cycle behavior. Here, the resource actions oscillate between the maximum and minimum resource settings, and the performance metric shows a very undesirable thrashing behavior.
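The over-reaction described above can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and is not taken from any particular performance management system: the managed system is reduced to a pure delay (the metric simply equals the resource setting submitted `delay` steps earlier), the resource manager is a simple integral controller with a hypothetical gain of 0.5, and the resource setting is bounded between 0 and 2.

```python
def simulate(delay, gain=0.5, target=1.0, steps=400, u_min=0.0, u_max=2.0):
    """Toy closed loop: integral controller + pure-delay managed system.

    All names and numbers are illustrative assumptions.
    Returns the list of observed metric values, one per step.
    """
    u = 0.0                      # current resource setting
    pending = [0.0] * delay      # actions submitted but not yet in effect
    history = []
    for _ in range(steps):
        pending.append(u)        # submit the current action
        y = pending.pop(0)       # the action from `delay` steps ago takes effect
        history.append(y)
        # Integral control: move the setting in proportion to the observed
        # error, then clamp it to the allowed resource range.
        u = min(u_max, max(u_min, u + gain * (target - y)))
    return history


def tail_swing(history, n=50):
    """Peak-to-peak range of the metric over the last n steps."""
    tail = history[-n:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for d in (0, 1, 5):
        print("delay", d, "swing", round(tail_swing(simulate(d)), 3))
```

Under these assumptions, the loop settles on the target when the delay is zero, whereas with a delay of five steps the same controller keeps driving the resource setting from one bound to the other, which is the thrashing behavior described above.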
When delays are known or can be modeled, designers of control (or resource management) algorithms can design the controller so as to take such delays into account. However, when faced with unknown delays, designers are typically forced to make the controller slow enough that the effects of the delay are not visible. For example, if the control delays are in the range of seconds, one may choose to change the resource settings only once every few minutes, so that the delay becomes negligible relative to the control interval. One limitation of this approach is that, by limiting the control frequency, one loses the ability to adapt quickly to changing conditions. Further, it is possible to encounter situations where the actual delay exceeds the values for which the resource manager was designed.
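The trade-off of the slow-controller approach can be sketched with a small self-contained simulation. As before, this is only a toy model under stated assumptions: the managed system is a pure delay, the resource manager is an integral controller with a hypothetical gain of 0.5 and a setting bounded between 0 and 2, and `period` (an illustrative parameter) is the number of steps between successive resource actions.

```python
def simulate_with_period(delay, period, gain=0.5, target=1.0, steps=600,
                         u_min=0.0, u_max=2.0):
    """Toy loop in which the controller acts only every `period` steps.

    All names and numbers are illustrative assumptions.
    """
    u = 0.0
    pending = [0.0] * delay      # actions submitted but not yet in effect
    history = []
    for k in range(steps):
        pending.append(u)
        y = pending.pop(0)       # metric reflects the action from `delay` steps ago
        history.append(y)
        if k % period == 0:      # act only once per control period
            u = min(u_max, max(u_min, u + gain * (target - y)))
    return history


def settling_step(history, target=1.0, tol=0.05, hold=50):
    """First step after which the metric stays within tol of the target.

    Returns None if the metric still leaves the band in the last `hold` steps.
    """
    bad = [k for k, y in enumerate(history) if abs(y - target) > tol]
    if not bad:
        return 0
    if bad[-1] >= len(history) - hold:
        return None
    return bad[-1] + 1


if __name__ == "__main__":
    print("no delay, act every step:   ", settling_step(simulate_with_period(0, 1)))
    print("delay 5, act every step:    ", settling_step(simulate_with_period(5, 1)))
    print("delay 5, act every 10 steps:", settling_step(simulate_with_period(5, 10)))
    print("delay 30, act every 10 steps:", settling_step(simulate_with_period(30, 10)))
```

In this toy model, acting every 10 steps hides a 5-step delay and the loop settles, but it does so noticeably later than the undelayed loop that acts every step. When the delay grows to 30 steps, beyond what the 10-step period was chosen for, the metric again fails to settle.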