The present invention relates generally to data processing systems, and relates more particularly to autonomic computing (i.e., automated management of hardware and software components of data processing systems). Specifically, the present invention provides a method and apparatus for reward-based learning of improved systems management policies.
Due to the increasing complexity of modern computing systems and of interactions of such systems over networks, there is an urgent need to enable such systems to rapidly and effectively perform self-management functions (e.g., self-configuration, self-optimization, self-healing or self-protection) responsive to rapidly changing conditions and/or circumstances. This entails the development of effective policies pertaining to, for example, dynamic allocation of computational resources, performance tuning of system control parameters, dynamic configuration management, automatic repair or remediation of system faults and actions to mitigate or avoid observed or predicted malicious attacks or cascading system failures.
Devising such policies typically entails the development of explicit models of system behavior (e.g., based on queuing theory or control theory) and of interactions with external components or processes (e.g., users submitting jobs to the system). Given such a model, an analysis is performed that predicts the consequences of various potential management actions on future system behavior and interactions, and then selects the action resulting in the best predicted behavior. A common problem with such an approach is that devising the necessary models is often a knowledge-intensive, labor-intensive, and time-consuming task. These drawbacks are magnified as the systems become more complex. Moreover, the models are imperfect, so the policies derived therefrom are also imperfect to some degree and can be improved.
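By way of illustration only, the conventional model-based approach described above may be sketched as follows. The sketch assumes a deliberately simple queuing model (load split evenly across independent M/M/1 queues) and a hypothetical utility that penalizes mean response time and per-server cost; the function names, parameters, and utility form are assumptions made for this example and are not taken from any particular system.

```python
import math


def predicted_response_time(arrival_rate, service_rate, servers):
    """Predict mean response time, assuming the load is split evenly
    across `servers` independent M/M/1 queues (an assumed model)."""
    per_server_load = arrival_rate / servers
    if per_server_load >= service_rate:
        return math.inf  # the queue is unstable under this model
    return 1.0 / (service_rate - per_server_load)


def predicted_utility(arrival_rate, service_rate, servers, cost_per_server):
    """Hypothetical utility: lower response time and fewer servers are better."""
    response_time = predicted_response_time(arrival_rate, service_rate, servers)
    return -response_time - cost_per_server * servers


def select_action(arrival_rate, service_rate, candidate_servers, cost_per_server):
    """Select the candidate allocation with the best predicted utility."""
    return max(
        candidate_servers,
        key=lambda n: predicted_utility(
            arrival_rate, service_rate, n, cost_per_server
        ),
    )


# Example: choose how many servers to allocate for an observed arrival rate.
best = select_action(
    arrival_rate=8.0,
    service_rate=5.0,
    candidate_servers=range(1, 7),
    cost_per_server=0.2,
)
print(best)
```

The quality of the selected action depends entirely on how well the assumed model matches the real system, which is the limitation noted above.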
Thus, there is a need in the art for a method and apparatus for reward-based learning of improved systems management policies.