The present invention relates generally to the management or control of systems or plants, and relates more particularly to the automatic development of management policies using reward-based learning.
In many application domains, such as distributed computing systems, developing management policies typically entails developing explicit models of system behavior (e.g., based on queuing theory or control theory) and of interactions with external components or processes (e.g., users submitting jobs to the system). A common problem with this approach is that devising the necessary models is often a knowledge-intensive, labor-intensive, and time-consuming task. Hence, there is a great need for adaptive machine learning methods (e.g., reward-based learning methods) that automatically develop effective management policies, thereby avoiding extensive and time-consuming engineering of explicit domain knowledge.
However, a common limitation of such machine learning methods is that learning becomes progressively more difficult as the complexity of the managed system increases. For example, the complexity of learning may increase exponentially with the total number of input dimensions (i.e., the number of input variables plus the number of control variables), a phenomenon sometimes referred to as the “Curse of Dimensionality.”
Thus, there is a need for an improved method for reward-based learning that addresses the curse of dimensionality that limits existing methods.