Various methods are known from the prior art with which optimal operation of a technical system may be modeled on the basis of previously determined training data representing operation of that system. The technical system is described by states, actions and follow-on states, the states being certain technical parameters or observed status variables of the technical system, and the actions representing corresponding manipulated variables which can be varied in the technical system. In general, reinforcement learning methods are known from the prior art which learn, from training data for a technical system, an action-selecting rule that is optimal according to an optimality criterion. The known methods have the drawback that they do not provide any statement about the statistical uncertainty of a learned action-selecting rule. Such uncertainties are particularly high when only a small quantity of training data is available.
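By way of illustration only, learning an action-selecting rule from training data of the kind described above can be sketched as follows. The sketch assumes a small discrete system, a numeric reward attached to each transition as the optimality criterion, a discount factor `gamma` and a learning rate `lr`; none of these specifics is taken from the prior art discussed here, and the function name `learn_action_selection_rule` is hypothetical. It uses plain tabular Q-learning swept repeatedly over the recorded transitions and, like the known methods, yields no statement about statistical uncertainty.

```python
from collections import defaultdict


def learn_action_selection_rule(transitions, n_actions, gamma=0.9, sweeps=200, lr=0.5):
    """Learn a greedy action-selecting rule from recorded transitions.

    transitions: list of (state, action, reward, follow_on_state) tuples.
    The reward, gamma and lr are assumptions made for this sketch.
    """
    q = defaultdict(float)  # quality function Q(state, action), initialised to 0
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            # Value of the best action available in the follow-on state.
            best_next = max(q[(s_next, b)] for b in range(n_actions))
            # Move Q(s, a) towards the one-step target.
            q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])
    states = {s for s, _, _, _ in transitions}
    # The rule selects, in each observed state, the action with the highest quality.
    rule = {s: max(range(n_actions), key=lambda b: q[(s, b)]) for s in states}
    return rule, dict(q)


# Hypothetical training data: two states, two actions; only action 1 in state 1 is rewarded.
transitions = [
    (0, 0, 0.0, 0),
    (0, 1, 0.0, 1),
    (1, 0, 0.0, 0),
    (1, 1, 1.0, 1),
]
rule, q = learn_action_selection_rule(transitions, n_actions=2)
print(sorted(rule.items()))  # [(0, 1), (1, 1)]
```

The learned rule is a single point estimate: with only four recorded transitions, nothing in the result indicates how strongly it depends on the particular training data observed.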
Document [1] describes a method which takes account of the statistical uncertainty of a quality function used to learn an action-selecting rule. In that method, a learning method for determining an action-selecting rule is combined with a treatment of statistical uncertainty: a measure of the statistical uncertainty of the quality function taken into account during learning is determined by means of uncertainty propagation known per se, which is also called Gaussian error propagation. By means of a covariance matrix, the uncertainty propagation takes account of correlations between the uncertainties in the variables that enter into the learning method. The uncertainty in the variables is thus propagated and calculated exactly, which leads to a very high computational effort and memory space requirement in the computer-assisted learning of appropriate control of a technical system.
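The propagation step referred to above can be illustrated with a minimal sketch. For a mapping with Jacobian matrix D, Gaussian error propagation transforms an input covariance matrix C into the output covariance D C Dᵀ, so that correlations between the variables are carried along. The helper names, the two-variable example and the assumption that the quality function has one entry per state-action pair are chosen here for illustration only and are not taken from document [1].

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def propagate_covariance(jacobian, cov_in):
    """Gaussian error propagation: Cov_out = D * Cov_in * D^T."""
    return matmul(matmul(jacobian, cov_in), transpose(jacobian))


def covariance_entries(n_states, n_actions):
    """Entries of a full covariance matrix over a tabular quality function
    (assumes one quality value per state-action pair)."""
    n = n_states * n_actions
    return n * n


# Example mapping f(x, y) = (x + y, x - y); its Jacobian is constant.
jacobian = [[1.0, 1.0],
            [1.0, -1.0]]
# Input covariance: Var(x) = 4, Var(y) = 9, Cov(x, y) = 1.
cov_in = [[4.0, 1.0],
          [1.0, 9.0]]

cov_out = propagate_covariance(jacobian, cov_in)
print(cov_out)                      # [[15.0, -5.0], [-5.0, 11.0]]
print(covariance_entries(100, 10))  # 1000000
```

The second figure indicates why the exact approach is costly: under the stated assumption, a system with only 100 states and 10 actions already requires a covariance matrix with one million entries, and each propagation step involves products of matrices of that size.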