Various methods are known from the prior art which use previously determined training data representing the operation of a technical system to model an optimal operation of that system. The technical system is described by states, actions and subsequent states: the states are specified technical parameters or observed state variables of the technical system, and the actions represent corresponding manipulated variables which can be varied in the technical system. In particular, general reinforcement learning methods are known which learn, on the basis of training data, an action selection rule for a technical system that is optimal in accordance with an optimality criterion. The known methods have the disadvantage that they do not provide any information relating to the statistical uncertainty of a learned action selection rule. Such uncertainties are very significant, particularly if the quantity of training data is small.
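
By way of illustration only, the following minimal sketch shows how a conventional method of this kind might learn an action selection rule from training data consisting of states, actions and subsequent states. It is not taken from any specific prior-art method. The two-state system, the reward values standing in for the optimality criterion, the discount factor `gamma`, and the function name `learn_action_selection_rule` are all assumptions made for the example. The technique used is plain tabular Q-iteration over a fixed data set with deterministic transitions.

```python
from collections import defaultdict

# Hypothetical training data: (state, action, reward, next_state) tuples.
# The rewards are an assumed stand-in for the optimality criterion.
transitions = [
    (0, "up", 0.0, 1),
    (0, "down", 0.0, 0),
    (1, "up", 1.0, 1),
    (1, "down", 0.0, 0),
]


def learn_action_selection_rule(transitions, gamma=0.9, sweeps=200):
    """Tabular Q-iteration over a fixed data set (deterministic transitions assumed)."""
    actions = sorted({a for _, a, _, _ in transitions})
    states = sorted({s for s, _, _, _ in transitions})
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            # Bellman backup using the single observed subsequent state.
            q[(s, a)] = r + gamma * max(q[(s_next, b)] for b in actions)
    # Greedy action selection rule: best-valued action in each state.
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return policy, dict(q)


policy, q_values = learn_action_selection_rule(transitions)
print(policy)
```

The result is a single point estimate of the quality values and of the rule derived from them. Nothing in the output indicates how strongly that estimate depends on the four training samples, which is the shortcoming of the known methods noted above.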