When controlling and/or regulating a technical system, it is often desirable to influence its operation by carrying out corresponding actions in such a manner that the behavior of the technical system is optimized with respect to particular criteria. For example, when operating a gas turbine, it is useful to reduce the exhaust gas emissions produced by the turbine and to keep the combustion dynamics of the combustion chamber of the gas turbine (also referred to as combustion chamber humming) as low as possible. For this purpose, it is possible to influence, for example, parameters relating to the supply of gas to the combustion chamber of the gas turbine.
The prior art discloses computer-assisted methods for determining an action selection rule. According to such a rule, actions which are optimal with respect to an optimization criterion, for example the above-mentioned low pollutant emission and low combustion chamber humming, are determined for successive states of the technical system, each state being characterized by suitable state variables of the system. Documents [1] DE 10 2007 001 025 A1 and [2] DE 10 2008 020 379 A1 describe the determination of an action selection rule by training a recurrent neural network with training data comprising known states and actions. According to the action selection rule, an action sequence is output for a current state of the technical system, taking into account past states, on the basis of an optimality criterion. The action sequence can be determined in a short computing time during real operation of the technical system. However, it is not always ensured in this case that the actions determined according to the action selection rule are optimal in the sense of the optimality criterion. Deviations from optimality may occur, in particular, when the states of the technical system for which the action sequence is determined lie in operating ranges which are far away from the training data.