When controlling and/or regulating technical systems, it is often desirable to influence the operation of the technical system through suitable actions so that the behavior of the technical system is optimized with respect to particular criteria. For example, when operating a gas turbine, it is useful to reduce the exhaust gas emissions produced by the turbine and to keep the combustion dynamics of the combustion chamber of the gas turbine (also referred to as combustion chamber humming) as low as possible. To this end, parameters relating to the supply of gas and air to the combustion chamber of the gas turbine can be influenced, for example.
Computer-assisted methods for determining an action selection rule are known. According to such a rule, actions that are optimal with respect to an optimization criterion, for example the above-mentioned low pollutant emissions and low combustion chamber humming, are determined for successive states of the technical system, each state being characterized by suitable state variables of the system. DE 10 2007 001 025 A1 and DE 10 2008 020 379 A1 describe determining an action selection rule by training a recurrent neural network with training data comprising known states and actions. According to the action selection rule, an action sequence is output for a current state of the technical system, taking past states into account, on the basis of an optimization criterion.
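To make the term concrete, the following is a minimal sketch of what an action selection rule does: for a given state, it returns the action that is best with respect to a measure of quality. It is a one-step greedy search, not the recurrent-neural-network approach of the cited documents. The state variables, the candidate actions, the linear `toy_model` and its coefficients, and the form of `quality` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical state variables: (emissions, humming), arbitrary units.
State = Tuple[float, float]
# Hypothetical action: (change in gas supply, change in air supply).
Action = Tuple[float, float]


def toy_model(state: State, action: Action) -> State:
    """Illustrative stand-in for a system model: predicts the next state.

    The linear coefficients are made up for this sketch.
    """
    emissions, humming = state
    d_gas, d_air = action
    next_emissions = emissions + 0.5 * d_gas - 0.3 * d_air
    next_humming = humming - 0.2 * d_gas + 0.4 * d_air
    return (next_emissions, next_humming)


def quality(state: State) -> float:
    """Illustrative measure of quality: higher is better,
    so emissions and humming are both penalized."""
    emissions, humming = state
    return -(emissions + humming)


def select_action(
    state: State,
    actions: Sequence[Action],
    model: Callable[[State, Action], State],
    measure: Callable[[State], float],
) -> Action:
    """Return the candidate action whose predicted next state scores best."""
    return max(actions, key=lambda a: measure(model(state, a)))


candidates = [(-0.1, 0.0), (0.0, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]
best = select_action((1.0, 1.0), candidates, toy_model, quality)
```

In this toy setting, reducing the gas supply is selected because it yields the lowest predicted sum of emissions and humming among the five candidates.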
The known methods for determining an action selection rule using recurrent neural networks have the disadvantage that the optimization criterion, in the form of a measure of quality, is incorporated into the training of the recurrent neural network. Consequently, it is not readily possible to react to a changing optimization criterion during real operation of the technical system, since the neural network may have to be completely retrained for this purpose.