The present embodiments relate to computer-assisted open-loop and/or closed-loop control of a technical system.
Various computer-assisted methods for performing open-loop and/or closed-loop control of technical systems are known from the prior art. These methods specify, for a current state of the technical system, which action is to be carried out on the technical system. A state is described here by a number of state variables, and an action is described by a number of action variables. In addition to simple table-based regulators that assign corresponding actions to states of the technical system by way of a table, there are also regulators having an action selection rule that has been learnt with a machine learning method (e.g., based on a recurrent neural network). One application case of such regulators is the control of gas turbines in order to optimize parameters of the turbine such as the efficiency, the combustion chamber dynamics, and the emissions of pollutants. A further application case of these regulators is the control of a wind turbine, in which case, for example, the wear and the efficiency are optimized.
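By way of illustration only, a table-based regulator of the kind mentioned above may be sketched as a lookup from a state to an action. The tuple encoding of states and actions, the name `table_regulator`, the fallback `default` action, and the table entries are hypothetical assumptions made for this sketch and are not taken from the prior art referred to.

```python
from typing import Dict, Tuple

# Assumption: a state is a tuple of discretized state variables and an
# action is a tuple of action variables. Both encodings are illustrative.
State = Tuple[int, ...]
Action = Tuple[float, ...]


def table_regulator(table: Dict[State, Action], state: State, default: Action) -> Action:
    """Return the action assigned to `state` by the table.

    If the state has no table entry, the hypothetical `default` action
    is returned instead.
    """
    return table.get(state, default)


# Hypothetical table with two state variables and one action variable.
table: Dict[State, Action] = {
    (0, 0): (0.0,),
    (0, 1): (0.5,),
    (1, 0): (-0.5,),
}

action = table_regulator(table, (0, 1), default=(0.0,))
```

A regulator having a machine-learnt action selection rule would replace the lookup with a learnt function of the state, while the interface (state in, action out) would remain the same.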
In order to implement machine-learnt action selection rules, training data is provided. The training data specifies, for a number of states and for the actions that are carried out in these states, the corresponding subsequent states. In order to generate new training data, the technical system is to be operated in still unknown states. However, these new states are not to disrupt or severely impair the operation of the technical system and, for example, are not to bring about malfunctions of the technical system.
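By way of illustration only, the training data described above may be represented as a collection of records that each hold a state, the action carried out in that state, and the resulting subsequent state. The names `Transition`, `visited_states`, and `is_unknown`, the numeric values, and the exact-match test for a "still unknown" state are hypothetical assumptions made for this sketch. A practical system would more plausibly use a distance or density measure over the state variables.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One training record: a state, the action taken, and the subsequent state."""
    state: Vector
    action: Vector
    next_state: Vector


def visited_states(data: List[Transition]) -> Set[Vector]:
    """Collect every state in which an action was carried out."""
    return {t.state for t in data}


def is_unknown(state: Vector, data: List[Transition]) -> bool:
    """Assumption: a state counts as unknown if it never appears as a start state."""
    return state not in visited_states(data)


# Hypothetical data with two state variables and one action variable.
training_data = [
    Transition(state=(1.0, 0.2), action=(0.1,), next_state=(1.1, 0.2)),
    Transition(state=(1.1, 0.2), action=(-0.1,), next_state=(1.0, 0.2)),
]
```

The sketch only identifies states for which no training data exists. It does not address whether operating the technical system in such a state would be safe.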