The present invention relates to a control apparatus and a control method suitable for controlling a thermal electric power plant or the like.
In recent years, learning methods that require no supervision, such as reinforcement learning, have been extensively researched. “Reinforcement learning” is known as a framework of “learning to control”: through trial-and-error interaction with an environment such as a control subject, it provides a method of learning to generate operation signals that act on the environment so that the measurement signals obtained from the environment become desirable.
In reinforcement learning, a method of generating operation signals for the environment is learned such that the expected value of the evaluation values accumulated from the present state to a future state is maximized (or minimized), based on a scalar quantity (called a “reward” in the reinforcement learning field) calculated from the measured signals obtained from the environment. As examples of methods for implementing this learning function, algorithms such as Actor-Critic, Q-learning, and Real-Time Dynamic Programming, described in Non-Patent Document 1, are known.
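Purely as an illustration of the learning function described above (and not as part of the disclosed apparatus), tabular Q-learning on a hypothetical toy "plant" can be sketched as follows. The state space, dynamics, reward, and all parameter values are assumptions chosen for the example; the reward is a scalar computed from the measured signal, and each update moves a value estimate toward the reward plus the discounted best future value.

```python
import random

# Illustrative sketch: tabular Q-learning on a hypothetical 1-D "plant".
# The setpoint is state 2; the reward is largest when the measured state
# is at the setpoint. All names and dynamics here are assumptions.
N_STATES = 5          # discretized measurement levels
ACTIONS = (-1, 0, 1)  # decrease, hold, or increase the operation signal
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    """Toy environment: the action shifts the state within bounds."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = -abs(nxt - 2)  # scalar 'reward' computed from the measured signal
    return nxt, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(20):
            # epsilon-greedy: mostly exploit the current estimates
            if rng.random() < EPS:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            nxt, r = step(s, ACTIONS[a])
            # Q-learning update: move toward reward plus discounted best future value
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[s][a])
            s = nxt
    return Q
```

After training, the greedy policy derived from the table steers any starting state toward the setpoint state.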
As a more elaborate framework of reinforcement learning, the “Dyna” architecture is reviewed in the above-described literature. In this framework, the apparatus first learns which operation signals to generate using a model that simulates the control subject, and then determines the operation signals to apply to the control subject based on the learned result. The framework also includes means for adjusting the model using the operation signals applied to the control subject and the resulting measured signals, so that the error between the control subject and the model is reduced.
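A minimal Dyna-style sketch of this two-part idea, again with a hypothetical toy plant and assumed parameters, is shown below: real interactions both update the value estimates and correct a stored model, and additional "planning" updates are then run against the model rather than the plant.

```python
import random

# Illustrative Dyna-style sketch (hypothetical, not the disclosed apparatus):
# value estimates are trained against a stored model of the control subject,
# and the model itself is corrected from observed (state, action) outcomes.
N_STATES, ACTIONS = 5, (-1, 0, 1)
ALPHA, GAMMA = 0.5, 0.9

def real_plant(state, action):
    """The 'control subject': its dynamics are only observed, never read."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, -abs(nxt - 2)  # reward peaks at the setpoint state 2

def dyna_q(real_steps=200, planning_steps=10, seed=1):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    model = {}  # (state, action_index) -> (next_state, reward), from measurements
    s = 0
    for _ in range(real_steps):
        a = rng.randrange(len(ACTIONS))           # probe the real plant
        nxt, r = real_plant(s, ACTIONS[a])
        model[(s, a)] = (nxt, r)                  # correct the model from data
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[s][a])
        # Planning: extra learning sweeps on the model instead of the plant
        for _ in range(planning_steps):
            (ps, pa), (pn, pr) = rng.choice(list(model.items()))
            Q[ps][pa] += ALPHA * (pr + GAMMA * max(Q[pn]) - Q[ps][pa])
        s = nxt
    return Q
```

The planning loop lets the apparatus extract far more learning updates per real interaction, which is the practical appeal of pre-learning on a model.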
Further, a technology applying reinforcement learning is disclosed in Patent Document 1. This technology determines which operation signals to apply to the control subject by the following steps:
preparing a plurality of reinforcement learning modules each including a model and a system having a learning function;
calculating responsibility signals such that the smaller the prediction error between a module's model and the control subject, the greater the value assigned to that module; and
weighting the operation signals for the control subject generated from each of the reinforcement learning modules in proportion to the responsibility signals.
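The weighting scheme in these steps can be sketched as follows. The softmax-of-squared-error form and all numeric values are assumptions made for illustration; Patent Document 1 may compute the responsibility signals differently. The key property is preserved: a module whose model predicts the control subject well receives a larger share of the blended operation signal.

```python
import math

# Hypothetical sketch of responsibility-weighted blending across modules.
def responsibility_signals(prediction_errors, sharpness=1.0):
    """Smaller prediction error -> larger responsibility; weights sum to 1."""
    scores = [math.exp(-sharpness * e * e) for e in prediction_errors]
    total = sum(scores)
    return [sc / total for sc in scores]

def blended_operation_signal(module_outputs, prediction_errors):
    """Weight each module's operation signal in proportion to its responsibility."""
    weights = responsibility_signals(prediction_errors)
    return sum(w * u for w, u in zip(weights, module_outputs))
```

For example, with two modules whose recent prediction errors are 0.0 and 10.0, nearly all of the responsibility falls on the first module, and the blended signal essentially equals that module's output.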
A plant control apparatus processes measured signals obtained from the plant under control to compute the operation signals to be applied to it. The control apparatus incorporates algorithms that compute the operation signals such that the measured signals of the plant achieve the operation target.
As an example of a control algorithm used for controlling a plant, the PI (proportional-integral) control algorithm can be given. In PI control, the operation signal output from the control apparatus is computed by multiplying the deviation between an operation setpoint value and the measured signal of the plant by a proportional gain, and adding to it a value obtained by time-integrating that deviation. Alternatively, the operation signals for controlling the plant may be obtained using learning algorithms.
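The PI computation just described can be written as a short sketch; the class name, gains, and signal values below are illustrative assumptions, not part of any disclosed apparatus.

```python
# Minimal sketch of PI control: the operation signal is a proportional term
# plus the time integral of the deviation between setpoint and measurement.
class PIController:
    def __init__(self, kp, ki, setpoint):
        self.kp = kp              # proportional gain
        self.ki = ki              # integral gain
        self.setpoint = setpoint  # operation setpoint value
        self.integral = 0.0       # accumulated (time-integrated) deviation

    def update(self, measured, dt):
        deviation = self.setpoint - measured   # setpoint minus measured signal
        self.integral += deviation * dt        # time-integrate the deviation
        return self.kp * deviation + self.ki * self.integral
```

With kp = 2.0, ki = 0.5, a setpoint of 10.0, and a measurement of 8.0 over one time step, the deviation is 2.0, so the output is 2.0 × 2.0 + 0.5 × 2.0 = 5.0.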
Japanese Unexamined Patent Publication No. 2000-35956 describes a technology regarding an agent learning apparatus as a method of computing the operation signals for controlling the plant in the control apparatus using a learning algorithm.
A technology regarding a method using the Dyna architecture is described in the technical literature “Reinforcement Learning” (pp. 247-253).
In the methods according to these technologies, the control apparatus includes a model that predicts the characteristics of the control subject and a learning unit that learns in advance to generate model inputs such that the model output, i.e., the predicted outcome of the model, achieves a model output target. The control apparatus can therefore generate the operation signals supplied to the control subject in accordance with the result learned by the learning unit.
If there is an error between the model and the control characteristics of the control subject, the control apparatus corrects the model using the measured signals obtained as the outcome of operating the control subject, and re-learns which operation signals to generate based on the corrected model.
    [Non-Patent Document 1] “Reinforcement Learning”, translated by Sadayoshi Mikami and Masaaki Minagawa, published by Morikita Publishing Co., Ltd., Dec. 20, 2000.
    [Patent Document 1] Japanese Unexamined Patent Publication No. 2000-35956
Further, if the characteristics of the control subject and the model differ significantly, operation signals that are effective on the model are not necessarily effective on the control subject. Hence, the control subject may not be controlled appropriately.