A plant control system processes measurement signal data obtained from a plant, which is the control target, calculates the operation signals to be given to the plant, and transmits the calculated operation signals. The plant control system includes an algorithm for calculating the operation signals so that the measurement signal data from the plant satisfy their target values.
Control algorithms used for plant control include the proportional-integral (PI) control algorithm. In PI control, the deviation between the measurement signal data obtained from the plant and its target value is multiplied by a proportional gain, and the time integral of the deviation is added to that product to derive the operation signal to be given to the control target.
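The PI calculation described above can be sketched as follows. This is a minimal illustration rather than the control logic of any particular plant: the integral gain `ki`, the time step `dt`, the gain values, and the first-order plant used in `simulate` are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    kp: float              # proportional gain
    ki: float              # integral gain (assumed; the text names only the proportional gain)
    integral: float = 0.0  # running time integral of the deviation

    def update(self, target: float, measurement: float, dt: float) -> float:
        """Return the operation signal for one control period of length dt."""
        deviation = target - measurement
        self.integral += deviation * dt  # rectangular (Euler) integration
        return self.kp * deviation + self.ki * self.integral


def simulate(target: float = 1.0, steps: int = 3000, dt: float = 0.01,
             tau: float = 1.0) -> float:
    """Drive a hypothetical first-order plant, tau * dy/dt = u - y, with PI control."""
    controller = PIController(kp=2.0, ki=1.0)
    y = 0.0  # measurement signal
    for _ in range(steps):
        u = controller.update(target, y, dt)  # operation signal
        y += dt * (u - y) / tau               # Euler step of the plant
    return y


if __name__ == "__main__":
    print(f"measurement after 30 s: {simulate():.4f}")
```

With these assumed gains, the integral term removes the steady-state deviation, so the measurement settles at the target value.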
In a control algorithm that uses PI control, the relationships between inputs and outputs can be represented with block diagrams, which makes the cause-and-effect relations between them clear. For this reason, this type of control algorithm has been widely applied. However, a plant may be operated under conditions that were not anticipated in advance when, for example, the method of operating the plant or the environment around the plant changes. When this happens, tasks such as changing the control logic may be needed.
Adaptive control is available, in which the control algorithm or its parameter values are automatically corrected according to changes in the plant operation method and environment. Control methods using a learning algorithm are also available. Patent Document 1 describes a control system that controls a plant with a learning algorithm and uses reinforcement learning theory to derive the operation signals. In this method, the control system has a model for predicting the characteristics of its control target and a learning part for learning how to operate the model input so that the model output attains its target value. Because the model input learnt by the learning part is input to the model, the model output is brought to its target value.
In this learning type of adaptive control, the learning part learns an operation method by which the model output attains its target value. To do so, an evaluation function value, which represents the degree of attainment of the target value, is calculated from the model output obtained as a result of an operation, and the learning part uses this evaluation function value as the index for learning.
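The role of the evaluation function can be illustrated with a small sketch. It is not the method of Patent Document 1: the linear `model`, the negative-absolute-deviation `evaluation`, the candidate operations, and the single-state epsilon-greedy learner (a bandit simplification of reinforcement learning) are all hypothetical choices made for the example.

```python
import random


def model(operation: float) -> float:
    """Hypothetical stand-in for the plant model: model input -> model output."""
    return 2.0 * operation + 1.0


def evaluation(output: float, target: float) -> float:
    """Hypothetical evaluation function: larger when the output is nearer the target."""
    return -abs(output - target)


def learn_operation(target: float, candidates: list[float], episodes: int = 500,
                    alpha: float = 0.5, epsilon: float = 0.2, seed: int = 0) -> float:
    """Learn which candidate operation earns the highest evaluation value."""
    rng = random.Random(seed)
    # Estimates start at 0, which is optimistic here because the evaluation
    # value is never positive, so untried operations are tried first.
    values = {op: 0.0 for op in candidates}
    for _ in range(episodes):
        if rng.random() < epsilon:
            op = rng.choice(candidates)            # explore
        else:
            op = max(candidates, key=values.get)   # exploit the current estimate
        reward = evaluation(model(op), target)     # index used for learning
        values[op] += alpha * (reward - values[op])
    return max(candidates, key=values.get)


if __name__ == "__main__":
    ops = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    print(learn_operation(target=5.0, candidates=ops))
```

The learner never sees the target directly; it only sees the evaluation value returned for each operation. This is why the behaviour that is learnt depends on how the evaluation function is designed.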
When a learning control system is constructed, the design of this type of evaluation function is generally entrusted to a system designer. The system designer must design an appropriate evaluation function in view of control specifications such as a control target value and a learning time as well as the characteristics of the control target.
Patent Document 2 discloses a control unit based on reinforcement learning that is robust to environmental changes. In this control unit, the evaluation signal data is defined by adding a term that accounts for a disturbance of the control system, produced by a disturbance generator, to the ordinary compensation (reward) signal, which depends on the degree of attainment of the target. Learning is then carried out so that the expected value of the evaluation signal data is maximized.
Non-patent Document 1 describes a technology concerning the design of an evaluation function appropriate to a learning control system. An evaluation function (compensation, that is, reward) designed so that a desirable behavior is obtained is given to a learning mechanism based on the reinforcement learning theory and is changed according to the progress of learning, which enables efficient learning.
Patent Document 1: Japanese Patent Laid-open No. 2000-35956
Patent Document 2: Japanese Patent Laid-open No. 2002-189502
Non-patent Document 1: Yamanashi, Motoyama, Urakawa, Oh, Yabuta, “Advance Motion Acquisition of an Actual Robot by Reinforcement Learning using Reward Change”, Transactions of the Japan Society of Mechanical Engineers (C), Vol. 72, No. 717, pp. 1574-1581, 2006.