The present invention relates to a control system for a control subject having a combustion unit.
Control logic based on PID (proportional-integral-derivative) control has long been the mainstream in the field of plant control. A large number of technologies capable of flexibly dealing with the characteristics of a plant have also been proposed, using a supervised learning function typified by a neural network.
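For reference, the PID control law mentioned above can be illustrated with a minimal discrete-time sketch. The gains, the time step, and the first-order process in `simulate` are hypothetical values chosen only for illustration; they are not taken from any plant discussed in this document.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete-time PID controller (illustrative sketch)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt                   # integral term state
        derivative = (error - self.prev_error) / self.dt   # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200) -> float:
    """Drive a hypothetical first-order process dy/dt = -y + u toward 1.0."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0, dt=0.1)   # assumed gains, not tuned for a plant
    y = 0.0
    for _ in range(steps):
        u = pid.step(1.0, y)
        y += pid.dt * (-y + u)                  # explicit Euler step of the toy process
    return y
```

Calling `simulate()` returns the process output after 200 steps, which settles close to the setpoint because the integral term removes the steady-state error.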
In order to construct a control system by using a supervised learning function, successful models serving as teacher data must be prepared in advance. For cases where such data cannot be prepared, unsupervised learning methods have also been proposed.
Reinforcement learning is known as one example of unsupervised learning.
Reinforcement learning is a learning control framework that generates operation signals to an environment, such as a control subject, through trial-and-error interaction with that environment, so that desirable measurement signals are obtained from it. Thus, even when successful models are not prepared, the control system can learn desirable actions in response to the environment as long as a desirable state is defined.
Reinforcement learning has a learning function that generates an operation signal to the environment based on a scalar evaluation value (called a “reward” in reinforcement learning) computed from measurement signals obtained from the environment, so that the expected value of the evaluation values obtained from the present state to the future state is maximized. Non-Patent Document 1 discloses algorithms such as Actor-Critic, Q-learning, and real-time Dynamic Programming as methods of implementing such a learning function.
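As an illustration of one of the algorithms named above, the following is a minimal sketch of tabular Q-learning. The five-state chain environment, the reward of 1.0 at the goal, and the values of `alpha` and `gamma` are assumptions made only for this example. Because Q-learning is off-policy, the sketch explores with uniformly random actions, which is a simplification.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values Q[state][action] from trial-and-error interaction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)            # random exploration (off-policy)
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy(q_learning())` selects the action with the largest learned value in each state; in this toy chain that is the move toward the goal.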
As a reinforcement learning framework developed from the above-described methods, Non-Patent Document 1 also introduces a framework called “Dyna-architecture”. This method learns in advance, using a model that simulates the control subject as the learning target, which operation signal should be generated, and then determines the operation signal applied to the control subject by using this learning result. The method also has a model adjustment function that decreases the error between the control subject and the model.
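The idea of learning against a model while adjusting that model toward the real control subject can be sketched with the textbook Dyna-Q pattern. This is an illustrative assumption, not the arrangement of any particular control system: the toy chain `plant_step`, the tabular `model`, and all parameter values are hypothetical, and the sketch interleaves real steps with model-based learning rather than completing all model-based learning beforehand.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def plant_step(state, action):
    """Stand-in for the real control subject (hypothetical toy chain)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}        # (state, action) -> (next_state, reward), learned from the plant

    def update(s, a, r, s2):
        # One tabular Q-learning update toward the bootstrapped target.
        target = r if s2 == GOAL else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)
            s2, r = plant_step(s, a)
            update(s, a, r, s2)              # learn from the real experience
            model[(s, a)] = (s2, r)          # model adjustment: match the plant
            for _ in range(planning_steps):  # extra learning on simulated experience
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q, model
```

Here the model adjustment is trivial because the toy plant is deterministic, so storing the last observed transition removes the model error for that state-action pair. A real plant model would instead need its parameters fitted to measured data.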
A control system for a plant that includes a combustion apparatus encounters the problem that the combustion characteristics and heat transfer characteristics of the plant change when fuel properties are not constant, as with coal fuels, or when the coal type is changed. The technology described in Patent Document 1 has been proposed to solve this problem.
This technology is a method of calculating a fuel heating value ratio from the deviation between the actual measurement signal and the setting value of the main steam pressure in a coal fired boiler.
In addition, Patent Document 2 describes a control system including: a first estimation unit for calculating a furnace absorbing heating value estimation value based on fluid measurement data such as temperature, pressure, and flow rate in the furnace of a coal fired boiler; a second estimation unit for calculating a final re-combustion device absorbing heating value estimation value based on the temperature, pressure, flow rate, and the like of the final re-combustion device; a unit for calculating the ratio between the furnace absorbing heating value estimation value calculated by the first estimation unit and the final re-combustion device absorbing heating value estimation value calculated by the second estimation unit; and an operation unit for grasping the boiler combustion characteristics based on the ratio calculated by this unit and outputting a gas distribution damper setting value, a revolution rate setting value of a gas recirculating ventilator, and a boiler input acceleration setting value.
In the field of controlling a plant such as a boiler, control logic based on PID control has been the mainstream. With the supervised learning function typified by the neural network, a large number of technologies that can cope flexibly with the characteristics of the plant have been proposed. However, in order to construct a control system by using this supervised learning function, successful models serving as supervising data have to be prepared in advance. Therefore, unsupervised learning methods such as the reinforcement learning method have been proposed.
Reinforcement learning is a learning control framework that generates an operation signal to an environment, such as a control subject, through trial-and-error interaction with that environment, so that the measurement signal obtained from the environment becomes a desirable one. Thus, it has the advantage that, even when successful models are not prepared in advance, the control system can learn desirable actions in response to the environment if only a desirable state is defined.
In this reinforcement learning method, the operation signal to the environment is generated, based on an evaluation value calculated from the measurement signal obtained from the environment, in such a manner that the expected value of the evaluation values from the present state to the future state is maximized. Known algorithms for implementing such a learning function include Actor-Critic, Q-learning, and real-time Dynamic Programming.
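The maximization described above is commonly written in the following form. The symbols are assumptions introduced here for illustration: $r_t$ is the evaluation value (reward) at time $t$, $\gamma \in [0, 1)$ is a discount factor, and $\pi$ is the rule that maps a state to an operation signal. The discounted sum is one standard choice of objective, and others exist.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, G_t \,\right]
```

With $\gamma$ close to 1, evaluation values far in the future weigh almost as much as immediate ones; with $\gamma$ close to 0, the learning favors the immediate evaluation value.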
As a reinforcement learning framework developed from the above-mentioned methods, a framework called “Dyna-architecture” is known. This method learns in advance, using a model that simulates the control subject as the learning target, which operation signal should be generated, and then determines the operation signal applied to the control subject by using this learning result. The method also has a model adjustment function that decreases the error between the control subject and the model.
On the other hand, as numerical analysis technologies have advanced, combustion reactions can now be reproduced to some extent by calculation, so that a model can be constructed by using a simulator that simulates the plant (see Patent Document 3, for example).
[Patent Document 1]: Japanese Unexamined Patent Publication No. 2004-190913
[Patent Document 2]: Japanese Unexamined Patent Publication No. Heisei 8-200604
[Patent Document 3]: Japanese Unexamined Patent Publication No. 2003-281462
[Non-Patent Document 1]: “Reinforcement Learning”, translated jointly by Sadayoshi Mikami and Masaaki Minagawa, published by MORIKITA Publishing Co., Ltd., Dec. 20, 2000