The present invention pertains in general to neural networks, and more particularly to a method and apparatus for improving performance and accuracy in neural networks by utilizing the residual activation in subnetworks.
Neural networks are generally utilized to predict, control and optimize a process. The neural network is generally operable to learn a non-linear model of a system and store the representation of that non-linear model. Therefore, the neural network must first learn the non-linear model in order to optimize/control that system with that non-linear model. In the first stage of building the model, the neural network performs a prediction or forecast function. For example, a neural network could be utilized to predict future behavior of a chemical plant from the past historical data of the process variables. Initially, the network has no knowledge of the model type that is applicable to the chemical plant. However, the neural network "learns" the non-linear model by training the network on historical data of the chemical plant. This training is effected by a number of classic training techniques, such as back propagation, radial basis functions with clustering, non-radial basis functions, nearest-neighbor approximations, etc. After the network has finished learning on the input data set, some of the historical data of the plant that was purposefully withheld from the training data is then input into the network to determine how accurately it predicts on this new data. If the prediction is accurate, then the network is said to have "generalized" on the data. If the generalization level is high, then a high degree of confidence exists that the prediction network has captured useful properties of the plant dynamics.
In order to train the network, historical data is typically provided as a training set, which is a set of patterns taken from a time series in the form of a vector x(t) representing the various inputs and a vector y(t) representing the actual outputs as a function of time for t=1, 2, 3 . . . M, where M is the number of training patterns. These inputs could be temperatures, pressures, flow-rates, etc., and the outputs could be yield, impurity levels, variance, etc. The overall goal is to learn this training data and then generalize to new patterns.
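By way of illustration only, the assembly of a training set and the withholding of part of the historical data for a generalization check might be sketched as follows. The function names, the 20% holdout fraction and the sample temperature, pressure and yield values are assumptions made for this example and are not taken from the disclosure.

```python
import random

def make_patterns(inputs, outputs):
    """Pair each input vector x(t) with its output vector y(t), t = 1..M."""
    if len(inputs) != len(outputs):
        raise ValueError("x(t) and y(t) must cover the same time steps")
    return list(zip(inputs, outputs))

def split_holdout(patterns, holdout_fraction=0.2, seed=0):
    """Withhold a fraction of the patterns for a later generalization check."""
    rng = random.Random(seed)
    shuffled = patterns[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # First list is used for training, second is the withheld data.
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical plant history: x(t) = (temperature, pressure), y(t) = (yield,)
x = [(300.0 + t, 2.0 + 0.1 * t) for t in range(10)]
y = [(0.5 + 0.01 * t,) for t in range(10)]

patterns = make_patterns(x, y)
train_set, holdout_set = split_holdout(patterns)
```

The withheld patterns are never shown to the network during training, so the prediction error on them indicates how well the network has generalized.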
With the training set of inputs and outputs, it is then possible to construct a function that is imbedded in the neural network as follows:
$$\vec{o}(t) = \vec{f}(\vec{x}(t), \vec{P}) \qquad (1)$$
where o(t) is an output vector and P is a vector of parameters ("weights") that are variable during the learning stage. The goal is to minimize the Total-Sum-Square-Error function:
$$\vec{E} = \sum_{t=1}^{M} \left( \vec{y}(t) - \vec{o}(t) \right)^{2} \qquad (2)$$
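As a minimal sketch, the Total-Sum-Square-Error of equation (2) can be evaluated as shown below. The function name is hypothetical, and the example assumes that the squared differences are also summed over the components of each output vector so that a single scalar results.

```python
def total_sum_square_error(targets, outputs):
    """Sum over t = 1..M of the squared difference between y(t) and o(t).

    Assumption: squared differences are summed over vector components too,
    so the result is one scalar.
    """
    total = 0.0
    for y_t, o_t in zip(targets, outputs):
        total += sum((y_i - o_i) ** 2 for y_i, o_i in zip(y_t, o_t))
    return total

# Two training patterns (M = 2), each with a two-component output vector.
actual = [(1.0, 2.0), (3.0, 4.0)]
predicted = [(1.0, 1.0), (2.0, 4.0)]
error = total_sum_square_error(actual, predicted)
```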
The Total-Sum-Square-Error function is minimized by changing the parameters P of the function f. This is done by the back propagation or gradient descent method in the preferred embodiment. This is described in numerous articles, and is well known. Therefore, the neural network is essentially a parameter fitting scheme that can be viewed as a class of statistical algorithms for fitting probability distributions. Alternatively, the neural network can be viewed as a functional approximator that fits the input-output data with a high-dimensional surface. The neural network utilizes a very simple, almost trivial function (typically a sigmoid) in a multi-layer nested structure. The general advantages provided by neural networks over other functional approximation techniques are that the associated neural network algorithm accommodates many different systems; that neural networks provide a non-linear dependence on parameters, i.e., they generate a non-linear model; that they utilize the computer to perform most of the learning; and that neural networks perform much better than traditional rule-based expert systems, since rules are generally difficult to discern, or the number of rules or the combination of rules can be overwhelming. However, neural networks do have some disadvantages in that it is somewhat difficult to incorporate constraints or other knowledge about the system into the neural networks, such as thermodynamic pressure/temperature relations, and neural networks do not yield a simple explanation of how they actually solve problems.
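The following is a simplified sketch of minimizing the Total-Sum-Square-Error by gradient descent, with the gradients obtained by back propagation through a single hidden layer of sigmoids. The network size, learning rate, number of iterations, parameter layout and the target function y = x^2 are all assumptions chosen for illustration. It is not the training procedure of the preferred embodiment.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_hidden, seed=0):
    """Random starting parameters P = (w, b, v, c); the layout is illustrative."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return w, b, v, 0.0

def forward(x, params):
    """o = sum_j v_j * sigmoid(w_j * x + b_j) + c for a scalar input x."""
    w, b, v, c = params
    hidden = [sigmoid(wj * x + bj) for wj, bj in zip(w, b)]
    output = sum(vj * hj for vj, hj in zip(v, hidden)) + c
    return output, hidden

def tsse(data, params):
    """Total-Sum-Square-Error over all training patterns."""
    return sum((y - forward(x, params)[0]) ** 2 for x, y in data)

def gradient_step(data, params, rate):
    """One batch gradient-descent update of every parameter."""
    w, b, v, c = params
    n = len(w)
    gw, gb, gv, gc = [0.0] * n, [0.0] * n, [0.0] * n, 0.0
    for x, y in data:
        output, hidden = forward(x, params)
        d_out = -2.0 * (y - output)  # derivative of the squared error w.r.t. o
        gc += d_out
        for j in range(n):
            # Back-propagate the output error through hidden unit j.
            d_hid = d_out * v[j] * hidden[j] * (1.0 - hidden[j])
            gv[j] += d_out * hidden[j]
            gw[j] += d_hid * x
            gb[j] += d_hid
    return ([wj - rate * g for wj, g in zip(w, gw)],
            [bj - rate * g for bj, g in zip(b, gb)],
            [vj - rate * g for vj, g in zip(v, gv)],
            c - rate * gc)

def train(data, params, rate=0.01, epochs=3000):
    for _ in range(epochs):
        params = gradient_step(data, params, rate)
    return params

# Hypothetical training set: 11 samples of y = x^2 on [-1, 1].
data = [(i / 5.0 - 1.0, (i / 5.0 - 1.0) ** 2) for i in range(11)]
initial = init_params(n_hidden=4)
trained = train(data, initial)
```

Comparing `tsse(data, initial)` with `tsse(data, trained)` shows the error falling as the parameters are adjusted.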
In practice, the general disadvantages realized with neural networks are seldom important. When a neural network is used in part for optimizing a system, it is typically done under supervision. In this type of optimization, the neural network as the optimizer makes suggestions on how to change the operating parameters. The operator then makes the final decision of how to change these parameters. Therefore, this type of system usually requires an "expert" at each plant that knows how to change control parameters to make the plant run smoothly. However, this expert often has trouble giving a good reason why he is changing the parameters and the method that he chooses. This kind of expertise is very difficult to incorporate into classical models for rule-based systems, but it is readily learned from historical data by a neural network.
The general problem in developing an accurate prediction is the problem in developing an accurate model. In prediction files, there often exist variables that contain very different frequency components, or have a modulation on top of a slow drift. For example, in electronics, one may have a signal on top of a slowly varying wave of a much lower frequency. As another example, in economics, there is often an underlying slow upward drift accompanied by very fast fluctuating dynamics. In manufacturing, sensors often drift slowly, but the sensory values can change quite quickly. This results in an error in the prediction process. Although this error could be predicted given a sophisticated enough neural network and a sufficient amount of training data on which the model can be built, such neural network systems are seldom practical. As such, this error is typically discarded. This error is generally the type of error that is predictable and should be distinguished from random "noise" that is generally impossible to predict. This predictable error that is discarded in conventional systems is referred to as a "residual".
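The notion of a residual can be illustrated with a deliberately simple example. Here a straight-line fit stands in for a model that captures only the slow drift of a variable, and the fast modulation is left behind as a structured and therefore predictable error. The signal, its amplitudes and the least-squares fit are assumptions chosen for the illustration. They are not part of the disclosure.

```python
import math

def fit_line(ts, ys):
    """Ordinary least-squares straight line y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    return slope, mean_y - slope * mean_t

# Hypothetical sensor reading: slow upward drift plus a fast modulation.
ts = list(range(200))
drift = [0.01 * t for t in ts]
fast = [0.5 * math.sin(2.0 * math.pi * t / 10.0) for t in ts]
signal = [d + f for d, f in zip(drift, fast)]

# A model that only captures the slow component of the signal.
slope, intercept = fit_line(ts, signal)
predicted = [slope * t + intercept for t in ts]

# The residual is what the simple model leaves unexplained.
residual = [y - p for y, p in zip(signal, predicted)]
```

In this example the residual closely follows the fast sinusoid, so it is not random noise and could in principle be predicted.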
In addition to the loss of the residual prediction from the actual prediction, another aspect of the use of a neural network is that of providing optimization/control. Once a prediction has been made, it is then desirable to manipulate the input variables referred to as the control variables, these being independent variables, in order to drive the control input parameters to a specific set point. For example, valve positions, tank level-controllers, the accelerator pedal on a car, etc., are all control variables. In contrast, another set of variables, referred to as state variables, are measured rather than manipulated; they are read from sensors such as thermometers, flow meters, pressure gauges, speedometers, etc. For example, a control valve on a furnace would constitute the control variable, whereas a thermometer reading would constitute a state variable. If a prediction neural network were built to model a plant process based on these input variables, the same prediction accuracy would be obtained based on either the control variable or the state variable, or a combination of both.
Whenever the network is trained on input patterns, a problem occurs due to the relationship between the control valve and the thermometer reading. The reason for this is that the network will typically learn to pay attention to the temperature or the control or both. If it only pays attention to the temperature, the network's control answer is of the form "make the temperature higher" or "make the temperature lower". As the thermometer is not a variable that can be manipulated directly, this information has to be related back to information as to how to change the controller. If the relationship between the valve and the temperature reading were a direct relationship, this might be a simple problem. However, the situations that exist in practice are typically more complex in that the state variable dependencies on the control variables are not obvious to discern; they may be multivariate non-linear functions of the controls. In order to build a proper predictive-control model to perform on-line control with no human in the loop, it is necessary for the network to account for the relationship between the control variables and the state variables.
The present invention disclosed and claimed herein comprises a control network for controlling a plant having plant control inputs for receiving control variables, associated plant state variables and one or more controlled plant outputs. Each plant output is a function of dependencies of the plant state variables on the plant control variables. A control input is provided for receiving as network inputs the current plant control variables, the current plant state variables, and desired plant outputs. A control network output is provided for generating predicted plant control variables corresponding to the desired plant outputs. A processing system processes the received plant control variables and plant state variables through a local inverse representation of the plant that represents the dependencies of the plant output on the plant control variables to provide the predicted plant control variables necessary to achieve the desired plant outputs. An interface device is provided for inputting the predicted plant control variables to the plant such that the output of the plant will be the desired outputs.
In another aspect of the present invention, the processing system is comprised of a first intermediate processing system having a first intermediate output to provide a predicted plant output. The first intermediate processing system is operable to receive the plant control variables and state variables from the control network input for processing through a predictive model of the plant to generate the predicted plant output. The predicted plant output is output from the first intermediate output to an error device, which compares the predicted plant output to the desired plant output and generates an error representing the difference therebetween. A second intermediate processing system is provided for processing the error through a local inverse representation of the plant that represents the dependencies of the plant output on the plant control variables to provide the predicted plant control variables necessary to achieve the desired plant outputs.
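The sequence of predicting, comparing and inverting can be sketched under simplifying assumptions as follows. The toy function `plant_model` stands in for a trained predictive model. The local inverse is approximated by propagating the error back to the control input through a finite-difference sensitivity, with the state variable held fixed. All names and numerical values are hypothetical.

```python
def plant_model(control, state):
    """Hypothetical stand-in for a trained predictive model o = f(c, s)."""
    return 2.0 * control + 0.5 * state + 0.1 * control * control

def sensitivity(model, control, state, eps=1e-6):
    """Finite-difference estimate of do/dc at the current operating point."""
    return (model(control + eps, state) - model(control - eps, state)) / (2.0 * eps)

def predict_control(model, control, state, desired, rate=0.1, steps=200):
    """Adjust the control variable until the predicted output meets the target."""
    for _ in range(steps):
        # Error device: difference between desired and predicted plant output.
        error = desired - model(control, state)
        # Local inverse: move the control in the direction that reduces the error.
        control += rate * error * sensitivity(model, control, state)
    return control

current_control, current_state, desired_output = 1.0, 4.0, 6.0
new_control = predict_control(plant_model, current_control, current_state,
                              desired_output)
```

Only the control variable is changed in this sketch, since the state variable is measured and cannot be manipulated directly.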
In a further aspect of the present invention, the processing system is comprised of a residual activation neural network and a main neural network. The residual activation neural network is operable to receive the plant control variables and the state variables and generate residual states that estimate the external variances that affect plant operation. The residual activation neural network comprises a neural network having an input layer for receiving the plant control variables, an output layer for providing predicted state variables as a function of the control inputs and a hidden layer for mapping the input layer to the output layer through a representation of the dependency of the state variables on the plant control variables. A residual layer is provided for generating the difference between the predicted state variables and the actual plant state variables, this constituting a residual. The main neural network is comprised of an input layer for receiving the plant control variables and the residual, and an output layer for providing a predicted plant output. The main neural network has a hidden layer for mapping the input layer to the output layer with a representation of the plant output as a function of the control inputs and the residual. The main neural network is operable in an inverse mode to provide the local inverse representation of the plant with the dependencies of the control variables and the state variables projected out by the residual activation network.
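The data flow through the two networks can be sketched as follows, with simple closed-form functions standing in for the trained networks. The functions `state_model`, `main_model` and `measure_state`, together with their coefficients, are hypothetical and serve only to show how the residual separates an external disturbance from the effect of the control variable.

```python
def state_model(control):
    """Hypothetical stand-in for the residual activation network: s_p = g(c)."""
    return 3.0 * control

def residual_state(control, actual_state):
    """Residual layer: actual state minus the state predicted from the control."""
    return actual_state - state_model(control)

def main_model(control, residual):
    """Hypothetical stand-in for the main network: o = f(c, r)."""
    return 2.0 * control + 0.5 * residual

def measure_state(control, disturbance):
    """Toy plant: the sensor reading depends on the control and on an
    external disturbance that the control does not explain."""
    return 3.0 * control + disturbance

disturbance = 0.7
residuals = []
outputs = []
for control in (1.0, 2.0):
    state = measure_state(control, disturbance)
    r = residual_state(control, state)
    residuals.append(r)
    outputs.append(main_model(control, r))
```

In this toy case the residual equals the disturbance for either control setting, so the inputs to the main model no longer carry the dependency of the state variable on the control variable.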