1. Field of the Invention
The present invention generally relates to the field of predictive modeling and control, and more particularly to a combined modeling architecture for building numerically efficient dynamic models for systems of arbitrary complexity.
2. Description of the Related Art
Many systems or processes in science, engineering, and business are characterized by the fact that many different inter-related parameters contribute to the behavior of the system or process. It is often desirable to determine values or ranges of values for some or all of these parameters which correspond to beneficial behavior patterns of the system or process, such as productivity, profitability, efficiency, etc. However, the complexity of most real world systems generally precludes the possibility of arriving at such solutions analytically, i.e., in closed form. Therefore, many analysts have turned to predictive models and optimization techniques to characterize and derive solutions for these complex systems or processes.
Predictive models generally refer to any representation of a system or process which receives input data or parameters related to system or model attributes and/or external circumstances/environment and generates output indicating the behavior of the system or process under those parameters. In other words, the model or models may be used to predict behavior or trends based upon previously acquired data. There are many types of predictive models, including linear, non-linear, analytic, and empirical (e.g., statistical) models, among others, several types of which are described in more detail below.
Optimization generally refers to a process whereby past (or synthesized) data related to a system or process are analyzed or used to select or determine optimal parameter sets for operation of the system or process. For example, the predictive models mentioned above may be used in an optimization process to test or characterize the behavior of the system or process under a wide variety of parameter values. The results of each test may be compared, and the parameter set or sets corresponding to the most beneficial outcomes or results may be selected for implementation in the actual system or process.
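By way of illustration only, the test-and-compare loop described above may be sketched as follows. The model `predict_profit`, its two parameters, and the candidate values are hypothetical and stand in for whatever predictive model and parameter ranges apply to a given process; an exhaustive search over a small grid is assumed purely for simplicity.

```python
from itertools import product

def predict_profit(temperature, feed_rate):
    """Hypothetical predictive model: maps one parameter set to a predicted outcome."""
    return -((temperature - 350.0) ** 2) / 100.0 - (feed_rate - 2.0) ** 2 + 10.0

def optimize(model, candidates):
    """Evaluate the model on every candidate parameter set and keep the best one."""
    best_params, best_value = None, float("-inf")
    for params in candidates:
        value = model(*params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Assumed candidate values for each parameter (illustrative only).
temperatures = [300.0, 325.0, 350.0, 375.0]
feed_rates = [1.0, 1.5, 2.0, 2.5]

best_params, best_value = optimize(predict_profit, product(temperatures, feed_rates))
```

In practice the exhaustive loop would typically be replaced by a numerical optimizer, but the structure (model evaluation, comparison, selection) is the same.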
FIG. 1 illustrates a general optimization process as applied to an industrial system or process 104, such as a manufacturing plant, according to the prior art. It may be noted that the optimization techniques described with respect to the manufacturing plant are generally applicable to all manner of systems and processes. More specifically, FIG. 1 illustrates a computer based optimization system 102 that operates in conjunction with a process (or system) 104 to optimize the process. In other words, the computer system 102 executes software programs (including computer based predictive models) that receive process data 106 from the process 104 and generate optimized decisions and/or actions 108, which may then be applied to the process 104 to improve operations based on specified goals and objectives.
Thus, many predictive systems may be characterized by the use of an internal model (e.g., a mathematical model) that represents a process or system 104 for which predictions are made. As mentioned above, predictive model types may be linear, non-linear, stochastic, or analytical, among others.
Generally, mathematical models are developed using one of two approaches (or a combination of both). One approach is to conceptually partition the system into subsystems whose properties are well understood, e.g., from previous experience or use. Each subsystem is then modeled using physical or natural laws and other well-established relationships that have their roots in earlier empirical work. These subsystems are then joined mathematically and a model of the whole system is obtained. The other approach to developing mathematical models is directly based on experimentation. For example, input and output signals from the system being modeled are recorded and subjected to data analysis in order to infer a model. Note that as used herein, static nonlinearity in the input/output mapping of a system is viewed as a special case of the general nonlinear dynamic input/output mapping, and hence the techniques described are also applicable when only a static input/output mapping is to be modeled.
The first approach is generally referred to as first-principles (FP) modeling, while the second approach is commonly referred to as empirical modeling (although it should be noted that empirical data are often used in building FP models). Each of these two approaches has substantial strengths and weaknesses when applied to real-world complex systems.
For Example, Regarding First-principles Models:
1. FP models are built based on the science underlying the process being modeled, and hence are better suited for representing the general process behavior over the entire operational regime of the process.
However:
2. First-principles information is often incomplete and/or inaccurate, and so the model and thus its outputs may lack the accuracy required.
3. Tuning of the parameters in the model is needed before the model can be used for optimization and control.
4. FP models may be computationally expensive and hence useful for real-time optimization and control only in slower processes. This is particularly apparent when the outputs in FP models are not explicit. For example, consider a model of the form G(y_k, u_k, x_k) = 0, where the output vector y_k is an implicit function of the input vector u_k and the state vector x_k. In this case, an internal solver is needed to solve for y_k at each interval.
5. When the process changes, modification of the first principles model is generally expensive. For example, designed experiments may be necessary to obtain or generate the data needed to update the model.
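The computational burden noted in item 4 may be illustrated with a minimal sketch in which the output of an implicit model must be found iteratively at every interval. The scalar residual function below is a hypothetical example, and Newton's method is assumed as the internal solver; neither is taken from any particular FP model.

```python
def residual(y, u, x):
    """Hypothetical implicit model G(y, u, x) = y + 0.5*y**3 - u - x."""
    return y + 0.5 * y ** 3 - u - x

def residual_dy(y):
    """Partial derivative of G with respect to the output y."""
    return 1.0 + 1.5 * y ** 2

def solve_output(u, x, y_guess=0.0, tol=1e-12, max_iter=50):
    """Newton iteration for y such that G(y, u, x) = 0 at one time interval."""
    y = y_guess
    for _ in range(max_iter):
        g = residual(y, u, x)
        if abs(g) < tol:
            return y
        y -= g / residual_dy(y)
    raise RuntimeError("Newton iteration did not converge")

# One interval: several residual evaluations are needed for a single output.
y_k = solve_output(u=1.0, x=0.5)
```

An explicit model would produce the output in one function evaluation, whereas the implicit form repeats the inner loop at every interval, which is the source of the added cost.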
Regarding Empirical Models:
1. Since data capture the non-idealities of the actual process, where data are available, an empirical model can often be more accurate than a first-principles model.
However:
2. The available data are often highly correlated, and process data alone are not sufficient to unambiguously break the correlation. This is particularly apparent when process operation is recipe-dominated. For example, in a linear system with 2 inputs and 1 output, a recipe may require the two inputs to move simultaneously, one increasing by one unit and the other decreasing by one unit. If the output increases by one unit, the signs and values of the gains from the two inputs to the output cannot be uniquely determined based on these data alone.
3. Additional designed experiments are often needed in order to produce the necessary data for system identification; however, designed experiments disrupt the normal operation of the plant and hence are highly undesirable.
4. Certain regions or regimes of operation are typically avoided during plant operation, and hence representative data for those regions may not be available.
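The ambiguity described in item 2 can be checked numerically. The following sketch uses made-up recipe data in which the second input always moves opposite to the first; the candidate gain pairs are likewise illustrative.

```python
# Recipe-dominated data: u2 always equals -u1 (illustrative values).
recipe_moves = [(1.0, -1.0), (2.0, -2.0), (-1.0, 1.0)]
observed = [1.0, 2.0, -1.0]

def predict(gains, move):
    """Linear two-input, one-output model y = k1*u1 + k2*u2."""
    k1, k2 = gains
    u1, u2 = move
    return k1 * u1 + k2 * u2

# Candidate gain pairs with different signs and magnitudes, all with k1 - k2 = 1.
candidates = [(1.0, 0.0), (0.0, -1.0), (3.0, 2.0)]
fits = [
    all(predict(gains, move) == y for move, y in zip(recipe_moves, observed))
    for gains in candidates
]

# The least-squares normal matrix is singular, so no unique solution exists.
s11 = sum(u1 * u1 for u1, _ in recipe_moves)
s22 = sum(u2 * u2 for _, u2 in recipe_moves)
s12 = sum(u1 * u2 for u1, u2 in recipe_moves)
determinant = s11 * s22 - s12 ** 2
```

Every candidate reproduces the observations exactly, and the determinant is zero, so only the difference k1 - k2 is identifiable from such data.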
The complementary strengths and weaknesses of these two modeling routes are widely recognized, and efforts that combine the two are reported in the literature, some examples of which are described below.
One approach for using both FP information/models and empirical data is to develop combined models. For example, in “Modeling Chemical Processes Using Prior Knowledge and Neural Networks,” AIChE Journal, vol. 40, p. 1328, 1994, by M. Thompson and M. Kramer, (Thompson (1994)), a proposal is made to combine first-principles models with empirical nonparametric models, such as neural network models, in a hybrid architecture to model complex chemical processes, illustrated in FIG. 2. As FIG. 2 shows, inputs 201 are provided to a default parametric model 202 and a non-parametric model 204 (e.g., a neural network), whose combined (and optionally processed) outputs Z 205 are provided as input to a static nonlinear model 404, which then generates outputs 207. In Thompson's proposed hybrid architecture the neural network (nonparametric model) 204 is responsible for learning the difference between the default FP model 202 and the target data. Although the neural network is a nonparametric estimator capable of approximating this difference, it is also required to provide a negligible contribution to the model output for inputs far from the training data. In other words, the nonparametric model is required to contribute substantially in the operational range of the system, but not outside of this range. The training of the neural network in Thompson is therefore formulated as a semi-infinite programming (SIP) problem (reducible to a constrained nonlinear programming (NLP) problem if all inequalities are finite or if infinite inequalities can be transformed into finite constraints), for which SIP solvers (or constrained NLP algorithms in the case of an NLP problem) may be used for training.
Another example of a combined model is described in “Identification and Optimizing Control of a Rougher Flotation Circuit using an Adaptable Hybrid Neural Model,” Minerals Eng., vol. 10, p. 707, 1997, by F. Cubillos and E. Lima (Cubillos (1997)), where a neural network model is used to model reaction rates for an ideal Continuous Stirred Tank Reactor (CSTR) as a function of temperature and output concentration. In this example, the input and output data for the training of the neural network model are generated synthetically using the ideal CSTR model. Therefore, the neural network model is trained with explicit data for inputs/outputs of the neural network block in the combined model. In other words, the neural network block is detached from the combined model structure for training purposes, and is included in the combined model structure for optimization and control after training. Cubillos shows that the combined model has superior generalization capability compared to the neural network models alone, and that the modeling process was easier than synthesizing an FP model based on physical considerations.
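The division of labor between the default model and the nonparametric model may be illustrated with a simplified sketch. It is not Thompson's method: a sum of narrow Gaussian kernels is assumed here in place of the neural network, its locality comes from the kernel shape rather than from SIP-constrained training, and the downstream static nonlinear model is omitted. The default model, training data, and kernel width are all hypothetical.

```python
import math

def default_model(x):
    """Hypothetical default parametric (FP) model."""
    return 2.0 * x

# Illustrative target data that deviate from the default model.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.3, 1.8, 4.5, 6.1]

# The nonparametric part learns the difference between targets and default model.
residuals = [y - default_model(x) for x, y in zip(train_x, train_y)]
WIDTH = 0.2  # assumed kernel width, small relative to the data spacing

def correction(x):
    """Localized correction: decays to zero for inputs far from the training data."""
    return sum(
        r * math.exp(-((x - xi) ** 2) / (2.0 * WIDTH ** 2))
        for xi, r in zip(train_x, residuals)
    )

def hybrid_model(x):
    """Combined output: default model plus learned local correction."""
    return default_model(x) + correction(x)
```

Within the training range the correction closes the gap to the target data, while far outside that range `hybrid_model` reverts to `default_model`, which is the behavior the constrained training in Thompson is intended to guarantee.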
In “Hybrid First-Principles/Neural Networks Model for Column Flotation,” AIChE Journal, vol. 45, p. 557, 1999, by S. Gupta, P. Liu, S. Svoronos, R. Sharma, N. Abdel-Khalek, Y. Cheng, and H. El-Shall (Gupta (1999)), yet another example of a combined model is presented, where the combined model is used for phosphate column flotation. In this approach, the FP model is obtained from material balances on both phosphate particles and gangue (undesired material containing mostly silica). Neural network models relate the attachment rate constants to the operating variables. A nonlinear optimizer in the form of a combination of simulated annealing and conjugate gradient algorithm is used for the training of the neural network models.
An alternative approach to combining FP knowledge and empirical modeling is to use FP information to impose constraints on the training of the empirical model. An example of this approach is reported in E. Hartman, “Training feedforward neural networks with gain constraints,” Neural Computation, vol. 12, pp. 811-829, April 2000 (Hartman (2000)), where gain information is used as constraints for the training of the neural network models. Hartman develops a method for training feedforward neural networks subject to inequality or equality-bound constraints on the gains (i.e., partial derivatives of outputs with respect to inputs) of the learned mapping. Hartman argues that since accurate gains are essential for the use of neural network models for optimization and control, it is only natural to train neural network models subject to gain constraints when they are known through additional means (such as, for example, bounds extracted from FP models or operator knowledge about the sign of a particular gain).
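The effect of a gain constraint on training may be illustrated with a deliberately reduced example. This is not Hartman's algorithm: a one-parameter linear model and projected gradient descent are assumed in place of a feedforward network and its constrained training, and the data and gain bounds are hypothetical.

```python
def fit_gain(inputs, outputs, k_min, k_max, k_init=1.0, rate=0.01, steps=200):
    """Fit y = k*u by gradient descent, projecting k onto [k_min, k_max] each step."""
    k = k_init
    n = len(inputs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the gain k.
        grad = 2.0 * sum(u * (k * u - y) for u, y in zip(inputs, outputs)) / n
        # Projection step enforces the known bounds on the gain.
        k = min(max(k - rate * grad, k_min), k_max)
    return k

# Illustrative data whose unconstrained least-squares gain falls below the bound.
inputs = [1.0, 2.0, 3.0]
outputs = [0.2, 0.5, 0.6]
unconstrained_gain = sum(u * y for u, y in zip(inputs, outputs)) / sum(u * u for u in inputs)

# Assumed prior knowledge (e.g., from an FP model): the gain lies in [0.5, 2.0].
constrained_gain = fit_gain(inputs, outputs, k_min=0.5, k_max=2.0)
```

Here the data alone would suggest a gain of roughly 0.21, whereas the constrained fit settles on the lower bound of 0.5, so the learned gain remains consistent with the prior knowledge.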
A further example of including first principles knowledge in the training of an empirical model is a bounded derivative network (BDN) (i.e., the analytical integral of a neural network) as described in “Introducing the state space bounded derivative network for commercial transition control,” IEEE American Control Conference, June 2003, by P. Turner, J. Guiver, and B. Lines of Aspen Technology, Inc. (Turner (2003)), and illustrated in FIG. 3. In this reference the BDN is proposed as a universal nonlinear approximator. As FIG. 3 shows, in this approach, a state space model 302 is coupled to the BDN 304, and inputs 301 are received by the state space model 302 and by the BDN 304. Based on the received input 301, the state space model then provides state information 303 to the BDN 304, as shown, and, based on the received inputs 301 and the received states 303, the BDN generates output predictions 307. As indicated by the name “bounded derivative network”, the parameters of the nonlinear approximator are trained through the application of a constrained NLP solver where one set of potential constraints is the bounds on input/output gains in the model.
Prior art approaches to using combined models (as described above) have used neural network models to represent the variation in a specific set of parameters in an FP model. The overall model is therefore the original FP model with some of its parameters varying depending on the input(s)/state(s) of the system. These prior art approaches are generally inadequate in the following situations:
1. When the FP model does not fully describe the process. For example, if FP information for only a part of the process is known, a combined model of the process that is appropriate for optimization and control cannot be built based on the prior art techniques (e.g., using the system of FIG. 2), even if representative measurements of all the relevant process variables are available.
2. When the FP model only implicitly describes the relationship between inputs/states/parameters/outputs. The prior art approaches do not address the issue of training a neural network that models the parameters of an implicit FP model.
3. When higher-order fidelity of the input/output mapping (such as first or second order derivatives of the outputs with respect to the inputs) is critical to the usability of the combined model for optimization and control. Prior art does not address the imposition of such constraints in the training of neural network models in the context of combined models as depicted in FIG. 2.
While the system described in Turner (2003) does address the issue of gain constraints in the proposed bounded-derivative-network (BDN), the training of the BDN is performed with explicit access to inputs and outputs of the trained model (similar to conventional training of a stand-alone neural network by an NLP solver), and the issue of bounded derivatives when an FP block appears in series with the output of the BDN is not addressed. More specifically, the bounded derivative network of Turner is used in a Wiener model architecture or structure (i.e., in a series connection with a linear state space model) to construct a nonlinear model for a physical process. The Wiener model architecture is illustrated in FIG. 4A, where a static nonlinear model 404 follows a linear dynamic model 402. Thus, the BDN of FIG. 3 may be considered a special case of the Wiener model of FIG. 4A.
According to the Wiener model structure, the modification of the BDN will only affect the effective gain(s) between the inputs and outputs of the model. The identification of the dynamic behavior of the physical process occurs prior to the training of the BDN, and so changes in the state space model may require re-training of the BDN model. Indeed, the entire theory behind the training of the BDN in Turner (2003) is developed to ensure accurate representation of the process gains in the model. In an alternative but similar approach, FIG. 4B illustrates a Hammerstein model, where the nonlinear static model 404 precedes the linear dynamic model 402. Similar to the Wiener model structure, the nonlinear static model 404 and the linear dynamic model 402 are developed or trained in isolation from each other, and so modifications in the dynamic model 402 generally require re-training of the nonlinear static model 404. Further information regarding Wiener and Hammerstein models may be found in Adaptive Control, 2nd Edition, 1994, by K. Astrom and B. Wittenmark.
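The difference between the two series structures may be illustrated with a minimal sketch. The first-order filter and the hyperbolic tangent nonlinearity are assumptions chosen for brevity; they stand in for the linear dynamic model 402 and the static nonlinear model 404 and are not taken from Turner (2003) or from the cited text.

```python
import math

def linear_dynamic(inputs, a=0.5, b=0.5):
    """First-order linear filter x[k+1] = a*x[k] + b*u[k] (unit steady-state gain)."""
    x, outputs = 0.0, []
    for u in inputs:
        x = a * x + b * u
        outputs.append(x)
    return outputs

def static_nonlinear(v):
    """Memoryless nonlinearity (assumed form)."""
    return math.tanh(v)

def wiener(inputs):
    """Wiener structure: linear dynamics first, static nonlinearity second."""
    return [static_nonlinear(x) for x in linear_dynamic(inputs)]

def hammerstein(inputs):
    """Hammerstein structure: static nonlinearity first, linear dynamics second."""
    return linear_dynamic([static_nonlinear(u) for u in inputs])

step = [2.0] * 40          # step input held long enough to reach steady state
y_wiener = wiener(step)
y_hammerstein = hammerstein(step)
```

With these particular blocks both structures settle to the same steady-state value for the step input, but their transient responses differ, since the nonlinearity acts on the filtered signal in one case and on the raw input in the other.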
Thus, improved systems and methods for combined models and their use are desired.