Artificial neural networks (ANNs) are suitable for modeling complex multiple input-multiple output nonlinear processes owing to their ability to approximate nonlinear relationships to an arbitrary degree of accuracy (Poggio, T. and Girosi, F. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247, 978, 1990). As a result, ANNs have been extensively used in industry for making on-line and off-line predictions of process variables. The industrial applications of ANNs include process identification, steady-state and dynamic process modeling, fault detection and diagnosis, soft-sensor development, and nonlinear process control and monitoring. These ANN applications have been comprehensively reviewed by Tambe and co-authors (Tambe, S. S., Kulkarni, B. D., Deshpande, P. B. Elements of Artificial Neural Networks with Selected Applications in Chemical Engineering, and Chemical & Biological Sciences, Simulation & Advanced Controls Inc.: Louisville, USA, 1996). During any process operation, huge amounts of process input-output data are generated, which can be used to develop ANN models that predict in advance the values of process output variables. The desirable characteristics of an ANN model are: (i) it should accurately predict the outputs contained in the input-output example dataset used for its construction, and (ii) it should possess good generalization capability. Conventionally, ANN models are trained using a suitable weight-adjustment algorithm that minimizes a pre-specified cost (error) function. It may be noted that the form of the cost function completely determines the stochastic properties (noise sensitivity) of the resulting ANN model. For instance, the most widely used error-back-propagation (EBP) algorithm (Rumelhart, D., Hinton, G., Williams, R., Learning representations by backpropagating errors. Nature, 323, 533, 1986) performs minimization of the root-mean-squared-error (RMSE) function.
In any large set of process data, the presence of instrumental noise and/or measurement errors is inevitable. The presence of noise and/or errors in the input-output data used for network training creates a threshold limit for the accuracy of model predictions and the generalization performance exhibited by the model. This happens mainly because the network tries to approximate (learn) the average relationship existing between the input and the output data containing noise and/or errors. Since the network ignores the noise and errors in the data, the average relationship captured by it is fraught with inaccuracies. Significant inaccuracies in the predictions cannot be tolerated, since a large number of control and policy decisions regarding the process operation are based on the predictions made by the model. For example, in polymerisation reactors, prediction of quality variables, such as melt flow index (MFI), stress exponent (Sex), etc., is important in deciding the grade of the polymer produced. An ANN model capable of generalization accurately predicts not only the outputs in the data (example set) used for its development, but also those corresponding to new or novel input data. Thus, it is critically important that an ANN model possesses not only excellent prediction accuracy, but also a good generalization property.
It has been observed by Gorp and coworkers (Gorp, J. V., Schoukens, J., Pintelon, R., Learning neural networks with noisy inputs using the errors-in-variables approach, Transactions on Neural Networks A. 180, 1–14, 1999) that in commercial software, most of the ANN models are trained using a simple output error (OE) cost function, and that this can lead to severely biased errors in the network-predicted output when the input data are noisy. The authors show that the presence of noise actually suppresses the higher-order derivatives of the ANN model's transfer function and that a bias is introduced if the conventional least-squares cost functions are employed. Accordingly, one method for improving an ANN's generalization performance recommends replacing the RMSE cost function with a novel cost function, for instance, the Errors-In-Variables (EIV) cost function (Gorp, J. V., Schoukens, J., Pintelon, R., Learning neural networks with noisy inputs using the errors-in-variables approach, Transactions on Neural Networks A. 180, 1–14, 1999). The drawback of the EIV method is that its implementation requires knowledge of the variances pertaining to the inputs and outputs. In many practical settings, this information is not available, which severely limits the utility of the EIV method. Although the methodology works better for noisy measurements, it also requires large memory and can get trapped in a local minimum. Alternative methodologies, such as: (i) using the EIV method as a post-processing tool after application of the OE method, (ii) using the measured input and output values instead of the estimated values, and (iii) modified learning and optimization schemes, have been proposed and illustrated (Gorp, J. V., Schoukens, J., Pintelon, R., The errors in variables cost function for learning neural networks with noisy inputs, Intelligent Engineering Systems Through Artificial Neural Networks, 8, 141–146, 1998).
Literature reporting the effects of the addition of noise on the performance of an ANN model is relatively scarce, and only a few systematic studies have been conducted so far. It is generally known that the addition of noise to the training data helps in obtaining a model possessing better generalization performance. Sietsma and Dow (Sietsma, J., Dow, R., J., Creating artificial neural networks that generalize, Neural Networks 4, 67–79, 1991) reported the beneficial effects of noise after adding pseudo-Gaussian-distributed noise to each element of the training pattern (vector). They showed that training with noise-added data improves the classification ability of multilayer perceptron (MLP) networks. The study also revealed that a higher number of network nodes is then required and that each node contributes independently to the solution; it is also possible that a few units, which do not contribute significantly to the network output, can be removed via a suitable network pruning technique. This viewpoint is shared by Minai and Williams (Minai, A. A., Williams, R. D., Perturbation response in feedforward networks, Neural Networks, 7(5), 783–796, 1994), who proposed generating larger networks in which each node contributes to a smaller extent towards the global computation. In another exhaustive study, An (An, G., The effects of adding noise during backpropagation training on a generalization performance. Neural Comput., 8, 643–674, 1996) examined the effects of noise addition on the generalization performance of EBP-based network training. An's study separately analyzed the effects of noise in the inputs, the weights, and the outputs on the network's prediction performance. The study revealed that noise in the outputs does not improve generalization, whereas noise in the inputs and weights is helpful. It was also observed that training the network using Langevin noise leads to a global minimization similar to that obtained using the simulated annealing approach.
In a theoretical study, Bishop (Bishop, C. M., Training with noise is equivalent to Tikhonov regularization, Neural Comput., 7, 108–116, 1995) claimed that the error term induced by the noise corresponds to a class of generalized regulariser. The regularisation (Poggio, T., Girosi, F. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247, 978, 1990) modifies the error function via the addition of a penalty term and controls the variance produced by the network. In essence, the addition of noise to the training data provides a form of smoothing, and the method works because the functions to be learned by the ANN are generally smooth, or at least piecewise continuous in a finite number of regions. The statement embodies the underlying assumption that, for a well-posed problem, a unique solution exists and small perturbations in the data should produce only small variations in the solution. In other words, for two similar inputs, two similar outputs are expected. Thus, for a given example data set, additional network training patterns can be generated by superimposing a small amount of noise. The noise magnitude must be small, since a large amount of noise will clearly distort the intrinsic relationship between the inputs and the outputs, while too small an amount of noise will lead to insignificant changes of no consequence. It immediately follows that it is necessary to exactly quantify the ‘small’ amount of noise to be superimposed on the input-output example data. It may be noted that in nonlinear systems, which exist abundantly in the manufacturing and processing industries, the sensitivity with which changes in an input variable affect the output variable may differ significantly from one variable to another. Consequently, it becomes necessary to add varying extents of noise to each input and output variable.
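The generation of additional training patterns with a different noise level for each variable can be sketched as follows. This is a minimal illustration only: the function name `superimpose_noise`, the use of zero-mean Gaussian noise, and the choice of a standard deviation equal to a fractional tolerance times the magnitude of the variable are assumptions made for the example and are not prescribed by the text above.

```python
import numpy as np

def superimpose_noise(X, Y, tol_x, tol_y, n_copies, seed=0):
    """Return n_copies noisy replicas of every (input, output) pattern.

    Assumption (illustrative): the noise is zero-mean Gaussian with a
    standard deviation of tol * |value|, where tol is a fractional
    tolerance specified separately for each input and output variable.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)          # shape (n_patterns, n_inputs)
    Y = np.asarray(Y, dtype=float)          # shape (n_patterns, n_outputs)
    tol_x = np.asarray(tol_x, dtype=float)  # shape (n_inputs,)
    tol_y = np.asarray(tol_y, dtype=float)  # shape (n_outputs,)

    # Repeat each original pattern n_copies times (rows stay grouped).
    X_rep = np.repeat(X, n_copies, axis=0)
    Y_rep = np.repeat(Y, n_copies, axis=0)

    # Per-variable tolerances broadcast across the rows.
    X_noisy = X_rep + rng.standard_normal(X_rep.shape) * tol_x * np.abs(X_rep)
    Y_noisy = Y_rep + rng.standard_normal(Y_rep.shape) * tol_y * np.abs(Y_rep)
    return X_noisy, Y_noisy
```

A tolerance of zero leaves the corresponding variable unchanged, so a variable to which the output is highly sensitive can be given a smaller tolerance than the others.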
Determining the exact amount of noise to be added to each input-output variable is a tricky issue, and the present invention provides an effective genetic-algorithm-based solution to this problem.
Genetic algorithms (GAs) (Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley: New York, 1989; Holland, J., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich., USA) are members of a class of function minimization/maximization formalisms known as ‘stochastic optimization algorithms’. They are based on the mechanisms of natural selection and genetics, which play a dominant role in the Darwinian evolution of biological organisms. GAs are known to be efficient in searching noisy, discontinuous, multi-modal, and non-convex solution spaces, and their characteristic features are: (i) they are zeroth-order search techniques, implying that GAs need only the scalar values and not the derivatives of the objective function to be optimized, (ii) GAs perform a global search and, hence, they mostly converge to the global optimum on the objective function surface, (iii) the search procedure used by GAs is stochastic and, hence, they can be utilized without invoking ad-hoc assumptions, such as smoothness, differentiability, and continuity, pertaining to the form of the objective function (owing to this feature, GAs can be used to solve optimization problems that cannot be solved using the classical gradient-based algorithms, which require the objective function to simultaneously satisfy the above-stated criteria), and (iv) the GA procedure can be effectively parallelized, which helps in efficiently and speedily searching a large multi-dimensional solution space. The present invention discloses a genetic algorithm based method for arriving at the optimal level of noise to be added to each input/output variable of the example set, thereby creating an enlarged noise-superimposed sample dataset to be used in the ANN training such that the trained network possesses improved prediction accuracy and generalization performance.
In the GA procedure, the search for an optimal solution vector (also termed decision vector), representing the tolerance values of the noise to be superimposed on the input/output variables in the example set, begins from a randomly initialized population of probable (candidate) solutions. The solutions, usually coded in the form of binary strings (chromosomes), are then tested to measure their fitness in fulfilling the optimization objective, i.e., function minimization or maximization. Subsequently, the candidate solutions are ranked in decreasing order of their fitness scores, and a main loop of GA operations comprising selection, crossover, and mutation is performed on the ranked population. Implementation of the loop generates a new population of candidate solutions which, compared to the current population, usually fares better at fulfilling the optimization objective. The best string that evolves after repeating the above-described loop several times forms the solution to the optimization problem. While evaluating the fitness of a solution vector, the input/output variable specific noise tolerance values contained therein are used to generate a large number of noise-superimposed sample input-output patterns corresponding to each pattern in the example set; the resulting enlarged data set is then used for training the neural network with a view to minimizing a least-squares cost function such as the RMSE. The training of the ANN is performed using a gradient-based or other suitable weight-update formalism. The RMSE magnitude obtained thereby is used to compute the fitness value of the candidate solution vector comprising the noise tolerances. The network trained on the data generated using the GA-optimized noise tolerance values better approximates the true input-output relationship in the presence of instrumental noise and/or measurement errors and, therefore, possesses good prediction accuracy and generalization performance.
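The loop described above can be sketched in a few lines. The sketch below is illustrative rather than a definitive implementation: the function names, the 8-bit coding of each tolerance, the upper bound `tol_max`, the rank-proportional selection, the single-point crossover, the elitism step, and the fitness form 1/(1 + RMSE) are all assumptions chosen for the example. The callable `rmse_of` is a hypothetical placeholder for the step that generates the noise-superimposed data, trains the network, and returns the resulting RMSE; in the usage lines a cheap toy function with a known minimum stands in for it.

```python
import numpy as np

def decode(bits, n_vars, n_bits, tol_max):
    """Map a binary chromosome to n_vars tolerances in [0, tol_max]."""
    weights = 2 ** np.arange(n_bits - 1, -1, -1)
    ints = bits.reshape(n_vars, n_bits) @ weights
    return tol_max * ints / (2 ** n_bits - 1)

def ga_noise_tolerances(rmse_of, n_vars, n_bits=8, tol_max=0.1, pop_size=30,
                        n_gen=40, p_cross=0.9, p_mut=0.02, seed=0):
    """Search for per-variable noise tolerances that minimize rmse_of(tol)."""
    rng = np.random.default_rng(seed)
    length = n_vars * n_bits
    pop = rng.integers(0, 2, size=(pop_size, length))   # random initial population
    # Rank-based selection weights: the best-ranked string gets the largest share.
    rank_p = np.arange(pop_size, 0, -1) / (pop_size * (pop_size + 1) / 2)
    best_bits, best_fit = None, -np.inf

    for _ in range(n_gen):
        # Fitness evaluation (assumed form: higher fitness for lower RMSE).
        fit = np.array([1.0 / (1.0 + rmse_of(decode(c, n_vars, n_bits, tol_max)))
                        for c in pop])
        order = np.argsort(-fit)                         # decreasing fitness
        pop, fit = pop[order], fit[order]
        if fit[0] > best_fit:
            best_bits, best_fit = pop[0].copy(), fit[0]

        # Selection: draw parents with probability proportional to rank.
        parents = pop[rng.choice(pop_size, size=pop_size, p=rank_p)]

        # Crossover: single-point exchange between consecutive parent pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, length)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]

        # Mutation: flip each bit with a small probability.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)

        children[0] = best_bits                          # elitism (assumption)
        pop = children

    return decode(best_bits, n_vars, n_bits, tol_max), best_fit

# Toy stand-in for "train the ANN on the enlarged data and return its RMSE".
target = np.array([0.02, 0.05, 0.01])
toy_rmse = lambda tol: float(np.sqrt(np.mean((tol - target) ** 2)))
best_tol, best_fit = ga_noise_tolerances(toy_rmse, n_vars=3)
```

Because each fitness evaluation in the actual setting involves a full network training run, the population size and number of generations trade search quality against computation time; the evaluations within one generation are independent of each other and can be run in parallel.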
The present invention is illustrated by considering two examples, viz. (i) ANN-based modeling of an industrial polymerisation reactor, and (ii) ANN-based modeling of a continuous stirred tank reactor (CSTR) wherein an exothermic consecutive A→B→C reaction occurs. The prediction accuracies obtained using the invented method are compared with those obtained using a commonly used network training procedure.