US 2010/0036529 pertains to a technique for detecting abnormal events in a process using multivariate statistical methods.
Early detection and diagnosis of an occurrence of an abnormal event in an operating plant is very important for ensuring plant safety and for maintaining product quality. Advancements in the area of advanced instrumentation have made it possible to measure hundreds of variables related to a process every few seconds. These measurements bring in useful signatures about the status of the plant operation.
A wide variety of techniques for detecting faults have been proposed in the literature. These techniques can be broadly classified into model-based methods and statistical methods based on historical data. While model-based methods can be used to detect and isolate signals indicating abnormal operation, for large-scale complex chemical systems such quantitative or qualitative cause-effect models may be difficult to develop from the outset.
Artificial Neural Network (ANN)
Neural networks are computer algorithms inspired by the way information is processed in the nervous system.
Artificial Neural Networks (ANN) have emerged as a useful tool for non-linear modeling, especially in situations where developing phenomenological or conventional regression models becomes impractical or cumbersome. ANN is a computer modeling approach that learns from examples through iterations without requiring prior knowledge of the relationships among process parameters. Consequently, ANN is capable of adapting to a changing environment. It is also capable of dealing with uncertainties, noisy data, and non-linear relationships.
ANN modeling methods have come to be known as 'effortless computation' and are used extensively owing to their model-free ability to approximate complex decision-making processes.
The advantages of an ANN-based model are:
(i) it can be constructed solely from the historic process input-output data (example set),
(ii) detailed knowledge of the process phenomenology is unnecessary for the model development,
(iii) a properly trained model can be generalized easily due to its capability to accurately predict outputs for a new input data set, and
(iv) even multiple input-multiple output (MIMO) non-linear relationships can be approximated simultaneously and easily.
Owing to their attractive characteristics, ANNs have been widely used in chemical engineering applications such as steady-state and dynamic process modeling, process identification, yield maximization, non-linear control, and fault detection and diagnosis; see Lahiri, S. K. and Ghanta, K. C., 2008, Lahiri, S. K. and Khalfe, N., 2010, Tambe et al., 1996, Bulsari, 1994, Huang, 2003, and Stephanopoulos and Han, 1996, for instance.
The most widely utilized ANN paradigm is the multi-layered perceptron (MLP) that approximates non-linear relationships between an input set of data (independent process variables) and a corresponding output data set (dependent variables). A three-layered MLP with a single intermediate (hidden) layer accommodating a sufficiently large number of nodes, also termed neurons or processing elements, can approximate or map any non-linear computable function with high accuracy. An approximation is obtained or “taught” through a numerical procedure called “network training” wherein network parameters or weights are adjusted iteratively such that the network, in response to the input patterns in an example set, accurately reproduces the corresponding outputs.
A number of algorithms exist, each possessing certain advantageous characteristics, for training an MLP network, for example the popular error-back-propagation (EBP), Quick propagation, and Resilient Back-propagation (RPROP) (Riedmiller, 1993) algorithms.
Training of an ANN involves minimizing a non-linear error function (e.g., root-mean squared-error, RMSE) that may possess several local minima. Thus, it may become necessary to employ a heuristic procedure involving multiple training runs in order to obtain an optimal ANN model whose parameters or weights correspond to the global or the deepest local minimum of the error function.
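The multi-run heuristic described above can be sketched as follows. The one-dimensional error surface, learning rate, and step counts here are illustrative assumptions, not values from the source; the point is only that the deepest minimum found across several randomly initialized runs is kept:

```python
import numpy as np

# Toy error function with several local minima (an illustrative assumption)
def error(w):
    return np.sin(3 * w) + 0.1 * w ** 2

def train_from(w0, lr=0.01, steps=500):
    # One "training run": plain gradient descent from one random initial weight
    w = w0
    for _ in range(steps):
        grad = 3 * np.cos(3 * w) + 0.2 * w   # derivative of error(w)
        w -= lr * grad
    return w, error(w)

# Multiple training runs from random initializations; keep the deepest minimum
rng = np.random.default_rng(0)
runs = [train_from(rng.uniform(-4, 4)) for _ in range(10)]
best_w, best_err = min(runs, key=lambda r: r[1])
```

Each run converges only to the local minimum nearest its starting point, which is why several runs are needed before the result can be trusted as the global or deepest local minimum.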
Network Architecture
A MLP network used in model development is depicted in FIG. 1: Architecture of feed forward neural network. As shown, the network usually consists of three layers of nodes. The layers, described as the input layer, hidden layer and output layer, comprise R, S and K processing nodes, respectively. Each node in the input layer is linked to all nodes in the hidden layer, and each node in the hidden layer is linked to all nodes in the output layer, using weighted connections. In addition to the R input and S hidden nodes, the MLP architecture also provides a bias node (with a fixed output of 1) in each of its input and hidden layers, not shown; these may be indexed as nodes R+1 and S+1, respectively. The bias nodes are also connected to all the nodes in the subsequent layer and provide additional adjustable parameters or weights for model fitting. The number of nodes R in the MLP network's input layer equals the number of process inputs, whereas the number of output nodes K equals the number of process outputs. However, the number of hidden nodes S is an adjustable parameter whose magnitude may be determined by various factors, such as the desired approximation and generalization capabilities of the network model.
Network Training
Training a network is an iterative process in which the network is presented with inputs from the training set along with the corresponding desired outputs. The network then adjusts its weights to try to reproduce the correct outputs within a reasonable error margin. If it succeeds, it has learned the training set and is ready to perform on previously unseen data; if not, it re-reads the inputs and again attempts to produce the corresponding outputs. The weights are adjusted slightly during each pass through the training set, known as a training cycle, until appropriate weights have been established. Depending upon the complexity of the task to be learned, many thousands of training cycles may be needed before the network correctly reproduces the training set. Once the outputs are correct, the trained weights can be used with the same network on unseen data to assess how well it generalizes.
Back Propagation Algorithm (BPA)
In the back propagation algorithm, the network weights are modified to minimize the mean squared error between the desired and actual outputs of the network. Back propagation uses supervised learning, in which the network is trained on data for which both the inputs and the desired outputs are known. Once trained, the network weights are maintained or frozen and can be used to compute output values for new input samples. In the feed-forward pass, input data are presented to the input-layer neurons, which pass the values on to the first hidden layer. Each hidden-layer node computes a weighted sum of its inputs, passes the sum through its activation function, and presents the result to the output layer. The goal is to find the set of weights that minimizes the mean squared error. A typical back propagation algorithm can be given as follows:
The MLP network is a non-linear mapping device that determines a K-dimensional non-linear function vector f, where f: X→Y. Here, X is a set of N-dimensional input vectors (X = {x_p}; p = 1, 2, . . . , P and x = [x1, x2, . . . , xn, . . . , xN]T), and Y is the set of corresponding K-dimensional output vectors (Y = {y_p}; p = 1, 2, . . . , P, where y = [y1, y2, . . . , yk, . . . , yK]T). The mapping f is determined by:
(i) the network topology,
(ii) the choice of activation function used for computing the outputs of the hidden and output nodes, and
(iii) the network weight matrices WH and WO, referring to the weights between the input and hidden nodes, and between the hidden and output nodes, respectively.
Thus, the non-linear mapping f can be expressed as:
f: y = y(x; W)  (1)
where W = {WH, WO}.
This equation suggests that y is a function of x, which is parameterized by W. It is now possible to write the closed-form expression of the input-output relationship approximated by the three-layered MLP as:
y_k = f2[ Σ_{l=0}^{L} w_{lk}^O f1( Σ_{n=0}^{N} w_{nl}^H x_n ) ];  k = 1, 2, . . . , K    (2)
where f1 and f2 denote the activation functions of the hidden and output layers, respectively, L is the number of hidden nodes, w_{nl}^H is the weight between input node n and hidden node l, and w_{lk}^O is the weight between hidden node l and output node k.
Note that in equation 2, the bias node is indexed as the zeroth node in the respective layer.
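A minimal NumPy rendering of equation 2 follows; the layer sizes and the use of the logistic sigmoid for both layers' activation functions are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid, assumed here for both activation functions of equation 2
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_H, W_O):
    """Closed-form MLP mapping of equation 2.

    x   : input vector of length N
    W_H : (N+1) x L weight matrix; row 0 holds the input-layer bias weights
    W_O : (L+1) x K weight matrix; row 0 holds the hidden-layer bias weights
    """
    x_aug = np.concatenate(([1.0], x))   # bias node indexed as the zeroth node
    h = sigmoid(x_aug @ W_H)             # hidden-layer outputs
    h_aug = np.concatenate(([1.0], h))
    y = sigmoid(h_aug @ W_O)             # output-layer outputs y_k, k = 1..K
    return y
```

The inner matrix product implements the sum over n = 0 . . . N and the outer product the sum over l = 0 . . . L, with the bias nodes supplying the zeroth terms.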
In order for an MLP network to approximate the non-linear relationship existing between the process inputs and the outputs, it needs to be trained in a manner such that a pre specified error function is minimized. In essence, the MLP-training procedure aims at obtaining an optimal set W of the network weight matrices WH and WO, which minimize an error function. The commonly employed error function is the average absolute relative error (AARE) defined as:
AARE = (1/N) Σ_{i=1}^{N} | (y_i,predicted − y_i,experimental) / y_i,experimental |    (3)
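Equation 3 translates directly into code; this sketch assumes NumPy and nonzero experimental values:

```python
import numpy as np

def aare(y_predicted, y_experimental):
    # Average absolute relative error of equation 3
    y_predicted = np.asarray(y_predicted, dtype=float)
    y_experimental = np.asarray(y_experimental, dtype=float)
    return np.mean(np.abs((y_predicted - y_experimental) / y_experimental))
```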
The most widely used formalism for the AARE minimization is the error-back propagation (EBP) algorithm utilizing a gradient-descent technique known as the generalized delta rule (GDR). In the EBP methodology, the weight matrix set, W, is initially randomized. Thereafter, an input vector from the training set is applied to the network's input nodes and the outputs of the hidden nodes and output nodes are computed.
The outputs are computed as follows. First, the weighted sum of all the node-specific inputs is evaluated; this sum is then transformed using a non-linear activation function, such as the logistic sigmoid. The outputs from the output nodes are then compared with their target values, and the difference is used to compute the AARE defined in equation 3. Once the AARE is computed, the weight matrices WH and WO are updated using the GDR framework. Repeating this procedure with the remaining input patterns in the training set completes one network training iteration. Several training iterations are usually necessary to minimize the AARE.
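One such training iteration can be sketched as follows. Squared error is used for the weight updates here because its gradient yields the classical delta-rule form; using it in place of a direct AARE gradient is an assumption for tractability, with the AARE then monitored separately:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W_H, W_O):
    # Forward pass with the bias node prepended as the zeroth node
    h = sigmoid(np.concatenate(([1.0], x)) @ W_H)
    return sigmoid(np.concatenate(([1.0], h)) @ W_O)

def ebp_iteration(X, T, W_H, W_O, lr=0.5):
    """One training iteration: present every pattern in the training set and
    update the weight matrices with the generalized delta rule (a sketch)."""
    for x, t in zip(X, T):
        # forward pass
        x_aug = np.concatenate(([1.0], x))
        h = sigmoid(x_aug @ W_H)
        h_aug = np.concatenate(([1.0], h))
        y = sigmoid(h_aug @ W_O)
        # backward pass: output deltas, then back-propagated hidden deltas
        delta_o = (y - t) * y * (1.0 - y)
        delta_h = (W_O[1:] @ delta_o) * h * (1.0 - h)
        # gradient-descent weight updates
        W_O -= lr * np.outer(h_aug, delta_o)
        W_H -= lr * np.outer(x_aug, delta_h)
    return W_H, W_O
```

Starting from randomized weight matrices, repeated calls to ebp_iteration drive the prediction error down, mirroring the repeated training iterations described above.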
Generalizability
Neural learning is considered successful only if the system can perform well on test data on which it has not been trained; this capability of a network is called generalizability. Given a large network, repeated training iterations may successively improve the performance of the network on the training data, e.g., by "memorizing" training samples, while the resulting network performs poorly on test data, i.e., unseen data. This phenomenon is called "overtraining". A proposed solution is to constantly monitor the performance of the network on the test data.
Hecht-Nielsen (1990) proposes that the weights should be adjusted only on the basis of the training set, but that the error should be monitored on the test set. Here the same strategy is applied: training continues as long as the error on the test set continues to decrease, and is terminated if the error on the test set increases. Training may thus be halted even if the network's performance on the training set continues to improve.
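This strategy reduces to a small control loop. In the sketch below, run_cycle and test_error are assumed callables (not from the source): run_cycle() performs one training cycle on the training set only, and test_error() evaluates the current network on the test set:

```python
def train_with_early_stopping(run_cycle, test_error, max_cycles=10000):
    # Weights are adjusted only inside run_cycle() (training set); the error
    # is monitored on the test set, and training halts as soon as it rises,
    # even if the training-set error is still falling.
    previous = test_error()
    for cycle in range(1, max_cycles + 1):
        run_cycle()
        current = test_error()
        if current > previous:   # generalization got worse: stop
            return cycle
        previous = current
    return max_cycles
```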
Principal Component Analysis (PCA)
Monitoring plant condition in modern complex process industries is often time consuming, as abundant instrumentation measures thousands of process variables every few seconds. This has caused a "data overload", and owing to the lack of appropriate analyses, very little is currently done to utilize this wealth of information. Given the current process control computer systems (DCS, on-stream analyzers and automated quality control labs) in modern chemical plants, it is not uncommon to measure hundreds of process variables online every few seconds or minutes, and tens of product variables every few minutes or hours.
Although a large number of variables may be measured, they are almost never independent; rather, they are usually very highly correlated. The true dimension of the space in which the process moves is almost always much lower than the number of measurements. Fortunately, in data sets with many variables, groups of variables often move together, because more than one variable may be measuring the same driving force governing the system behavior. In many petrochemical systems there are only a few such driving forces, yet an abundance of instrumentation allows us to measure dozens of system variables.
When this happens, one can take advantage of this information redundancy. For example, one can simplify the problem by replacing a group of variables with a single new variable. PCA is a quantitatively rigorous method for achieving this simplification. Multivariate statistical methods such as PCA (Principal Component Analysis) are capable of compressing the information down into low dimensional spaces which retain most of the information. The method generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other so there is little or no redundant information.
Principal component analysis comprises extracting a set of orthogonal, independent axes or principal components that are linear combinations of the variables of a data set, and which are extracted or calculated such that the maximum extent of variance within the data is encompassed by as few principal components as possible. The first principal component is calculated to account for the greatest variance in the data; the second principal component is then calculated to account for the greatest variance in the data orthogonal to the first principal component, the third to account for the greatest variance in the data orthogonal to the first two principal components, and so on. For each principal component extracted, less and less variance is accounted for. Eventually, the extraction of further principal components no longer accounts for significant additional variance within the data. By such means, a multi-dimensional or multi-variable data set can be reduced to fewer dimensions or principal components, while still retaining as much useful information within the resulting data as possible, which greatly simplifies analysis of the process data.
The position of a data point along a given principal component is referred to as its “score”. The weighting of a variable for a given principal component is referred to as its “loading”.
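A compact sketch of PCA via the singular value decomposition follows (NumPy assumed); the scores and loadings returned follow the definitions above:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data matrix (a minimal sketch).

    Returns:
      scores        : coordinates of each sample along the principal components
      loadings      : weights of each original variable in each component
      explained_var : variance accounted for by each component, in
                      decreasing order
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T           # orthonormal component directions
    scores = Xc @ loadings                   # projections of the data
    explained_var = s[:n_components] ** 2 / (len(X) - 1)
    return scores, loadings, explained_var
```

Because the singular values are returned in decreasing order, the first component accounts for the greatest variance, the second for the greatest remaining variance orthogonal to the first, and so on, exactly as described above.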