Production intelligence is an emerging field. It aims to solve problems that include monitoring process operations and alerting operators to an event or alarm; identifying the potential causes of an event; and suggesting possible remedial actions to operators.
Production intelligence can be applied to any field that involves auto-correlated or cross-correlated data. One such field is manufacturing. In manufacturing processes such as oil sand extraction, processing circuits, oil refining, delayed coking, energy and utilities, mineral and metal production, power generation, and biofuel/food production, the processes are operated by domain experts (e.g. oil sand plant operators, chemists and geographers) equipped with knowledge in their fields. However, a manufacturing process can be highly complex. It consists of many subcomponents, each of which has multiple inputs, states and outputs. It is not uncommon for a process to have more than 500 sensors recording temperatures, flows, pressures, liquid levels, pH values, etc. in real time.
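As an illustration of what auto-correlated and cross-correlated sensor data look like in practice, the following minimal sketch computes a lagged Pearson correlation between two series. The function names, the sensor names and the readings are hypothetical and chosen only for illustration; the temperature series is constructed so that it follows the flow series two samples later.

```python
from statistics import mean


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    length = min(len(x), len(y))
    if lag < 0 or lag >= length:
        raise ValueError("lag must satisfy 0 <= lag < series length")
    n = length - lag
    a = x[:n]
    b = y[lag:lag + n]
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_a * var_b) ** 0.5


def autocorrelation(x, lag):
    """Correlation of a series with a delayed copy of itself."""
    return lagged_correlation(x, x, lag)


# Hypothetical readings (assumption: not taken from any real plant).
flow = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0, 5.0, 8.0]
# Temperature responds to flow two samples later, scaled and offset.
temperature = [20.0, 20.0] + [20.0 + 2.0 * f for f in flow[:-2]]

same_time = lagged_correlation(flow, temperature, 0)
two_samples_later = lagged_correlation(flow, temperature, 2)
print(f"lag 0: {same_time:.3f}, lag 2: {two_samples_later:.3f}")
```

In this constructed example the correlation is strongest at a lag of two samples, which is the kind of delayed relationship between sensors that a purely time-agnostic analysis would miss.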
In the past ten years, advances in sensor and wireless technology have enabled process engineers to acquire data that were previously unavailable. However, it is very challenging to turn this huge volume of data into valuable information. When process engineers or plant managers are challenged to find the root cause of a problem, or to look for ways to improve the process, current systems yield very little information on their own. Reports show “what” has happened, but offer little insight into “why” it happened. The amount of data generated by sensors and the complex interactions that occur between process parameters are overwhelming. Hence, it is virtually impossible for a human, who by nature can only comprehend a problem in a few dimensions, to effectively analyze such a complex multivariate problem.
The prior art in production intelligence includes neural networks, self-organizing maps, support vector machines, decision trees, pattern discovery, motif discovery, naïve Bayesian classifiers and Gaussian mixture models (GMM).
A neural network is a black-box approach: operators cannot understand how or why it arrives at its analyses and predictions. Hence, neural networks are unsuitable for decision support. They have been used in automatic control with limited success. The major problem is that a neural network can easily overfit the data (i.e. memorize the noise in the system) while leaving the operators no better informed about the underlying problem. In practice, it is very difficult for an operator to determine whether the results generated by a neural network can be trusted, because the model is a black box and detecting over-fitting is not trivial. A self-organizing map (SOM) is another type of neural network. It is a clustering technique that groups similar data points into clusters. However, the clusters generated are difficult to interpret. Also, SOM does not take time into account when generating its model.
The support vector machine is sometimes regarded as a variant of the neural network. It has been reported to be more accurate than neural networks in many applications. However, the problems of black-box behavior and over-fitting still exist. A decision tree is more transparent than a neural network or a support vector machine. However, a decision tree can easily grow overwhelmingly large, so its interpretation remains very challenging for an operator. Furthermore, a decision tree only works in a supervised learning environment (each data point must have a class label), whereas many manufacturing processes are unsupervised by nature, because their data carry no class labels.
Pattern discovery has been successfully used in oil sand processes for over a decade. Nevertheless, it has two major problems. First, it does not take into account any time information in the data. Second, the number of patterns it generates is huge.
Motif discovery was originally developed in bioinformatics for gene and protein sequences and was later applied to time series data. It has mainly been applied to well-structured data such as DNA and protein sequences. When applied to real-world sensor data, it does not handle noise effectively. In addition, the rich prior knowledge available for DNA and proteins is usually not available in complex manufacturing domains. This limits the use of motif discovery in production intelligence.
The naïve Bayesian approach relies on the simplifying assumption that all sensors are conditionally independent given the target sensors. Thus, it is very fast and memory efficient. However, its accuracy degrades according to how severely this assumption is violated. The naïve Bayesian approach also shares a limitation with decision trees: it only works in a supervised learning environment.
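The conditional independence assumption can be made concrete with a minimal sketch of a Gaussian naïve Bayesian classifier. The per-sensor Gaussian likelihood, the function names, the class labels and the readings are all assumptions made for illustration and are not taken from any particular prior-art system. Because each sensor is scored separately given the class, training needs only one mean and one variance per sensor per class, which is why the approach is fast and memory efficient; it also requires a label for every data point.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and a per-sensor mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):  # one column per sensor
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior.

    The independence assumption appears as a plain sum of per-sensor
    log likelihoods: no interaction between sensors is modelled.
    """
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled readings: (temperature, pressure).
rows = [(70.0, 1.0), (72.0, 1.1), (71.0, 0.9),
        (90.0, 2.0), (92.0, 2.2), (91.0, 1.9)]
labels = ["normal", "normal", "normal", "alarm", "alarm", "alarm"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (71.5, 1.0)), predict(model, (90.5, 2.1)))
```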
A Gaussian mixture model (GMM) uses multiple Gaussian kernels to fit the data. It is also a black-box approach whose results are difficult to comprehend. In addition, it is easy to overfit a data set with a GMM. The main advantage of GMM is its computational efficiency.
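For illustration, fitting multiple Gaussian kernels to data can be sketched with the expectation-maximization (EM) procedure commonly used for GMMs, shown here for a single sensor and written with the standard library only. The function names, the starting means, the variance floor and the readings are assumptions made for this sketch. The variance floor is a common guard against one form of over-fitting, in which a kernel collapses onto a single data point.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    means = list(means)
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each kernel for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each kernel.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)  # floor limits kernel collapse
    return weights, means, variances


# Hypothetical readings from one sensor with two operating regimes.
readings = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = fit_gmm_1d(readings, means=[0.0, 6.0])
print(weights, means, variances)
```

The fitted weights, means and variances summarize the two regimes numerically, but they do not by themselves explain to an operator why the process moved from one regime to the other.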