1. Field of the Invention
The present invention relates to predicting gas composition in a multistage separator, and particularly to the development of solutions to the regression problem of gas composition prediction using an ensemble of hybrid computational intelligence (CI) models.
2. Description of the Related Art
Non-hydrocarbon prediction in gas compositions is a challenging task because the amounts of non-hydrocarbons are typically small and are treated as impurities in the gas. Moreover, the quantities vary over wide ranges as functions of temperature and pressure gradients, and there are no straightforward analytical solutions to predict them. In recent years, computational intelligence techniques, such as artificial neural networks (ANNs), have gained enormous popularity in predicting various petroleum reservoir properties, such as pressure-volume-temperature (PVT), porosity, permeability, viscosity and the like.
Although basic component prediction has been established, there is interest in the much more complex prediction of gas composition in multistage separators, particularly using computational intelligence techniques. Petroleum gas, or natural gas, is defined as a mixture of hydrocarbons and varying amounts of non-hydrocarbons that exist either in a gaseous phase or in solution with crude oil in underground reservoirs. Reservoirs are typically in the form of a sponge-like rock with interconnected open spaces between grains, typically found approximately a kilometer underground.
Capacity and efficiency of gas/liquid separation is of great concern in natural gas production. Oil resides in the reservoir at great temperatures and pressures, on the order of 5,000 psi and approximately 250° F. After the oil is extracted from the reservoir, it is collected in sequential multistage separator tanks at much lower temperatures and pressures, typically on the order of approximately 175 psi and 150° F. An exemplary multistage separator 100 is shown in FIG. 2. The reservoir oil initially resides within the reservoir R. In the first stage, the oil is extracted and held in the first-stage reactor, where gas is separated from the oil, and the extracted gas G1 is collected in a tank or the like. Moving through each stage, more gas is extracted from the oil as temperature and pressure are steadily decreased. In FIG. 2, once the gas G1 has been extracted, the oil is transferred to the second-stage reactor, where further separation is performed. Second-stage gas G2 is extracted at a pressure on the order of approximately 100 psi and a temperature of approximately 100° F. The oil is then passed to a third-stage reactor, where third-stage gas G3 is separated at a pressure on the order of approximately 14.7 psi and a temperature of approximately 60° F. Although a three-stage reactor is shown in FIG. 2, it should be understood that this is for exemplary purposes only, and that a multi-stage reactor may have many more intermediate stages.
A common complication that occurs in quantifying the behavior of such multiphase flows is that under high pressure, the properties of the mixture may differ considerably from those of the same mixture at atmospheric pressure, i.e., under pressure, the extracted gas may still contain liquid and solid constituents. The removal of these constituents forms the most important process step before delivery can take place. The liquids almost invariably consist of water and hydrocarbons that are gaseous under reservoir conditions, but which condense during production due to the decrease in gas pressure and temperature. Mixtures of non-hydrocarbons, such as N2, CO2 and H2S, are not desirable in the remaining stock tank oil, and removal of such non-hydrocarbons requires a great deal of additional energy and effort. Thus, prediction of the quantities of the non-hydrocarbons would greatly facilitate the multi-stage separator process.
In the industry, the equation of state (EOS) and empirical correlations (EC) are used to predict oil and gas properties, along with basic artificial intelligence (AI). For example, the Chevron Phase Calculation Program (CPCP) is a typical program that is based on EOS and EC. CPCP is a program designed to help the engineer to calculate the phase compositions, densities, viscosities, thermal properties, and the interfacial tensions between phases for liquids and vapors in equilibrium. The program takes reservoir gas compositions, C7+ molecular weight and density, and separator stage temperature and pressure as input, and then predicts gas compositions of that stage as output using EOS and EC.
EOS is useful for a description of fluid properties, such as PVT, but there is no single EOS that accurately estimates the properties of all substances under all conditions. The EOS has adjustment issues against the phase behavior data of reservoir fluid of known composition, while the EC has only limited accuracy. In recent years, computational intelligence (CI) techniques, such as ANN, have gained popularity in solving various petroleum related problems, such as PVT, porosity, permeability, and viscosity prediction.
In one such prior art technique, a multi-layer perceptron (MLP) with one hidden layer and a sigmoid activation function was used for the establishment of a model capable of learning the complex relationship between the input and the output parameters to predict gas composition. The ANN is a machine learning approach inspired by the way in which the human brain performs a particular learning task. ANN is composed of simple elements operating in parallel. These elements are inspired by biological nervous systems.
MLP (illustrated in FIG. 3) is one of the most popular ANN types, at present. MLP has one input layer, one output layer, and one or more hidden layers of processing units. MLP has no feedback connections. The hidden layers sit between the input and output layers, and are thus hidden from the outside world, as shown in FIG. 3. The MLP can be trained to perform a particular function by adjusting the values of the connections (weights) between elements. Typically, MLP is adjusted, or trained, so that a particular input leads to a specific target output. The weights are adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are needed to train a network.
FIG. 4 illustrates a neuron with a sigmoidal activation function, where
a = Σ_{j=1}^{m} x_j(n) w_j(n)  and  y = σ(a) = 1/(1 + e^{−a}),
where x_j represent the inputs, w_j represent the weights for each of the m inputs, and y represents the output of the neuron. In the prior art technique for ANN component prediction noted above, each non-hydrocarbon is predicted separately. One hidden layer is used for each non-hydrocarbon. The configuration used for prediction of N2, CO2 and H2S is shown below in Table 1:
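The weighted sum and sigmoid activation described above can be sketched as follows (a minimal illustrative Python sketch; the function name and input values are assumptions, not part of the prior art technique):

```python
import math

def neuron_output(x, w):
    """Compute a = sum_j x_j * w_j, then y = sigma(a) = 1 / (1 + e^(-a))."""
    a = sum(xj * wj for xj, wj in zip(x, w))
    return 1.0 / (1.0 + math.exp(-a))

# With zero net input, the sigmoid returns 0.5, the midpoint of its (0, 1) range.
y = neuron_output([0.0, 0.0], [1.0, 1.0])
```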
TABLE 1
MLP Structure for Each Component

  Gas    Hidden Layer Nodes    Hidden Layer Activation Function    Outer Layer Activation Function
  N2     37                    logsig                              tansig
  CO2    37                    logsig                              tansig
  H2S    80                    logsig                              tansig

The training algorithm "Levenberg-Marquardt" was used for predicting N2 and H2S, while "Resilient Backpropagation" (Rprop) was used for predicting CO2. The other MLP parameters were 300 epochs, a learning rate of 0.001, and a goal set to 0.00001. The MLP structure for predicting CO2 is shown in FIG. 5.
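A forward pass through the Table 1 architecture for CO2 (37 logsig hidden nodes feeding a tansig output node) can be sketched as follows. This is an untrained, illustrative NumPy sketch: the number of input features (19) and the random initial weights are assumptions, not the trained prior art model:

```python
import numpy as np

rng = np.random.default_rng(0)

def logsig(a):
    """Logistic sigmoid, the hidden-layer activation function in Table 1."""
    return 1.0 / (1.0 + np.exp(-a))

def tansig(a):
    """Hyperbolic-tangent sigmoid, the outer-layer activation function in Table 1."""
    return np.tanh(a)

n_inputs, n_hidden = 19, 37                       # 37 hidden nodes for CO2 (Table 1)
W1 = rng.standard_normal((n_hidden, n_inputs)) * 0.1   # hidden-layer weights
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((1, n_hidden)) * 0.1          # output-layer weights
b2 = np.zeros(1)

def mlp_forward(x):
    h = logsig(W1 @ x + b1)                       # hidden layer
    return tansig(W2 @ h + b2)                    # single output node

y = mlp_forward(rng.standard_normal(n_inputs))
```

Training would then adjust W1, b1, W2, b2 (e.g., by Levenberg-Marquardt or Rprop) until the output matches the target mole fractions.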
Petroleum deposits are naturally mixtures of organic compounds consisting mainly of non-hydrocarbons and hydrocarbons. The deposit that is found in the gaseous form is called "natural gas", and that found in the liquid form is called "crude oil". For the ANN prediction technique, the input parameters consist of mole percentages of non-hydrocarbons, such as N2, H2S and CO2, and hydrocarbons, such as methane (C1), ethane (C2), propane (C3), butane (C4), pentane (C5), hexane (C6), and heptanes and heavier hydrocarbons (C7+). The other input parameters are stock tank API, BPP, reservoir temperature, and separator pressure and temperature. In addition to the above, there are also isomers of C4 and C5. Components above C7 are considered as C7+. Molecular weight and density parameters of the C7+ components are also given as input parameters. The non-hydrocarbons are of greater interest, as noted above. Thus, the output parameters consist of mole fractions of N2, CO2 and H2S. To increase the number of training samples, the Stage 1 and Stage 2 oil compositions were calculated from the available data using the material balance method. 70% of the samples taken were randomly chosen for training, and the remaining 30% were used for validation and testing.
For such ANN methods, common techniques for performance evaluation include the correlation coefficient (CC) and the root mean squared error (RMSE). The CC measures the statistical correlation between the predicted and the actual values. This measure is unique, in that it does not change with the scale of the values. A value of "1" means perfect statistical correlation, and a value of "0" means no correlation at all. A higher number represents better results. This performance measure is only used for numerical input and output. The CC is calculated using the formula
CC = Σ(x − x′)(y − y′) / √(Σ(x − x′)² · Σ(y − y′)²),
where x and y are the actual and the predicted values, and x′ and y′ are the mean of the actual and predicted values, respectively.
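The CC formula can be implemented directly, as in the following illustrative Python sketch (the function name is an assumption):

```python
def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient between actual (x) and predicted (y) values."""
    n = len(actual)
    x_mean = sum(actual) / n
    y_mean = sum(predicted) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(actual, predicted))
    den = (sum((x - x_mean) ** 2 for x in actual)
           * sum((y - y_mean) ** 2 for y in predicted)) ** 0.5
    return num / den

# Scale invariance: predictions that are an exact multiple of the actual
# values still give perfect correlation.
cc = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # cc == 1.0
```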
The RMSE is one of the most commonly used measures of success for numeric prediction. This value is computed by taking the average of the squared differences between each actual value xn and its corresponding predicted value yn, then taking the square root: the RMSE is simply the square root of the mean squared error. The RMSE gives the error value with the same dimensionality as the actual and predicted values. It is calculated as
RMSE = √( ((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²) / n ),
where n is the size of the data.
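The RMSE computation can likewise be sketched in a few lines of Python (the function name is an assumption):

```python
def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return (sum((x - y) ** 2 for x, y in zip(actual, predicted)) / n) ** 0.5

# A perfect prediction yields an RMSE of zero.
error = rmse([0.012, 0.034], [0.012, 0.034])  # error == 0.0
```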
The training and prediction time of the ANN prediction technique is simply (T2−T1), where T2 is the CPU time at the end of prediction and T1 is the CPU time at the beginning of training. Training time is measured to observe how long the model requires for training, and the prediction time shows how fast the model can predict the test data. When compared against CPCP, the prior art MLP ANN method was found to achieve higher prediction accuracy, with a lower RMSE and a higher CC value, for N2 and H2S. CPCP was found to perform relatively well against the MLP ANN method for CO2. Thus, it would be desirable to be able to improve the results of the ANN technique, particularly in CO2 prediction. Further, the prior art MLP technique requires a very long training time and a great deal of computational power. It would be desirable to be able to tune the MLP parameters, as well as to apply evolutionary techniques in order to better optimize the parameters. Further, given the advantages of ensemble techniques with regard to the above, it would also be desirable to be able to adapt such an ANN technique for ensemble computing.
In statistics and machine learning, ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete, finite set of alternative models. “Supervised learning algorithms” are commonly described as performing the task of searching through a hypothesis space to find a suitable hypothesis that will make good predictions with a particular problem. Even if the hypothesis space contains hypotheses that are very well-suited for a particular problem, it may be very difficult to find a good one. Ensembles combine multiple hypotheses to form a (hopefully) better hypothesis. In other words, an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. The term ensemble is usually reserved for methods that generate multiple hypotheses using the same base learner. The broader term of “multiple classifier systems” also covers hybridization of hypotheses that are not induced by the same base learner.
Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation. Fast algorithms, such as decision trees, are commonly used with ensembles, although slower algorithms can benefit from ensemble techniques as well.
An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built. Thus, ensembles can be shown to have more flexibility in the functions they can represent. This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (particularly “bagging”) tend to reduce problems related to over-fitting of the training data.
Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (such as random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (such as entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.
Bootstrap aggregating, often abbreviated as "bagging", involves having each model in the ensemble vote with equal weight. In order to promote model variance, bagging trains each model in the ensemble using a randomly-drawn subset of the training set. As an example, the random forest algorithm combines random decision trees with bagging to achieve very high classification accuracy. Given a standard training set D of size n, bagging generates m new training sets Di, each of size n′, by sampling examples from D uniformly and with replacement. By sampling with replacement, it is likely that some examples will be repeated in each Di. If n′=n, then for large n, the set Di is expected to have 63.2% of the unique examples of D, the rest being duplicates. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification). Since the method averages several predictors, it is not useful for improving linear models. Similarly, bagging does not improve very stable models, like k-nearest neighbors.
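The bootstrap sampling step of bagging, and the roughly 63.2% unique-example property noted above, can be illustrated as follows (a Python sketch over a synthetic data set):

```python
import random

random.seed(42)

def bootstrap_sample(data):
    """Draw len(data) examples from data uniformly and with replacement."""
    return [random.choice(data) for _ in data]

n = 100_000
D = list(range(n))                  # a synthetic training set of size n
Di = bootstrap_sample(D)            # one bootstrap sample, n' = n
unique_fraction = len(set(Di)) / n  # approaches 1 - 1/e ~ 0.632 for large n
```

Each base model in the ensemble would be trained on its own such sample Di, and the m trained models would then be averaged (regression) or put to a vote (classification).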
“Boosting” involves incrementally building an ensemble by training each new model instance to emphasize the training instances that previous models misclassified. In some cases, boosting has been shown to yield better accuracy than bagging, but it also tends to be more likely to over-fit the training data. By far, the most common implementation of boosting is AdaBoost, although some newer algorithms are reported to achieve better results.
While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight, and examples that are classified correctly lose weight. Thus, future weak learners focus more on the examples that previous weak learners misclassified.
AdaBoost, short for Adaptive Boosting, is a machine learning algorithm, which is a meta-algorithm, and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than most learning algorithms. The classifiers it uses can be weak (i.e., display a substantial error rate), but as long as their performance is not random (resulting in an error rate of 0.5 for binary classification), they will improve the final model. Even classifiers with an error rate higher than would be expected from a random classifier will be useful, since they will have negative coefficients in the final linear combination of classifiers, and hence behave like their inverses.
AdaBoost generates and calls a new weak classifier in each of a series of rounds t=1, . . . , T. For each call, a distribution of weights Dt is updated that indicates the importance of examples in the data set for the classification. On each round, the weights of each incorrectly classified example are increased, and the weights of each correctly classified example are decreased, so the new classifier focuses on the examples that have, so far, eluded correct classification.
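The per-round reweighting just described can be sketched as follows (an illustrative Python sketch of the standard AdaBoost weight update; the function name is an assumption):

```python
import math

def adaboost_reweight(weights, correct, epsilon):
    """One AdaBoost round: compute the weak learner's coefficient alpha from
    its weighted error epsilon, up-weight misclassified examples, down-weight
    correctly classified ones, and renormalize so the weights sum to 1."""
    alpha = 0.5 * math.log((1 - epsilon) / epsilon)
    new_w = [w * math.exp(alpha if not ok else -alpha)
             for w, ok in zip(weights, correct)]
    z = sum(new_w)  # normalization constant
    return [w / z for w in new_w], alpha

# Four equally weighted examples; the last is misclassified (weighted error 0.25).
new_w, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False], 0.25)
# The misclassified example now carries half the total weight.
```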
Typically, an ensemble is constructed in two steps. First, a number of base learners are produced, which can be generated in a parallel style (Bagging) or in a sequential style (Boosting), where the generation of a base learner has influence on the generation of subsequent learners. Then, the base learners are combined for use in the application. The most popular combination schemes for classification and regression are majority voting and weighted averaging, respectively.
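The two combination schemes named above, weighted averaging for regression and majority voting for classification, can be sketched as follows (an illustrative Python sketch; the function names are assumptions):

```python
from collections import Counter

def combine_regression(predictions, weights=None):
    """Weighted average of base-learner outputs (regression)."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights by default
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

def combine_classification(labels):
    """Majority vote over base-learner class labels (classification)."""
    return Counter(labels).most_common(1)[0][0]

# Three base learners predicting a mole fraction are simply averaged.
prediction = combine_regression([0.010, 0.012, 0.014])  # prediction == 0.012
```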
Thus, a method of predicting gas composition solving the aforementioned problems is desired.