This invention relates generally to the field of statistical prediction models used in geophysical prospecting. In particular, the invention relates to a method for assessing reliability of predicted values for geophysical or geological parameters from statistical predictive models used in geophysical prospecting.
In the ongoing search for subsurface hydrocarbons, geophysicists have developed methods for evaluating and interpreting the structure and characteristics of the subsurface formations of the earth. One such method is the analysis of seismic data traces. Of particular importance to geophysicists is the ascertainment of formation structure and characteristics consistent with the presence of hydrocarbon deposits.
The seismic data traces that are analyzed using the method of the present invention are obtained by any conventional means. One of ordinary skill in the art would understand that seismic data traces are usually obtained by the reflection of acoustic waves from geologic layers of differing acoustic impedance. Acoustic impedance is dictated by the physical properties of the material through which the acoustic wave travels. Such properties include lithology, porosity, and fluid content.
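Acoustic impedance is conventionally the product of bulk density and P-wave velocity, and at normal incidence the strength of a reflection depends on the impedance contrast across an interface. The sketch below computes both for a hypothetical three-layer model. The layer values are made up for illustration, and the function names are not part of the described method.

```python
import numpy as np

def acoustic_impedance(velocity, density):
    """Acoustic impedance Z = density * velocity for each layer."""
    return np.asarray(velocity, dtype=float) * np.asarray(density, dtype=float)

def reflection_coefficients(impedance):
    """Normal-incidence reflection coefficient at each interface:
    R_i = (Z_{i+1} - Z_i) / (Z_{i+1} + Z_i)."""
    z = np.asarray(impedance, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])

# Hypothetical three-layer model with a lower-impedance middle layer
# (illustrative values only, velocity in m/s and density in kg/m^3).
velocity = [2400.0, 2100.0, 2600.0]
density = [2300.0, 2100.0, 2250.0]

z = acoustic_impedance(velocity, density)
r = reflection_coefficients(z)
print(z)
print(r)
```

A drop in impedance gives a negative reflection coefficient and a rise gives a positive one, which is why changes in lithology, porosity, or fluid content can show up in the reflected signal.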
Seismic data are generally obtained by imparting seismic energy into the earth, then receiving and recording the energy reflected by the subsurface formations. This received energy is then processed to produce seismic signals or traces, which depict reflection information as a function of the time elapsed between signal generation and reception, together with the embedded seismic pulse. As is known to those of ordinary skill in the art, processing of seismic data may vary, but typically includes stacking, migration, and deconvolution.
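Stacking, one of the processing steps named above, can be illustrated in its simplest form as averaging traces that share a common midpoint after moveout has been removed. This is a minimal sketch that assumes an already moveout-corrected gather and synthetic Gaussian noise. The signal, noise level, and trace count are arbitrary choices.

```python
import numpy as np

def stack(gather):
    """Stack a gather (n_traces x n_samples) by averaging across traces.
    Assumes moveout is already corrected, so a given reflection arrives at
    the same sample index on every trace."""
    return np.asarray(gather, dtype=float).mean(axis=0)

rng = np.random.default_rng(seed=0)
n_traces, n_samples = 24, 500
t = np.arange(n_samples)

# Hypothetical noise-free reflection signal: one smooth event at sample 250.
signal = np.exp(-0.5 * ((t - 250) / 5.0) ** 2)
noise_std = 0.5
gather = signal + rng.normal(0.0, noise_std, size=(n_traces, n_samples))

stacked = stack(gather)
single_trace_noise = np.std(gather[0] - signal)
stacked_noise = np.std(stacked - signal)
print(single_trace_noise, stacked_noise)
```

With independent noise on each trace, the noise left after stacking is roughly `noise_std / sqrt(n_traces)`, while the coherent reflection is preserved.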
Originally, seismic data traces were used simply for ascertaining formation structure. However, exploration geophysicists have developed a number of methods to obtain a variety of characteristics that describe the seismic data traces. Such characteristics are termed attributes. These attributes provide quantitative measures of the shape of the seismic data traces. Attributes are said to be instantaneous when their values are obtained for each data point (i.e., each time sample) or within a small time window of data points. Examples include amplitude, phase, frequency, dip, and power. Attributes are said to be interval attributes when their values are taken from a specified seismic interval within a seismic trace. Examples include averages, maximums, and cycle widths of measured characteristics of the seismic signal over the seismic interval.
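One common way to compute instantaneous amplitude, phase, and frequency is complex-trace analysis. In this approach, the trace is treated as the real part of an analytic signal, and the attributes are read from the modulus, the argument, and the phase derivative. The sketch below builds the analytic signal with NumPy's FFT, which gives the same result as a discrete Hilbert transform. It then derives simple interval attributes (mean and maximum) over a sample window. The sampling interval, the test signal, and the window bounds are assumptions made for illustration.

```python
import numpy as np

def analytic_signal(trace):
    """Analytic signal via the FFT: keep DC (and Nyquist for even n),
    double the positive frequencies, and zero the negative ones."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_attributes(trace, dt):
    """Per-sample amplitude (envelope), phase (radians), and frequency (Hz)."""
    z = analytic_signal(trace)
    amplitude = np.abs(z)
    phase = np.angle(z)
    # Instantaneous frequency is the time derivative of the unwrapped phase.
    frequency = np.gradient(np.unwrap(phase), dt) / (2.0 * np.pi)
    return amplitude, phase, frequency

def interval_attributes(values, start, stop):
    """Interval attributes: statistics of a per-sample attribute over
    samples [start, stop) of one trace."""
    window = np.asarray(values)[start:stop]
    return {"mean": float(window.mean()), "max": float(window.max())}

dt = 0.004                             # assumed 4 ms sample interval
t = np.arange(250) * dt                # 1 s trace
trace = np.cos(2 * np.pi * 25.0 * t)   # hypothetical 25 Hz test signal

amp, ph, freq = instantaneous_attributes(trace, dt)
window_stats = interval_attributes(amp, 50, 200)
print(window_stats)
```

For this constant-amplitude cosine, the envelope is 1 and the instantaneous frequency is 25 Hz at every sample. On real traces, both vary from sample to sample, and the interval statistics summarize that variation over a chosen seismic interval.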
The objectives of seismic attribute analysis include identifying the boundaries of geologic intervals of differing acoustic impedance and assigning characteristic values, which may exhibit lateral variations, for the physical rock properties within each of these geologic intervals. There are two key steps used in the art to assign these characteristic values for the physical rock properties. The first step is a seismic-to-synthetic well tie, which compares the seismic signal shapes and attributes identified on at least one seismic data trace at or near a well location with those identified in a synthetic seismogram which is developed for that well. The well being used is termed a calibration well. This synthetic seismogram is generated using well data such as well log data or core data, coupled with standard techniques familiar to those skilled in the art. The second step is termed seismic attribute calibration, which involves statistically relating the attributes obtained from seismic data traces that are presumed to represent the formation properties at a well for any seismic interval, with the measured rock properties in that well over that same interval. Thus, a seismic-to-synthetic well tie relates a real seismic data trace to a synthetic seismogram, while calibration relates a real seismic data trace to actual rock properties as determined by the well data, such as well log or core data.
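A synthetic seismogram is commonly built with the convolutional model. A reflectivity series derived from well logs is convolved with an estimated wavelet, and the tie is judged by how well the synthetic matches the seismic trace at the well. The sketch below assumes a zero-phase Ricker wavelet and a reflectivity series already converted to two-way time. It uses normalized zero-lag correlation as the match measure. The wavelet choice, spike positions, and noisy "observed" trace are all hypothetical stand-ins.

```python
import numpy as np

def ricker(peak_freq, dt, half_length=0.064):
    """Zero-phase Ricker wavelet with the given peak frequency (Hz)."""
    t = np.arange(-half_length, half_length + dt / 2, dt)
    a = (np.pi * peak_freq * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def synthetic_seismogram(reflectivity, wavelet):
    """Convolutional model: reflectivity convolved with a centered wavelet,
    trimmed to the length of the reflectivity series."""
    return np.convolve(reflectivity, wavelet, mode="same")

def tie_correlation(synthetic, observed):
    """Normalized zero-lag correlation between synthetic and observed traces."""
    s = synthetic - synthetic.mean()
    o = observed - observed.mean()
    return float(np.dot(s, o) / np.sqrt(np.dot(s, s) * np.dot(o, o)))

dt = 0.004
reflectivity = np.zeros(200)
reflectivity[60] = 0.12    # hypothetical interfaces in two-way time
reflectivity[110] = -0.08
wavelet = ricker(30.0, dt)
synthetic = synthetic_seismogram(reflectivity, wavelet)

# Stand-in for the real trace at the calibration well.
rng = np.random.default_rng(1)
observed = synthetic + rng.normal(0.0, 0.005, size=synthetic.size)
print(tie_correlation(synthetic, observed))
```

In practice, the wavelet and the time-depth relationship are adjusted until the correlation is acceptable. Only then are seismic attributes at the well trusted for calibration.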
In seismic attribute calibration, a statistical prediction model is used to investigate a relationship between observable seismic attribute variables and a variable representing a geological or geophysical rock property. This involves establishing a predictive relationship between the independent variables observed at densely distributed seismic locations and the dependent variable of interest observed at typically more sparsely distributed well locations. The statistical prediction model may be linear regression or any other statistical technique. This relationship may then be used to predict the variable of interest at locations away from the wells by using the observed seismic attributes.
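The calibration workflow pairs sparse well observations with dense seismic attributes. The sketch below illustrates this with one attribute and a straight-line fit. It extracts the attribute at each well, calibrates against the well values, and then applies the fit across the whole seismic grid. The attribute map, well positions, and "measured" porosities are synthetic values made up for illustration, and the choice of `np.polyfit` is just one convenient least-squares fit.

```python
import numpy as np

# Hypothetical dense seismic attribute map on a 50 x 50 grid.
ny, nx = 50, 50
yy, xx = np.mgrid[0:ny, 0:nx]
attribute_map = 0.5 + 0.4 * np.sin(xx / 8.0) * np.cos(yy / 11.0)

# Hypothetical sparse well locations (row, col).
wells = [(5, 7), (12, 40), (25, 25), (33, 10), (44, 38), (48, 3)]
rows, cols = zip(*wells)
x_wells = attribute_map[list(rows), list(cols)]   # attribute at each well

# Synthetic stand-in for porosity measured from logs or core at each well.
rng = np.random.default_rng(2)
porosity_wells = 0.05 + 0.20 * x_wells + rng.normal(0.0, 0.005, size=len(wells))

# Calibrate at the wells only: porosity ~ slope * attribute + intercept.
slope, intercept = np.polyfit(x_wells, porosity_wells, deg=1)

# Predict at every seismic location, including far from the wells.
porosity_map = slope * attribute_map + intercept
```

Six wells constrain a prediction at 2500 grid cells. This imbalance is why the reliability of predictions away from the wells matters.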
Suppose the relationship between the rock property variable and attribute variables can be described by a statistical prediction model. In general, let z be the dependent variable of interest and let x represent the independent variables as an attribute vector. It is assumed that z and x are related by a functional relationship, $z = f(x)$. A statistical prediction model represents the relationship as $\hat{z} = \hat{f}(x)$, where $\hat{f}$ is an estimating function for the prediction model and $\hat{z}$ is the predicted value for a given attribute vector x. This relationship will be illustrated using a multiple linear regression model, although the present invention applies to any statistical prediction technique. In a multiple linear regression model, the dependent variable z is represented as a linear combination of the independent attribute vector x:
$$z = a^{T}x + b + e \tag{1}$$
Here a is a vector of linear coefficients, b is an intercept, and e is a model or observation error. The superscript T represents the transpose of a vector.
For a set of N training data points, (xi, zi), i=1, . . . , N, of independent attribute vectors xi and dependent variable zi, the classical regression approach uses a least squares criterion to estimate the linear coefficients a and b. This yields a prediction model:
$$\hat{z} = \hat{a}^{T}x + \hat{b}, \tag{2}$$
where $\hat{z}$ is the predicted value for attribute x, and $\hat{a}$ and $\hat{b}$ are prediction coefficients.
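The least-squares estimates in equation (2) can be obtained by appending a column of ones to the attribute matrix, so that the intercept $\hat{b}$ is estimated alongside $\hat{a}$. This is a minimal sketch using `numpy.linalg.lstsq`. The helper names and the noise-free example data are hypothetical.

```python
import numpy as np

def fit_linear_prediction_model(X, z):
    """Least-squares estimates (a_hat, b_hat) for z ~ a^T x + b.

    X: (N, p) array with one attribute vector x_i per row.
    z: (N,) array of observed values z_i.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([X, np.ones(len(X))])   # extra column for b
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef[:-1], coef[-1]

def predict(a_hat, b_hat, x):
    """Equation (2): z_hat = a_hat^T x + b_hat for one vector or a batch of rows."""
    return np.asarray(x, dtype=float) @ a_hat + b_hat

# Hypothetical noise-free data generated from a = (3, -2), b = 0.5.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
z = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5

a_hat, b_hat = fit_linear_prediction_model(X, z)
print(a_hat, b_hat)
```

Because these data contain no error term, the fit recovers the generating coefficients exactly, up to rounding. With real data, $\hat{a}$ and $\hat{b}$ are only estimates of a and b.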
Reliability measures of statistical prediction models, such as confidence intervals, can be calculated only when certain statistical assumptions are imposed on the training data, and these assumptions are rarely met in real situations. If the expected value of the error term e of equation (1) is zero, then the prediction coefficients $\hat{a}$ and $\hat{b}$ of prediction equation (2) are unbiased estimates of the linear coefficients a and b of equation (1). This assumption also implies that the underlying relationship between the variables x and z in equation (1) is linear. In this case, the predicted value $\hat{z}$ is an unbiased estimate of z. However, if the linearity assumption does not hold, then the predicted value $\hat{z}$ becomes a biased estimate of z. There are cases where the physical relationship between the dependent variable and the independent variables supports a linear regression model. In these cases, it would be useful to have statistical confidence intervals for the predicted value $\hat{z}$ in prediction equation (2). A confidence interval gives upper and lower bounds between which there is a given probability, say 95%, of finding the variable. However, obtaining these confidence intervals requires that the errors that populate the error term e in equation (1) be normally distributed, have constant variance, and be statistically independent of one another. These conditions are rarely met in practical data analysis situations, so the confidence intervals computed from a multiple linear regression model will usually be misleading. The confidence intervals associated with multiple linear regression analysis are valid measures for assessing the reliability of predicted values only when linearity and the above statistical assumptions are met.
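To see what the classical approach computes, the following sketch gives the textbook interval for a new observation at an attribute vector x0, often called a prediction interval. The interval is $\hat{z}_0 \pm q\, s \sqrt{1 + x_0^{T}(D^{T}D)^{-1}x_0}$, where D is the design matrix with a column of ones and $s^2$ is the residual variance. The sketch is valid only under the assumptions listed above. For simplicity it uses the normal quantile 1.96; for small N, the Student-t quantile with N - p - 1 degrees of freedom (available, for example, as `scipy.stats.t.ppf`) should be used instead. The simulated data are hypothetical.

```python
import numpy as np

def classical_prediction_interval(X, z, x0, quantile=1.96):
    """Textbook interval for a new observation at x0 under model (1).

    Valid only if the errors e are independent, normal, and of constant
    variance. quantile=1.96 is a large-N normal approximation to the
    two-sided 95% Student-t point.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    n, p = X.shape
    design = np.column_stack([X, np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ coef
    s2 = residuals @ residuals / (n - p - 1)        # error-variance estimate
    x0_aug = np.append(np.asarray(x0, dtype=float), 1.0)
    leverage = x0_aug @ np.linalg.solve(design.T @ design, x0_aug)
    z0_hat = x0_aug @ coef
    half_width = quantile * np.sqrt(s2 * (1.0 + leverage))
    return z0_hat, z0_hat - half_width, z0_hat + half_width

# Hypothetical data that do satisfy the assumptions: z = 2x + 1 + N(0, 0.1^2).
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(200, 1))
z = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 0.1, size=200)

z0_hat, lo, hi = classical_prediction_interval(X, z, [0.5])
print(z0_hat, lo, hi)
```

The interval uses a single variance estimate s for every x0. If the real errors grow or shrink with the attributes, or are correlated between neighboring samples, the same formula is too wide in some places and too narrow in others. This failure mode is the one described above.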
Similarly, for a predictive neural network, such as a back-propagation neural network, no standard method for determining reliability exists. Thus, there exists a general need for quantifying the reliability of predicted values from statistical prediction models without the strict assumptions needed in classical confidence interval calculations.
The present invention is a method for assessing the reliability associated with a statistical prediction of a specified geological or geophysical parameter at a designated location. The statistical prediction is obtained from a prediction model constructed from a set of N training data points. Each training data point comprises a training data attribute vector and an associated observed value of the specified geological or geophysical parameter. Further, each training data attribute vector includes one or more seismic attributes obtained from seismic data traces located at or near a well, and each associated observed value of the specified parameter is obtained from well log or core data from that well. First, a residual is determined for each of the training data points. The residual is the difference between the associated observed value of the specified parameter for the training data point and a predicted value of the specified parameter for the training data point obtained from the prediction model. Next, an attribute vector is determined for the designated location. Next, a predicted value of the specified parameter at the designated location is obtained using the attribute vector for the designated location and the prediction model. Next, N basic probability distributions are determined from the N training data attribute vectors, the N associated observed values, the N residuals, and the predicted value of the specified parameter at the designated location. Next, from the N basic probability distributions, N basic probability assignments are determined for each of three hypotheses: that the predicted value of the specified parameter at the designated location is reliable, that it is unreliable, and that it is unpredictable.
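The first three steps (residuals at the training points, the attribute vector at the designated location, and the prediction there) do not depend on the type of prediction model. They can be sketched with any callable model. The example below uses a least-squares line purely as a stand-in for "the prediction model", with made-up training values. The later steps, which build basic probability distributions and assignments, are not sketched here because their exact form is not given in this passage.

```python
import numpy as np

def training_residuals(X, z, predict):
    """Step 1: residual r_i = z_i - z_hat_i for each of the N training points.

    `predict` is any callable mapping an (N, p) array of attribute vectors
    to N predicted values, so this step is independent of the model type.
    """
    return np.asarray(z, dtype=float) - predict(np.asarray(X, dtype=float))

def make_linear_model(X, z):
    """Hypothetical helper: least-squares linear model returned as a callable."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)

    def model(A):
        return np.asarray(A, dtype=float) @ coef[:-1] + coef[-1]

    return model

# Hypothetical training set: N = 5 wells, one seismic attribute per well.
X_train = np.array([[0.2], [0.4], [0.5], [0.7], [0.9]])
z_train = np.array([0.09, 0.13, 0.16, 0.19, 0.24])   # illustrative values only
model = make_linear_model(X_train, z_train)

residuals = training_residuals(X_train, z_train, model)  # step 1
x_designated = np.array([0.6])                           # step 2 (hypothetical)
z_designated = model(x_designated)                       # step 3
print(residuals, z_designated)
```

Passing the model as a callable keeps the residual step unchanged if the linear fit is replaced by a neural network or any other predictor.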
Finally, reliability, unreliability, and unpredictability values for the predicted value of the specified parameter at the designated location are determined as combinations of the N basic probability assignments for each of the hypotheses that the predicted value of the specified parameter at the designated location is reliable, unreliable, and unpredictable, respectively.
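This passage does not spell out how the N basic probability assignments are combined. One standard choice in Dempster-Shafer evidence theory is Dempster's rule of combination, shown here as an assumed illustration rather than as the claimed procedure. The sketch treats "reliable" and "unreliable" as the two singleton hypotheses. It assigns "unpredictable" to the whole frame {reliable, unreliable}, that is, to ignorance. The input masses are hypothetical.

```python
from functools import reduce

def dempster_combine(m1, m2):
    """Combine two BPAs given as (reliable, unreliable, unpredictable) with
    Dempster's rule. 'unpredictable' is treated (an assumption) as mass on
    the whole frame {reliable, unreliable}."""
    r1, u1, t1 = m1
    r2, u2, t2 = m2
    conflict = r1 * u2 + u1 * r2           # mass that would fall on the empty set
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    r = (r1 * r2 + r1 * t2 + t1 * r2) / norm
    u = (u1 * u2 + u1 * t2 + t1 * u2) / norm
    t = (t1 * t2) / norm
    return (r, u, t)

def combine_all(bpas):
    """Fold Dempster's rule over N BPAs (the rule is commutative and associative)."""
    return reduce(dempster_combine, bpas)

# Two hypothetical, identical pieces of evidence.
bpas = [(0.6, 0.1, 0.3), (0.6, 0.1, 0.3)]
reliability, unreliability, unpredictability = combine_all(bpas)
print(reliability, unreliability, unpredictability)
```

Under this rule, agreeing evidence concentrates mass on "reliable" and shrinks the "unpredictable" share. A fully uncommitted assignment (0, 0, 1) leaves any other assignment unchanged.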