The analysis of earth science data often involves the simultaneous interpretation of the data and their many derived attributes. An attribute of the data is a broadly defined term meaning any quantity computed or otherwise derived from the data, including the data themselves. The use of different data sources or types and of their derived attributes helps geophysicists gain a better understanding of the subsurface by providing alternative perspectives. The main drawback of this approach has been the growing number of data elements (i.e., data sources or data sets, data types, or data attributes), driven by the growing number of alternative and complex scenarios that must be considered for analysis. This growth tends to overload geophysicists when they try to manually combine the different data elements into their interpretation.
Consider the following example. Suppose that an interpreter has a set of data elements that can help him/her locate or interpret certain geologic features, such as a channel. To locate the feature, however, the interpreter needs to look for the occurrence of a specific pattern, or patterns, manifested simultaneously across several of the data elements. In doing this manually, not only is it easy to overlook the occurrence of the feature, but it is also hard to mentally keep track of what is happening in several data elements at once, especially as their number increases. To make matters worse, if the computation of an attribute depends on a parameter, as is often the case, the interpreter has to either mentally manage this additional degree of complexity or, more commonly, fix the parameter for the attribute beforehand. In doing so, however, the parameter is chosen independently of the other attributes, thus neglecting the potential relationship between the attribute and other data elements, which may call for the use of a different parameter value.
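The parameter issue in the example can be made concrete with a minimal sketch. It assumes, purely for illustration, that the attribute is a centered moving average whose parameter is the window length, and that the relationship to a second data element is scored with the absolute Pearson correlation. The names smoothed_attribute and select_window are hypothetical, and neither the attribute nor the score is taken from any of the methods discussed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no linear relationship
    return sxy / math.sqrt(sxx * syy)

def smoothed_attribute(data, window):
    """Hypothetical attribute: centered moving average (window = parameter)."""
    half = window // 2
    out = []
    for i in range(len(data)):
        lo, hi = max(0, i - half), min(len(data), i + half + 1)
        out.append(sum(data[lo:hi]) / (hi - lo))
    return out

def select_window(data, other, candidates):
    """Score each candidate window against a second data element.

    Returns the best-scoring window and the score of every candidate.
    """
    scores = {w: abs(pearson(smoothed_attribute(data, w), other))
              for w in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic illustration: 'other' is a smooth trend, 'data' is the same
# trend contaminated by sample-to-sample alternating noise.
other = [math.sin(i / 5) for i in range(60)]
data = [v + 0.8 * (-1) ** i for i, v in enumerate(other)]
best, scores = select_window(data, other, [1, 3, 5])
```

In this synthetic case, a window fixed beforehand at 1 (no smoothing) would score lower than the window selected with the second data element in view, which is the kind of dependence that a parameter chosen in isolation neglects.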
The example highlights a number of problems. Geophysicists do not know beforehand whether a data element contains the information they need, whether it is redundant given other data elements already being considered, whether a given relationship between data elements exists and, if it does, where in the data it occurs, or which parameter value might better highlight a feature in a given set of data elements. For each of these problems, one can pose a specific question and then formulate, implement, and apply a specific measure or method to answer it. Indeed, for specific questions and in very limited settings, a number of methods have been described in the literature. However, this approach is very cumbersome as a general paradigm because it is often impractical to exhaustively define in advance all the measures needed to answer all potential questions, or to cope with an increasing number of data elements or attributes.
What is needed, then, is a general statistical analysis framework for dealing with the above-described technical problem. A number of methods have been reported in the published literature that address specific questions or perform an analysis in specific settings. The known methods employ a pre-defined statistical measure (even if multiple alternative measures are sometimes stated) to quantify the similarity between data elements. The pre-defined statistical measure of similarity is then used for a variety of analyses. Some examples include the following.
Attribute Selection
US Patent Application Publication No. 2011/0119040, “Attribute importance measure for parametric multivariate modeling,” by J. A. McLennan, discloses a method to measure the importance and select the relevant attributes describing a subsurface formation. To measure the importance of the attributes, the author provides an attribute importance measure built from the matrix of correlation coefficients.
U.S. Pat. No. 7,502,691, “Method and computer program product for determining a degree of similarity between well log data,” by P. A. Romero, discloses a method to determine the similarity between nuclear magnetic resonance (NMR) well log data and other well log recordings.
“Information entropy Monte Carlo simulation,” by A. Kato (see online presentation at: http://www.rpl.uh.edu/pdf/Chapter3—2_AYATO.pdf), presents results on the use of information theoretic measures to assess the information conveyed about rock lithofacies by other attributes.
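As a generic illustration of what an information theoretic measure of this kind computes, the following minimal sketch estimates the mutual information, in bits, between a discrete attribute and a set of lithofacies labels from their empirical joint frequencies. It is not the procedure of any of the references above. The function name mutual_information and the sample values are hypothetical, and the attribute is assumed to be already discretized (a continuous attribute would first have to be binned).

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits for discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi

# Hypothetical samples: lithofacies labels and two discretized attributes.
facies = ["sand", "sand", "shale", "shale"]
attribute_a = [1, 1, 0, 0]  # changes exactly where the facies change
attribute_b = [0, 1, 0, 1]  # varies independently of the facies

mi_a = mutual_information(attribute_a, facies)
mi_b = mutual_information(attribute_b, facies)
```

Under these assumptions, an attribute that fully determines the facies attains the entropy of the facies labels, while an attribute that is empirically independent of them scores zero, so the measure can be used to rank candidate attributes by how much they convey about the lithofacies.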
Evaluation of the Data Quality
US Patent Application Publication No. 2010/0312477, “Automated log quality monitoring systems and methods,” by W. C. Sanstrom and R. E. Chemali, discloses a method to analyze the data quality of well log recordings involving the application of a comparison function to determine a log quality indicator.
Data Fusion
“Sensor/data fusion based on value of information,” by S. Kadambe and C. Daniell, in Proc. of the 6th Intl. Conf. on Information Fusion, 25-32 (2003), DOI: 10.1109/ICIF.2003.177422, describes a number of measures to assess the value of information from different data sources. The result is then used to decide whether to combine a given data source with other data sources.
Deriving a Model that Captures or Enhances Some Desired Characteristic of the Data
US Patent Application Publication No. 2010/0161235, “Imaging of multishot seismic data,” by L. T. Ikelle, discloses a method for imaging of the subsurface using multishot data without decoding, wherein the mutual information statistical measure is used to derive a model that separates different components of that data.
“How reliable is statistical wavelet estimation?,” by J. A. Edgar and M. van der Baan, in Geophysics 76(4), pp. V59-V68 (2011), compares different statistical measures for estimation of the seismic wavelet model from data.
“Electromagnetic/seismic joint inversion in multilayered media,” by Q. H. Liu et al. (see online presentation at: http://people.ee.duke.edu/~qhliu/Presentations/Liu_MURI_Review_Feb2004.pdf), presents a method wherein the measure is used to align the different data types while performing joint inversion.
The first three types of analysis are the most relevant to the present invention, although none of them teaches a general statistical analysis framework for dealing with the above-described technical problem.