Interpretation of measurements in the field of geosciences (geology, geophysics or reservoir engineering) requires processing a large amount of data, often characterized by many parameters. Multivariate statistical methods are then used to solve various measurement characterization and interpretation problems. In particular, a problem that is frequently encountered is the definition of a typology of geologic or geophysical objects from indirect measurements carried out thereon. Examples thereof are the characterization of the seismic facies on trace portions at the level of the reservoir from seismic attributes, or the definition of electrofacies from logs.
a) Seismic facies: The parameters or attributes that characterize them are taken, for example, from seismic records. The attribute vector thus defined allows the traces to be represented as points in the space of the various attributes generated thereby. In this space, the co-ordinates of each trace are given by the values taken by the attributes on this trace (see FIG. 1). In order to group the traces together according to their proximity in the space of the attributes, well-known statistical pattern recognition techniques are applied, as described, for example, by:
Dumay J., Fournier F., 1988, "Multivariate Statistical Analyses applied to Seismic Facies recognition", Geophysics, 53, No. 9, pp. 1151-1159.
Dequirez P. Y., et al., 1995, "Integrated Stratigraphic and Lithological Interpretation of the East-Senlac Heavy Oil Pool", 65th Ann. Intern. Mtg. Soc. Expl. Geophys., pp. 104-107.
Milligan et al., 1985, "An Examination of Procedures for Determining the Number of Clusters in a Data Set", Psychometrika, 50, pp. 159-179.
a) the multivariate probability law followed by the attribute vector is estimated;
b) various modes corresponding to high-density zones of the density of the function expressing said law are selected by analysis and interpretation, the number of these modes showing the number of potential classes in the population of events and, for each mode, a certain number of points representing said events, the most representative ones of each class, and
c) a classifying function is determined from these points and this classifying function is used to determine the probability of the other points belonging to the various potential classes.
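Steps a) to c) above can be illustrated with a minimal one-dimensional sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the Gaussian kernel estimator with bandwidth `h`, the mean-shift ascent used to locate the modes, the rule that the most representative points are those lying within `h` of a mode, the Gaussian classifying function with equal priors, and the synthetic attribute values.

```python
import math

def kernel(u):
    return math.exp(-0.5 * u * u)

def density(x, sample, h):
    """Step a: Gaussian kernel estimate of the probability density at x."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(kernel((x - s) / h) for s in sample) / norm

def climb_to_mode(x, sample, h, iterations=200):
    """Mean-shift ascent: move x uphill on the estimated density."""
    for _ in range(iterations):
        weights = [kernel((x - s) / h) for s in sample]
        x = sum(w * s for w, s in zip(weights, sample)) / sum(weights)
    return x

def find_modes(sample, h):
    """Step b: distinct high-density modes reached from the sample points."""
    modes = []
    for s in sample:
        m = climb_to_mode(s, sample, h)
        if all(abs(m - other) > h for other in modes):
            modes.append(m)
    return sorted(modes)

def fit_classes(sample, modes, h):
    """Step b (end): keep the points within h of each mode as the
    representative points, then summarise them by mean and spread."""
    classes = []
    for m in modes:
        core = [s for s in sample if abs(s - m) <= h]
        if not core:  # fall back on the single nearest point
            core = [min(sample, key=lambda s: abs(s - m))]
        mean = sum(core) / len(core)
        var = sum((s - mean) ** 2 for s in core) / len(core)
        classes.append((mean, math.sqrt(var) or h))
    return classes

def membership(x, classes):
    """Step c: probability of x belonging to each class (equal priors)."""
    logs = [-0.5 * ((x - mean) / std) ** 2 - math.log(std)
            for mean, std in classes]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical attribute values forming two groups
SAMPLE = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 1.1,
          4.0, 4.5, 4.8, 5.0, 5.2, 5.4, 5.9]
H = 0.5  # assumed bandwidth

modes = find_modes(SAMPLE, H)
classes = fit_classes(SAMPLE, modes, H)
probabilities = membership(-1.0, classes)
```

In this sketch the number of classes is not supplied by the user; it is simply the number of modes found. In practice the attribute vector is multivariate and the choice of bandwidth strongly affects how many modes appear, so the selection of modes would still call for analysis and interpretation.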
Each group represents a seismic facies, i.e. a set of traces that are similar in seismic character with respect to the attributes considered. Interpretation of the seismic facies leads to characterization of the geologic variations of the reservoir between the wells, or to identification of zones where the seismic data show acquisition or processing artefacts. When the statistical techniques implemented are classification tools (non-supervised pattern recognition), this interpretation is performed a posteriori by analyzing the correspondence with the geologic data available (wells, reservoir model, etc.). With discriminant analysis tools (supervised pattern recognition), on the other hand, a geologic interpretation is included in the facies determination itself via a priori information. The two approaches, supervised and non-supervised, are quite complementary, as shown, for example, by Dequirez et al. (1995), cited above.
b) Electrofacies: They correspond to a set of points (depth points, taken from various wells, on a selected study interval) exhibiting homogeneous characteristics for a selection of logs. Electrofacies most often have a geologic meaning (particular lithology, petrophysical properties, fluid content, etc.). As with seismic facies analysis, statistical pattern recognition methods (supervised and non-supervised) are applied to define automatically an electrofacies column for each well of a field.
Whatever the field of application, statistical pattern recognition techniques allow not only classification of an object in a family, but also evaluation of the probability that this classification is correct. In fact, it is a probability vector that is estimated, giving the probability of allocation to each of the various classes, the object then being allocated to the class of maximum probability. This characterization of the uncertainty associated with the predicted class membership is particularly useful for risk quantification.
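The probability vector and the maximum-probability allocation can be sketched as follows. This is only an illustration under stated assumptions: a single attribute, Gaussian class densities, and two hypothetical facies (`sand`, `shale`) whose priors, means and standard deviations are invented for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def class_probabilities(x, classes):
    """Return the probability vector of x as a dict: class name -> probability.

    classes maps each class name to (prior, mean, std).
    """
    weighted = {name: prior * gaussian_pdf(x, mean, std)
                for name, (prior, mean, std) in classes.items()}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}

def allocate(x, classes):
    """Allocate x to the class of maximum probability."""
    probs = class_probabilities(x, classes)
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical facies described by one attribute: (prior, mean, std)
FACIES = {"sand": (0.5, 2.0, 1.0), "shale": (0.5, 6.0, 1.0)}

label, probs = allocate(2.5, FACIES)
```

Keeping the whole vector `probs`, rather than only `label`, is what makes the uncertainty available: an object whose largest probability is close to those of the other classes is an uncertain allocation.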
A major difficulty with statistical pattern recognition is the determination of the correct number of "natural" classes underlying the population. This difficulty is particularly acute in geosciences, where geologic interpretation in the broad sense of the term consists in comparing a certain model of the subsoil, which can be obtained a priori, with the indirect measurements recorded at the level of the reservoir. It is therefore very important to be able to characterize the natural structure of the measurements available. The classification methods applied in the field of geosciences do not solve this problem. With learning (supervised methods), the number of classes is imposed by the a priori information taken into account, which can therefore bias the results. Without learning (non-supervised methods), partitioning methods are most often used, the number of classes being set by the user. It is then very difficult to select this parameter in an optimum way, i.e. to adapt it to the structure of the population. Conventionally, several tests are carried out with different numbers of classes, and class separability criteria are calculated in order to keep the best test, but these criteria are effective only under strong hypotheses on the data (for example, that the classes are Gaussian). The technique mentioned here is described, for example, in Milligan et al. (1985), cited above.
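The conventional procedure described above (several tests with different numbers of classes, scored by a separability criterion) can be sketched as follows. The sketch rests on assumptions that are not in the text: k-means is used as the partitioning method, the Calinski-Harabasz index (ratio of between-class to within-class dispersion) as the separability criterion, and the data are synthetic points forming three compact groups.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iterations=100):
    """Partition points into k classes (deterministic farthest-point start)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty class keeps its previous center
                centers[j] = centroid(members)
    return labels, centers

def calinski_harabasz(points, labels, centers):
    """Separability criterion: between-class over within-class dispersion."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall)
                  for j in range(k))
    return (between / (k - 1)) / (within / (n - k))

def best_number_of_classes(points, candidates):
    """Run one test per candidate number of classes and keep the best."""
    scores = {}
    for k in candidates:
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores

# Synthetic attribute points: three compact groups of five points each
GROUP_CENTERS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
OFFSETS = [(0.0, 0.0), (0.5, 0.5), (0.5, -0.5), (-0.5, 0.5), (-0.5, -0.5)]
POINTS = [(cx + dx, cy + dy) for cx, cy in GROUP_CENTERS for dx, dy in OFFSETS]

best_k, scores = best_number_of_classes(POINTS, [2, 3, 4, 5])
```

The criterion picks out the right number of classes here because the groups are compact and well separated, which is precisely the kind of strong hypothesis mentioned above; with elongated or overlapping classes the same criterion may favour a number of classes that does not reflect the structure of the population.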