In the oilfield industry, performing pressure testing in a borehole leads to a characterization of the formation in terms of the fluids present. Conceptually, pressure exhibits a linear dependency with respect to the depth of the formation, and the linear slope (gradient) of the pressure is indicative of the fluid type (e.g., oil, water, or gas). Therefore, discrete sampling of formation pressures at different formation depths can indicate where and what types of fluids are present in the formation.
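As a minimal sketch of the linear pressure-depth relationship described above, the following fits a single straight line to sampled pressures by ordinary least squares and reports its slope. The function name, the units (psi and ft), and the sample values are assumptions made for illustration; the synthetic data use a slope of 0.433 psi/ft, a commonly quoted fresh-water gradient.

```python
def fit_pressure_gradient(depths, pressures):
    """Ordinary least-squares fit of pressure = gradient * depth + intercept."""
    n = len(depths)
    if n < 2 or n != len(pressures):
        raise ValueError("need at least two matching depth/pressure samples")
    mean_d = sum(depths) / n
    mean_p = sum(pressures) / n
    sxx = sum((d - mean_d) ** 2 for d in depths)
    if sxx == 0:
        raise ValueError("depths must not all be equal")
    sxy = sum((d - mean_d) * (p - mean_p) for d, p in zip(depths, pressures))
    gradient = sxy / sxx
    intercept = mean_p - gradient * mean_d
    return gradient, intercept


# Synthetic, noise-free samples (hypothetical values, psi versus ft).
depths = [5000.0, 5010.0, 5020.0, 5030.0]
pressures = [2200.0 + 0.433 * (d - 5000.0) for d in depths]
gradient, intercept = fit_pressure_gradient(depths, pressures)
```

A single fit of this kind presumes that all samples lie in one fluid zone; deciding which samples belong together is the grouping problem that the clustering methods discussed below address.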
Traditionally, human analysts interpret pressure gradients based on visual inspection of the sampled data. However, noise, undersampling, complexity, and other problems with the sampled data may render the manual interpretation difficult or ambiguous for the human analyst. Moreover, manual analysis of the data can be cumbersome, labor intensive, and/or prone to analyst bias.
Data of measurable physical properties can be analyzed in a number of ways. In particular, exploratory statistical methods, such as data clustering (grouping), can suggest patterns in the data that are otherwise unpredictable by an analyst. By classifying collected data into clusters, the cluster analysis can help analysts interpret the data, optimize a process (e.g., control an operation), and/or infer properties of interest.
Common forms of cluster analysis use the popular c-means clustering models. The c-means models cluster a batch of data points into c partitions (groups) and employ an iterative optimization (or alternating optimization) principle to minimize a clustering objective function, which incorporates a presumed clustering similarity measure. These clustering models output a set of points representative of their associated clusters (typically cluster centers) and a matrix that indicates the probability that a given point belongs to a given cluster.
The three general clustering algorithms for the c-means clustering models include hard c-means (also known as k-means), fuzzy c-means, and possibilistic c-means. In the hard c-means clustering algorithm, cluster partitions are crisp so that every point has a single certain cluster membership. In the fuzzy or possibilistic clustering algorithms, each point may have varying degrees of likelihood for belonging to each possible cluster.
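The difference between crisp and graded memberships can be illustrated for a single point. The sketch below assumes the distances from the point to each cluster center are already known, and it uses the standard fuzzy c-means membership update with a fuzzifier m; the function names and the example distances are hypothetical. Possibilistic memberships, which need not sum to one, are not shown.

```python
def hard_memberships(distances):
    """Crisp membership: 1 for the nearest cluster, 0 for the others."""
    nearest = min(range(len(distances)), key=lambda i: distances[i])
    return [1.0 if i == nearest else 0.0 for i in range(len(distances))]


def fuzzy_memberships(distances, m=2.0):
    """Standard fuzzy c-means membership update for one point.

    distances: distance from the point to each cluster center.
    m: fuzzifier (> 1); larger values give softer partitions.
    """
    # A point sitting exactly on a center belongs fully to that cluster
    # (shared equally if it coincides with several centers).
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
            for d_i in distances]


# One point at distances 1 and 3 from two hypothetical cluster centers.
hard = hard_memberships([1.0, 3.0])
fuzzy = fuzzy_memberships([1.0, 3.0])
```

With m = 2, the point at distances 1 and 3 receives fuzzy memberships of 0.9 and 0.1, whereas the hard assignment places it entirely in the nearer cluster.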
For the purposes of background information, the following references discuss clustering algorithms, which may be referenced herein:
    a. [Bezdek et al. 1978]: J. C. Bezdek and J. D. Harris, “Fuzzy Relations and Partitions: An Axiomatic Basis for Clustering,” Fuzzy Sets and Systems 1, 112-127 (1978).
    b. [Bezdek et al. 1981a]: J. C. Bezdek, C. Coray, R. Gunderson, and J. Watson, “Detection and Characterization of Cluster Substructure: I. Linear Structure: Fuzzy c-Lines,” SIAM J. Appl. Math., Vol. 40, 339-357 (1981).
    c. [Bezdek et al. 1981b]: J. C. Bezdek, C. Coray, R. Gunderson, and J. Watson, “Detection and Characterization of Cluster Substructure: II. Fuzzy c-Varieties and Convex Combinations thereof,” SIAM J. Appl. Math., Vol. 40, 358-372 (1981).
    d. [Bezdek et al. 1995]: J. C. Bezdek, R. J. Hathaway, and N. R. Pal, “Norm-Induced Shell Prototype (NISP) Clustering,” Neural, Parallel and Scientific Computation, Vol. 3, 431-450 (1995).
    e. [Bezdek et al. 1999]: J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, “Fuzzy Models and Algorithms for Pattern Recognition and Image Processing,” Kluwer, Dordrecht, in Press (1999).
    f. [Botton et al. 1995]: L. Botton and Y. Bengio, “Convergence Properties of the K-means Algorithms,” in G. Tesauro and D. Touretzky (Eds.), Advances in Neural Information Processing Systems 7, Cambridge, Mass., The MIT Press, 585-592 (1995).
    g. [Hathaway et al. 1993]: R. J. Hathaway and J. C. Bezdek, “Switching Regression Models and Fuzzy Clustering,” IEEE Transactions on Fuzzy Systems, Vol. 1, 195-204 (1993).
    h. [MacQueen 1967]: J. B. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297 (1967).
The c-means clustering models assume point prototypes, and the computed clusters under such models typically have a hyperellipsoidal or cloud-like structure that is implicitly defined. One clustering algorithm known in the art based on the hard c-means model is the k-means clustering algorithm mentioned previously. The k-means algorithm classifies or clusters multi-attribute objects (i.e., points) into a number (k) of groups based on a similarity measure or distance function between any two points. To do the grouping, the algorithm starts with a predefined number (k) of randomly initialized clusters and then follows an iterative local optimization scheme to minimize the sum of squared distances between each data point and its corresponding cluster centroid (i.e., the cluster's data mean point). See [MacQueen 1967].
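The iterative scheme just described can be sketched as a batch procedure that alternates an assignment step and a centroid update. This is an illustrative sketch rather than the procedure of [MacQueen 1967]; the function name, the optional `init` argument (supplied here so the example is reproducible instead of randomly initialized), and the sample points are assumptions.

```python
import random


def kmeans(points, k, init=None, max_iters=100, seed=0):
    """Batch k-means on points given as equal-length tuples of floats."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centers = [tuple(c) for c in start]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels


# Two well-separated hypothetical groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, 2, init=[(0.0, 0.0), (10.0, 10.0)])
```

Because the optimization is local, a different initialization may converge to a different partition, which is one reason the choice of initial clusters matters in practice.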
Although such traditional clustering assumes point prototypes, shape-driven clustering algorithms are also known that use other mathematical constructs, such as mathematical models or surfaces, for cluster prototypes. In general, the shape-driven clustering algorithms can be divided into two categories: (1) algorithms that match the norm used in the distance or similarity function to the geometry of the individual clusters, and (2) algorithms that redefine the cluster prototype to assimilate the cluster shape information. Many of the optimization principles applied by the algorithms are based on the c-means clustering models. Any specialized treatment for each algorithm lies mainly in the proper choice of the prototype definition, the appropriate corresponding distance function, and possibly the objective function. Complexity of the iterative optimization steps depends on these choices. See [Bezdek et al. 1999].
As one example, the Gustafson-Kessel (GK) model is a fuzzy clustering algorithm that matches data to desired or expected cluster shapes. It performs shape matching using an adaptive distance norm that defines the similarity function while keeping the cluster prototypes as regular points. Hence, optimization is done with respect to an additional variable matrix used to adapt the distance norm. The shapes of the computed clusters are implicitly defined by the eigen properties (eigenvalues and eigenvectors) of the adaptive matrix used in the optimization. In particular, the GK model obtains hyperellipsoidal cluster shapes, which can also approximate lines and planes, as these may be viewed as special limit cases of ellipsoids. See [Bezdek et al. 1999].
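To make the adaptive norm concrete, the sketch below builds the norm-inducing matrix commonly associated with the GK model, A = (rho * det(F))^(1/n) * F^(-1), from the fuzzy covariance F of one cluster, restricted to two dimensions for brevity. The function names, the volume parameter `rho`, the unit weights, and the sample points are assumptions for illustration; a full GK algorithm would also update memberships and centers iteratively.

```python
def gk_norm_matrix(points, center, weights, rho=1.0):
    """Norm-inducing matrix A = (rho * det(F))**(1/n) * inverse(F) for n = 2.

    F is the fuzzy covariance of the cluster; weights are the memberships
    already raised to the fuzzifier power.
    """
    total = sum(weights)
    fxx = fxy = fyy = 0.0
    for (x, y), w in zip(points, weights):
        dx, dy = x - center[0], y - center[1]
        fxx += w * dx * dx
        fxy += w * dx * dy
        fyy += w * dy * dy
    fxx, fxy, fyy = fxx / total, fxy / total, fyy / total
    det = fxx * fyy - fxy * fxy
    if det <= 0.0:
        raise ValueError("degenerate cluster: covariance is singular")
    scale = (rho * det) ** 0.5  # (rho * det)**(1/n) with n = 2
    # Inverse of a symmetric 2x2 matrix, multiplied by the scale factor.
    return ((scale * fyy / det, -scale * fxy / det),
            (-scale * fxy / det, scale * fxx / det))


def gk_distance_sq(point, center, a):
    """Squared distance (x - v)^T A (x - v) for a symmetric 2x2 matrix A."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return a[0][0] * dx * dx + 2.0 * a[0][1] * dx * dy + a[1][1] * dy * dy


# A hypothetical cluster elongated along the x axis, centered at the origin.
cluster = [(-4.0, 0.5), (-2.0, -0.5), (0.0, 0.0), (2.0, -0.5), (4.0, 0.5)]
a = gk_norm_matrix(cluster, (0.0, 0.0), [1.0] * len(cluster))
along_axis = gk_distance_sq((3.0, 0.0), (0.0, 0.0), a)
across_axis = gk_distance_sq((0.0, 3.0), (0.0, 0.0), a)
```

Under this norm, a point displaced along the cluster's long axis is treated as closer than a point displaced by the same Euclidean amount across it, which is how the model favors elongated, ellipsoidal clusters.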
Another algorithm uses a fuzzy paradigm for clustering multidimensional data assuming r-dimensional flat surface prototypes, which are more formally known as linear manifolds or hyperplanes. Under this approach, the prototype optimization is done with respect to the independent vectors defining the directions of the hyperplane and a point belonging to the hyperplane. This optimization is done in addition to the fuzzy membership matrix included as part of the optimization problem, which is similar to point-prototype clustering described previously. A perpendicular offset (distance) is used as the similarity function. Variants of this approach allow prototypes to be convex combinations of hyperplanes. See [Bezdek et al. 1978]; [Bezdek et al. 1981a]; [Bezdek et al. 1981b]; and [Bezdek et al. 1999].
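For the simplest case of a one-dimensional linear manifold (a line), the perpendicular offset used as the similarity function can be sketched as follows. The function name and arguments are hypothetical: the line is assumed to be given by a point on it (`anchor`) and a nonzero direction vector.

```python
def perpendicular_distance_sq(point, anchor, direction):
    """Squared perpendicular offset from a point to a line.

    The line passes through `anchor` along `direction` (any nonzero vector).
    """
    diff = [p - a for p, a in zip(point, anchor)]
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        raise ValueError("direction must be nonzero")
    along = sum(x * d for x, d in zip(diff, direction))
    # Pythagoras: total squared offset minus the squared projection on the
    # line; clamp at zero to absorb floating-point round-off.
    return max(0.0, sum(x * x for x in diff) - along * along / norm_sq)


# Offset of the point (3, 4) from the line y = x through the origin.
offset_sq = perpendicular_distance_sq((3.0, 4.0), (0.0, 0.0), (1.0, 1.0))
```

In a fuzzy c-lines style iteration, a distance of this kind would replace the point-to-center distance in the membership update, while the line parameters are re-estimated from the weighted data.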
Surface (“shell”) prototypes were devised for boundary detection applications, and several algorithms that implement such prototypes recognize spherical and elliptical cluster prototypes. Various distance functions may be defined and may yield a tradeoff between optimization complexity and solution accuracy. Other methods target quadric prototypes, which can be viewed as a generalization of shell clustering that includes forms of quadric surfaces. Similar to “shell” prototype clustering, the choice of the distance function may be critical to the complexity of the optimization procedure. See [Bezdek et al. 1999].
Another clustering algorithm uses prototypes that are shells of shapes defined by norm functions, hence norm-induced shell prototypes. The shells are formally represented by multidimensional closed/open balls of a given radius. The norm-dependent point-to-shell shortest distance is used along with a c-means-type optimization algorithm. Among the shell shapes implied by this norm-induced model are hyperspherical, hyperelliptical, squares, diamonds, etc. See [Bezdek et al. 1995].
Finally, a fuzzy c-regression clustering model assumes that a number of functional relationships exist among the dependent and independent variables and that clustering should seek to partition the data under the assumption that cluster prototypes conform to these presumed functional relationships or regression models. The distance function is tied to the measure of the model error; however, the latter is restricted to a special class of models that satisfy a special property to assure global optimization when fitting a prototype through a cluster of points. The algorithm assumes the data exist in a pre-collected batch to be clustered into a fixed number of clusters prototyped by any of a fixed number of switching regression models. The algorithm employs the iterative optimization principle of the fuzzy c-means clustering model to compute the fuzzy partitions. See [Hathaway et al. 1993].
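A minimal sketch of this idea, restricted to straight-line regression models, alternates a membership update driven by squared model errors with a weighted least-squares refit of each line. It is not the exact procedure of [Hathaway et al. 1993]; the function names, the fixed iteration count, the small `eps` guard against zero errors, the initial model guesses, and the noise-free sample data are all assumptions for illustration.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fuzzy_c_regression(xs, ys, init_models, m=2.0, iters=50, eps=1e-12):
    """Alternate membership updates and weighted line fits."""
    models = list(init_models)
    memberships = []
    for _ in range(iters):
        # Membership step: squared residuals play the role of squared
        # distances in the fuzzy c-means update.
        memberships = []
        for x, y in zip(xs, ys):
            errs = [(y - (a * x + b)) ** 2 + eps for a, b in models]
            row = [1.0 / sum((e_i / e_j) ** (1.0 / (m - 1.0)) for e_j in errs)
                   for e_i in errs]
            memberships.append(row)
        # Prototype step: refit each line with weights u**m.
        models = [
            weighted_line_fit(xs, ys, [row[i] ** m for row in memberships])
            for i in range(len(models))
        ]
    return models, memberships


# Hypothetical noise-free data drawn from y = 0.5x and y = 2x + 10.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0.5 * x for x in xs[:5]] + [2.0 * x + 10.0 for x in xs[5:]]
models, memberships = fuzzy_c_regression(xs, ys, [(0.4, 0.5), (2.2, 8.0)])
```

As in the description above, the number of models is fixed in advance and all data must be available as a batch before the iteration starts.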
The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.