Various on-site experiments are performed in the course of locating and recovering subterranean hydrocarbon resources. Such experiments include, but are not limited to, seismic and acoustic surveys. Seismic surveys characterize a formation based on measurements of properties of waves propagated through the formation. Typically, elastic waves such as compressional, shear and Stoneley waves are established by mechanical disturbances. Measurements of properties such as velocity (or its reciprocal, slowness) in the formation and in a borehole can help in evaluation and production of hydrocarbon resources. The results of such experiments are dependent upon various factors, such as positioning of wave source and receivers, and wave characteristics. It is generally desirable to have a well designed experiment in order to efficiently obtain desired data without excessive experiment modification.
The field of optimal experimental designs (OED) can help to design experiments for effective data collection when, for instance, acquisition time, acquisition costs and high data quality are of concern. Although it originated as a discipline of statistics, OED now has applications, in physics, biology, geophysics, sociology and pharmaceutics. In the geophysical sciences, OED has been considered in contexts including oceanographic acoustic tomography, electromagnetic soundings, seismic tomography, seismic amplitude versus offset, microseismic monitoring and resistivity measurements. Consider the possibly nonlinear forward problem,d=g(m;p),  (1)where d is an N-dimensional vector of data predicted from a model given by the M-dimensional model vector m, and P-dimensional vector p represents the control parameters. The Bayesian inverse problem comprises finding the optimal estimate of m, along with its uncertainty, that is consistent with noisy measurements. In contrast to the inverse problem, in which parameters are estimated from measurements, OED seeks to predict which experimental design, p, will yield the best estimate of model parameters from measurements to be collected in the future. In a Bayesian inverse problem, the solution is expressed as a probability density function of m conditional on the measurements and the control parameters. It is expressed using Bayes' rule in the following form:π(m|d,p)∝π(d|m,p)π(m),  (2)where π(m|d,p) is the posterior probability density, π(d|m,p) is the likelihood function, and π(m) is the prior probability density. The likelihood function measures the fitness of the predicted data relative to the noisy measurements. The prior probability density describes information available about the model before collecting the measurements. If the model can be linearized around a reference model m0 and both the observation noise and the prior can be assumed to be multinormal, the posterior is also multinormal. If the prior and/or observation noise are not multinormal, a transformation can be applied to make them multinormal (S. Houlding, “Practical Geostatistics: Modeling and Spatial Analysis,” Springer, 2000). Linearizing Eq. 1 about m0 yields:d=Gm,  (3)where G is the N×M sensitivity matrix defined by Gij=[∂gi/∂mj]m=m0. The likelihood function then has the form
                                          π            ⁡                          (                                                d                  ❘                  m                                ,                p                            )                                ∝                      exp            [                                          -                                  1                  2                                            ⁢                                                (                                                            d                      obs                                        -                    Gm                                    )                                T                            ⁢                                                C                  D                                      -                    1                                                  ⁡                                  (                                                            d                      obs                                        -                    Gm                                    )                                                      ]                          ,                            (        4        )            where dobs=d+ε is a vector of noisy measurements and ε represents the noise associated with the observations. This noise is assumed multinormal with zero mean and covariance CD. The prior model probability density is defined by
                                          π            ⁡                          (              m              )                                ∝                      exp            [                                          -                                  1                  2                                            ⁢                                                (                                      m                    -                                          m                      prior                                                        )                                T                            ⁢                                                C                  M                                      -                    1                                                  ⁡                                  (                                      m                    -                                          m                      prior                                                        )                                                      ]                          ,                            (        5        )            where mprior is the available mean a priori model and CM is the covariance describing the uncertainty around mprior. The multinormal posterior probability density, from Eq. 2, is given by
                                          π            ⁡                          (                                                m                  ❘                  d                                ,                p                            )                                ∝                      exp            [                                          -                                  1                  2                                            ⁢                                                (                                      m                    -                                          m                      ~                                                        )                                T                            ⁢                                                                    C                    ~                                    M                                      -                    1                                                  ⁡                                  (                                      m                    -                                          m                      ~                                                        )                                                      ]                          ,                            (        6        )            with mean and posterior covariance{tilde over (m)}=mprior+CMGT(GCMGT+CD)−1(dobs−Gmprior)  (7)and{tilde over (C)}M=(GTCD−1G+CM−1)−1.  (8)
K. Smith, “On the standard deviations of adjusted and interpolated values of an observed polynomial functions and its constants and the guidance they give towards a proper choice of the distribution of observations,” Biometrika, 12:1-85, 1918, may be the earliest work on optimal design. Smith introduced an optimality criterion which effectively seeks to minimize the maximum variance in the predicted values by minimizing the determinant of the matrix GT(GT G)−1G. This corresponds to minimizing the determinant of the posterior data covariance matrix {tilde over (C)}D=GT{tilde over (C)}MG when assuming that there is no prior information (CM−1=0) and CD∝I (data uncertainties are assumed not correlated among samples and statistically invariant). A. Wald, “On the efficient design of statistical investigations,” Ann. Math. Stat., 14:134-140, 1943, suggests another optimality criterion that effectively seeks to minimize the volume of the confidence ellipsoid of the model parameters by minimizing the determinant of the matrix GTG. This corresponds to minimizing the determinant of the posterior model covariance matrix CM when assuming, again, that CM−1=0 and CD∝I. J. Kiefer and J. Wolfowitz, “Optimum design in regression problems,” Ann. Math. Stat., 30:271-294, 1959, introduced the concept of an approximate design and the alphabetic classification of optimality criteria. The minimization of GTG is called the D-optimality criterion in this classification. Since those early works, several authors have proposed other optimality criteria. However, most of the OED algorithms proposed in these works are computationally expensive. They employ essentially global optimization algorithms (simulated annealing, genetic algorithms, etc.) when considering nonlinear systems and whole matrix inversion when considering linear systems.
A more computationally efficient method suggested by O. Dykstra, “The augmentation of experimental data to maximize —X′X—,” Technometrics, 13(3):682-688, 1971, and more recently by Darrel Coles, “Optimal Experimental Design Applied to DC Resistivity Problem,” PhD thesis, Massachusetts Institute of Technology, 2008, is based on a greedy algorithm. Let G=[g1T g2T . . . gXT]T be the sensitivity matrix whose rows, giT, are the sensitivity kernels of the candidate observations. Define  as the index set of all of the candidate observations to be considered in the design problem. Out of this set, which could possibly be very large, select a subset of observations that constitutes an optimal design. This is a multivariate global optimization problem that is expensive to compute. In Coles' greedy approach, this is reduced to a univariate optimization problem in which a single best observation is sought that is optimal when added to an already selected base experiment. Iterating this approach, this sequence of locally optimal solutions yields a final solution that, while not guaranteed to be globally optimal, is typically of high quality and is much faster to compute than the global optimization problem. Let Gn be the n×M sensitivity matrix of the base experiment, whose rows correspond to the observations chosen so far from the rows of G. Let gn+1T be a row of G corresponding to a candidate observation. Thus, at step n+1, the sensitivity matrix of the experiment under consideration is given by the block matrix
                              G                      n            +            1                          =                              [                                                                                G                    n                                                                                                                    g                                          n                      +                      1                                        T                                                                        ]                    .                                    (        9        )            Using a D-optimality criterion, gn+1 is chosen to maximize
                                                                                                                          G                                          n                      +                      1                                        T                                    (                                      C                    D                                          -                      1                                                        )                                                  n                  +                  1                                            ⁢                              G                                  n                  +                  1                                                                                                                                                                      G                    n                    T                                    (                                      C                    D                                          -                      1                                                        )                                n                            ⁢                              G                n                                                                .                            (        10        )            A ratio of determinants is used here in order to emphasize that the greedy optimization step at n+1 is with respect to the fixed base experiment at step n. Note that the D-optimality criterion has be augmented to have the form GTCD−1G to allow it to account for anticipated measurement uncertainty in keeping with the likelihood term in Eq. 8. Using rank-one update formulas and the assumption that CD is a diagonal matrix whose n-th diagonal entry is σn2, this ratio simplifies to
                                                                                                                          G                                          n                      +                      1                                        T                                    (                                      C                    D                                          -                      1                                                        )                                                  n                  +                  1                                            ⁢                              G                                  n                  +                  1                                                                                                                                                                      G                    n                    T                                    (                                      C                    D                                          -                      1                                                        )                                n                            ⁢                              G                n                                                                =                  1          +                                    σ                              n                +                1                                            -                2                                      ⁢                                                            g                                      n                    +                    1                                    T                                ⁡                                  (                                                            G                      n                      T                                        ⁢                                          G                      n                                                        )                                                            -                1                                      ⁢                                          g                                  n                  +                  1                                            .                                                          (        11        )            Coles showed that the gn+1 that satisfies this maximization problem is maximally orthogonal to the rows of Gn, and takes advantage of this property to propose an efficient algorithm built on Gram-Schmidt orthogonalization (G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins, Baltimore, Md., USA, 1996). This use of orthogonality is a refinement on the greedy D-optimality approach originally suggested by P. A. K. Covey-Crump and S. D. Silvey, “Optimal regression designs with previous observations,” Biometrika, 57(3):551-566, 1970 and O. Dykstra, “The augmentation of experimental data to maximize —X′X—,” Technometrics, 13(3):682-688, 1971. By using the D-optimality criterion, these approaches assume there is no prior information (CM−1=0) and that observation noise is uncorrelated among samples.