Seismic imaging is the process of converting acoustic measurements of the Earth into images of the Earth's interior, much like ultrasound for medical imaging. It is widely used in oil and gas exploration and production to identify regions that are likely to contain hydrocarbon reservoirs and to help characterize known reservoirs to maximize production. These methods have become critical to the energy industry as known reserves are used up and new reserves become increasingly difficult (and expensive) to find and are increasingly in technically challenging areas, like the deep sea. For the past several decades, the energy industry has tried to balance the need to image quickly and the need to image accurately. The need for accuracy is driven by the high cost of drilling a “dry” well due to poor imaging (a deep sea well can cost over $100 million) and the need for quick imaging is driven by the cost of not finding new reserves (i.e., bankruptcy). To minimize these costs, the industry relies on supercomputing clusters and regularly increases compute power, enabling both faster imaging on existing algorithms and the practical implementation of more accurate imaging. Thus, the development of fast, efficient methods for imaging is of high importance to the industry.
Seismic imaging data varies widely depending on how and where the data is collected (e.g., on land, at sea, at the ocean surface, at the ocean floor, below ground, electromagnetically, etc). One data collection method in particular implements a towed hydrophone receiver arrays for ocean seismic data collection. The basic idea is shown in FIG. 1 depicting a ship 200 shown towing a 2D array 212 of hydrophones spaced about every 25 m on 1 to 16 trailed streamers. Every 15 or so seconds, an air cannon is fired into the water creating an acoustic wave, e.g., a shot 211) that propagates through the water and into the Earth. Reflections from various surface and subsurface boundaries cause echoes that reflect back and are recorded by each hydrophone in the array 212. The recording of a single hydrophone in time as a trace and the collection of traces for a single firing of the air cannon is called a common shot gather, or shot. As the ship 200 moves, a large set of spatially overlapping shots is recorded. Depending on the size of the survey region to be imaged, this data collection can take a month or more and is designed to get the maximal coverage of the area to be imaged. The receiver data R(x, y, z, t) collected for potentially hundreds of thousands of shots is the result of some source data S(x, y, z, t) at a particular location.
A sample of artificial shot data is shown in FIG. 2 showing a sample shot data 215 for a 1D array of hydrophones showing time on the Y-axis and spatial offset on the X-axis. The direct source signal propagates out linearly in time (from the center of the array) and appears as straight lines 216. The recorded reflections appear as curved lines 218.
Two critical requirements drive production seismic imaging: The need for improved imaging to accurately locate and characterize elusive oil and gas reservoirs; and the need for timely subsurface imaging. Drilling too soon risks expensive dry wells while drilling too late risks delaying the time to oil. To minimize these risks, the industry regularly increases the power of its supercomputing clusters, enabling both faster imaging on existing algorithms and the practical implementation of more accurate imaging. However, raw supercomputing power is not enough. It is equally important—if not more so—to implement algorithms efficiently based on a detailed knowledge of the hardware.
The Reverse Time Migration (RTM) algorithm (see, e.g., Zhou, H., Fossum, G., Todd, R. and Perrone, M. 2010 Practical VTI RTM in Proceedings of 72nd EAGE Conference) is widely used in the industry because of its superior imaging accuracy for difficult subsurface structures like salt domes which are poorly imaged by other algorithms but which are very effective at trapping oil and gas. Several variants of RTM exist with differing degrees of approximation to reality, all of which use single-precision arithmetic.
Historically seismic imaging has processed shot gathers individually on a single compute node so that they could be processing in parallel. This approach has many benefits; but for algorithms like RTM, as computational power increases, data I/O becomes the single largest performance bottleneck, particularly when the model is large.
The RTM algorithm arises from the observation that pressure waves should be correlated at reflection boundaries; so RTM proceeds by correlating two pressure waves (called the forward and backward waves) to find those boundaries. To generate the waves for correlation, RTM simulates wave propagation using the wave equation below for a wave U(x,y,z,t) with a source term S(x,y,z,t):
                                          1                          c              2                                ⁢                                                    ∂                2                            ⁢              U                                      ∂                              t                2                                                    =                                                            ∂                2                            ⁢              U                                      ∂                              x                2                                              +                                                    ∂                2                            ⁢              U                                      ∂                              y                2                                              +                                                    ∂                2                            ⁢              U                                      ∂                              z                2                                              +          S                                    1        )            
The forward wave is the wave generated from the air cannon firing and propagating forward in time using a “velocity model” represented by C(x,y,z), which specifies the wave velocity at each point in space and represents the various material properties and boundaries of the volume being imaged. The air cannon firing is treated as a wavelet impulse localized in time and space. The backward wave is generated by using the shot data recorded by the hydrophone array as the source term for the wave equation and propagating that backward in time. These two waves are then multiplied point-wise at each time step to generate an image, using the following “imaging condition”:I(x,y,z)=ΣtUForward(x,y,z,t)UBackward(x,y,z,t)  2)
This process is repeated for all shots in the seismic survey and the images generated are summed to create a final image of the reflecting boundaries, which represent the subsurface structure. It is important to note that the time summation in the imaging condition implies that the first time step of the forward wave needs to be correlated with the last time step of the backward wave. This constraint is typically handled in one of two ways: either the forward wave is saved to disk (called a “snapshot”) every time step or every several time steps and read in for imaging when the backward wave is computed, or the forward propagation is run twice—once forward in time and once in reverse time using boundary data saved from the forward pass to recreate the forward pass in reverse—and then imaging proceeds with the backward wave and the reverse forward wave.
The first method requires significant disk storage and can be bottlenecked on disk I/O, while the second requires 50% more computation and additional memory space to save the boundary data. Following standard practice in the industry as described in Zhou, et al. 2010, the wave propagation of Equation (1) is simulated using the finite difference approximation in formula (3) where there is selected the coefficients to implement 2nd order accuracy in time and 8th order accuracy in space. These coefficients are scaled to satisfy the CFL condition (Courant Friedrichs Lewy http://en.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E2%80%93Lewy_condition. This approach gives rise to a stencil shown in FIG. 3 which depicts a 25-Point spatial stencil 220 with 8th order accuracy shown in isolation on the left and the stencil 222 as it moves along the stride-1 dimension of the model.
                              U                      i            ,            j            ,            k            ,                          t              +              1                                      =                              2            ⁢                                                  ⁢                          U                              i                ,                j                ,                k                ,                t                                              -                      U                          i              ,              j              ,              k              ,                              t                -                1                                              +                                    c                              i                ,                j                ,                k                            2                        ⁢                                          ∑                                  n                  =                                      -                    4                                                                    n                  =                  4                                            ⁢                                                          ⁢                              (                                                                            A                      xn                                        ⁢                                          U                                                                        i                          +                          n                                                ,                        j                        ,                        k                        ,                        t                                                                              +                                                            A                      yn                                        ⁢                                          U                                              i                        ,                                                  j                          +                          n                                                ,                        k                        ,                        t                                                                              +                                                            A                      zn                                        ⁢                                          U                                              i                        ,                        j                        ,                                                  k                          +                          n                                                ,                        t                                                                                            )                                                                        3        )            
Where the A's are the stencil coefficients and can be determined using known methods and subscripts i, j, k are the indices of a pointing a 3D velocity model In practice, the size of production RTM models varies widely, but the universal desire is to grow models larger to get more resolution and to run the models longer to enable deeper imaging since echoes take longer to reflect from deeper structures.
Typically, velocity models for individual shots are 5123 to 10243 elements or larger and the number of time steps can be 10,000 or more in both the forward and backward propagation phases. Seismic imaging is typically computed using single precision arithmetic.
Industrial implementations of RTM are embarrassingly parallel. They typically run individual shots on one to two nodes of a compute cluster and run many shots in parallel, e.g., one shot per slave node, as shown in FIG. 4. These clusters have minimal network connectivity because it is not needed: the individual shots run independently and asynchronously. A simple work queue is used to manage runs and if a run for a shot fails, it is simply re-run, as it doesn't impact any of the other runs. A master process running at node 230 manages the work queue and to merge the partial images that are generated from each shot. Additionally, other image processing (not shown) might be included in this process.
As shown in FIG. 4, a velocity model V( ) 225 is conceptually depicted with each sub-block 28 representing a sub-set of the model data corresponding to a single shot. For calculating an image, the entire model is not needed, e.g., a sub-set of the model is used for a selected shot. A master processing node 30 in communication with a memory storage device, e.g., database 235, is programmed to allocate a RTM compute cluster 240 including a plurality of slave nodes 245, each node 245 having a corresponding memory storage device 250. Each slave node 245 is in communication with the master node 230 and is configured to obtain and process the single shot data 228a. When a (forward or reverse) pressure wave and resulting image is calculated at a respective node, it is stored to the main memory storage disk 250. Memory storage devices 250 that store intermediate results data in calculating the forward or reverse pressure wave which values are used to calculate resulting image are referred to as “Scratch” disks. RTM compute clusters 40 have significant per-node scratch disk requirements for saving snapshot data, which for a 10243 model and 10,000 time steps would require 40 TB of snapshot storage—per shot. In practice, snapshot sub-sampling is used to reduce both disk requirements and disk I/O bottlenecks; however sub-sampling results in image degradation and must be balanced with performance. Compression can be used to trade computation for disk I/O, but if lossy, compression can also degrade image quality.
The velocity model 225 is the representation of the velocity of the wave traveling at an x, y, z coordinate in the sub-surface of the earth, and, for purposes of description, is referred to as V2. As velocity model 225 is not known in the RTM Process, an imaging condition enables the snapshot of the wave at a particular point in time after calculating forward wave and backward waves. For every value of t there is produced an image from an average of many shots. When averaging many shots, the image at a coordinate at a time t, i.e., I(x, y, z) is obtained via the imaging condition according to:I(x,y,z)=ΣtPS(x,y,z,t)PR(x,y,z,t)  4)where PS(x, y, z, t) is the reflected forward power pressure wave at a coordinate and PR(x, y, z, t) is the reverse power pressure wave at an x, y, z coordinate at a time t. In a high-level aspect the forward motion calculation, the velocity model v(x,y,z) is loaded, the pressure wave is loaded at time t and the previous pressure wave at time t−1, i.e., P(x,y,z,t−1), and the next pressure wave P(x,y,z,t+1) is computed (bootstrap) and is stored. This process is performed at each iteration (time step t). The wave field is a 3-D object and is very large (e.g., on the order of 10243 elements). Thus, for example, at four bytes of data for each calculation, this may amount to 4 GBytes of data for 1 time step of the algorithm for a single shot) and there may be thousands of time steps amounting to over 24 TB of data for a single shot which, when sent out to disk, is time consuming. Thus in practice, in an effort to reduce the data, the data is not stored or calculated at every time step, i.e., the calculated PS(x,y,z,t) is either stored or loaded to disk every Nth iteration.
Standard finite difference RTM is implemented as follows: (1) Compute the (forward) wave equation with shot source terms, saving “snapshots” of the entire wave field to disk every Nth time step; and (2) Compute the (reverse) wave equation with receiver source terms, reading in corresponding forward wave field snapshots and computing an imaging condition to incrementally generate an image. This process is repeated for each shot and the resulting images are merged to generate a final image. RTM has several variants depending on the complexity of the wave equation used (e.g., isotropic, VTI (Vertical Transverse Isotropy), TTI, (Tilted Transverse Isotropy). Analysis of RTM codes suggest that typically computation is not the performance bottleneck for these algorithms but rather the movement of data, particularly on and off disk, is the primary performance bottleneck. So it is essential that close attention be paid to the movement of data in order to develop fast RTM applications.
As FLOPS per processor increase, the prior art parallel implementations become disk I/O bound. It is possible to improve performance by partitioning single shot gathers over multiple nodes; however, such implementations typically use only a handful of nodes. Furthermore, in the prior art implementations, the inter-processor or node communications are severely limited by their bandwidth and thus results in large latency.
Industrial seismic imaging typically processes multi-petabyte shot data sets (collected by generating “shots” and then recording the resulting echos using an array of sensors, typically hydrophones or geophones to generate an image of the subsurface). The echoes are recorded for a pre-specified duration of time. These recordings are paired with the source noise from the air cannon to form a “shot”. For each shot, the locations of the air cannon and each of the sensors in the sensor array are recorded and stored with the recorded echo data and any related information about the source data. For a typical survey, over 100 thousand individual shots, or more, are commonly recorded. Shots are processed independently by the forward and backward steps of the RTM algorithm and the partial image generated for each shot is combined with all of the other partial images to form a single final image.
This approach requires the generation of large quantities of “scratch” data—i.e., data that is relevant only to the calculation of a single shot's partial image. For a 1000×1000×1000 element velocity model, the scratch data can be well over 10 terabytes of data. Multiply this number by the number of shots in a seismic survey and there is an enormous amount of data to be stored and read during the processing of the RTM algorithms for such a survey. It is this data movement that makes disk I/O the primary performance bottleneck of the RTM algorithm.