Seismic imaging is the process of converting acoustic measurements of the Earth into images of the Earth's interior, much like ultrasound for medical imaging. It is widely used in oil and gas exploration and production to identify regions that are likely to contain hydrocarbon reservoirs and to help characterize known reservoirs to maximize production. These methods have become critical to the energy industry as known reserves are used up and new reserves become increasingly difficult (and expensive) to find and are increasingly in technically challenging areas, like the deep sea. For the past several decades, the energy industry has tried to balance the need to image quickly and the need to image accurately. The need for accuracy is driven by the high cost of drilling a “dry” well due to poor imaging (a deep sea well can cost over $100 million) and the need for quick imaging is driven by the cost of not finding new reserves (i.e., bankruptcy). To minimize these costs, the industry relies on supercomputing clusters and regularly increases compute power, enabling both faster imaging on existing algorithms and the practical implementation of more accurate imaging. Thus, the development of fast, efficient methods for imaging is of high importance to the industry.
Seismic imaging data varies widely depending on how and where the data is collected (e.g., on land, at sea, at the ocean surface, at the ocean floor, below ground, electromagnetically, etc). One data collection method in particular implements a towed hydrophone receiver arrays for ocean seismic data collection. The basic idea is shown in FIGS. 1 and 2 depicting a ship 200 shown towing a 2D array 212 of hydrophones spaced about every 25 m on 1 to 16 trailed streamers. Every 15 or so seconds, an air cannon is fired into the water creating an acoustic wave, e.g., a shot 209 that propagates through the water and into the Earth. As shown in FIG. 2, reflections 202 in a forward direction from various surface and subsurface boundaries cause echoes that reflect back and are recorded by each hydrophone in the array 212. The recording of a single hydrophone in time as a trace and the collection of traces for a single firing of the air cannon is called a common shot gather, or shot 211. As the ship 200 moves, a large set of spatially overlapping shots is recorded. Depending on the size of the survey region to be imaged, this data collection can take a month or more and is designed to get the maximal coverage of the area to be imaged. The receiver data R(x, y, z, t) collected for potentially hundreds of thousands of shots is the result of some source data S(x, y, z, t) from shots 209 at particular locations.
Two critical requirements drive production seismic imaging: The need for improved imaging to accurately locate and characterize elusive oil and gas reservoirs; and the need for timely subsurface imaging. Drilling too soon risks expensive dry wells while drilling too late risks delaying the time to oil. To minimize these risks, the industry regularly increases the power of its supercomputing clusters, enabling both faster imaging on existing algorithms and the practical implementation of more accurate imaging. However, raw supercomputing power is not enough. It is equally important—if not more so—to implement algorithms efficiently based on a detailed knowledge of the hardware.
The Reverse Time Migration (RTM) algorithm (see, e.g., Gray, S. H., Etgen, J., Dellinger, J., Whitmore, D., Seismic migration problems and solutions, Geophysics 66, 1622, 2001) is widely used in the industry because of its superior imaging accuracy for difficult subsurface structures like salt domes which are poorly imaged by other algorithms but which are very effective at trapping oil and gas. Several variants of RTM exist with differing degrees of approximation to reality, all of which use single-precision arithmetic.
In seismic imaging techniques has processed shot gathers individually on a single compute node so that they could be processing in parallel. This approach has many benefits; but for algorithms like RTM, as computational power increases, data input and output (I/O) becomes the single largest performance bottleneck, particularly when the model is large.
The RTM algorithm arises from the observation that pressure waves should be correlated at reflection boundaries; so RTM proceeds by correlating two pressure waves (called the forward and backward waves) to find those boundaries. To generate the waves for correlation, RTM simulates wave propagation using the wave equation for a wave that includes: generating a forward wave calculation Ps(x,y,z,t) from equation 1) given a source data term S(x, y, z, t) representing the source “shot” or sound wavefield 209 as shown in FIG. 2,
                                          [                                                            ∂                  x                  2                                ⁢                                  +                                                            ∂                      y                      2                                        ⁢                                          +                                                                        ∂                          z                          2                                                ⁢                                                  -                                                      1                                                                                          v                                2                                                            ⁡                                                              (                                                                  x                                  ,                                  y                                  ,                                  z                                                                )                                                                                                                                                                                                                    ⁢                              ∂                t                2                                      ]                    ⁢                                    P              S                        ⁡                          (                              x                ,                y                ,                z                ,                t                            )                                      =                  S          ⁡                      (                          x              ,              y              ,              z              ,              t                        )                                              1        )            and generating a backward (or reverse) wavefield calculation PR(x,y,z,t) simulating backward wave propagation 204 using the wave equation from equation 2) given the receiver data R(x, y, z, t) collected by the receiver arrays 212 according to:
                                          [                                                            ∂                  x                  2                                ⁢                                  +                                                            ∂                      y                      2                                        ⁢                                          +                                                                        ∂                          z                          2                                                ⁢                                                  -                                                      1                                                                                          v                                2                                                            ⁡                                                              (                                                                  x                                  ,                                  y                                  ,                                  z                                                                )                                                                                                                                                                                                                    ⁢                              ∂                t                2                                      ]                    ⁢                                    P              R                        ⁡                          (                              x                ,                y                ,                z                ,                t                            )                                      =                  R          ⁡                      (                          x              ,              y              ,              z              ,              t                        )                                              2        )            
The forward wavefield computation corresponds to the waves 202 generated from the air cannon firing and propagating forward in time using a “velocity model” represented by V(x,y,z), which specifies the wave velocity at each point in space and represents the various material properties and boundaries of the volume being imaged. The air cannon firing is treated as a wavelet impulse localized in time and space. As shown in FIG. 2, the backward wavefield computation is generated by using the shot data recorded (received) by the hydrophone array as the source term for the wave equation and propagating that as backward propagating waves 204 backward in time. These two waves are then multiplied point-wise at each time step to generate an image, using the following “imaging condition” as shown in equation 3) as follows:I(x,y,z)=ΣtPS(x,y,z,t)PR(x,y,z,t)  3)where PS(x,y,z,t) is the reflected forward power pressure wave at a coordinate and PR(x, y, z, t) is the reverse power pressure wave at an x, y, z coordinate at a time t.
This process is repeated for all shots in the seismic survey and the images generated are summed to create a final image of the reflecting boundaries, which represent the subsurface structure. It is important to note that the time summation in the imaging condition implies that the first time step of the forward wave needs to be correlated with the last time step of the backward wave. This constraint is typically handled in one of two ways: either the forward wave is saved to disk (called a “snapshot”) every several time steps and read in for imaging when the backward wave is computed, or the forward propagation is run twice—once forward in time and once in reverse time using boundary data saved from the forward pass to recreate the forward pass in reverse—and then imaging proceeds with the backward wave and the reverse forward wave.
As shown in FIG. 3, a velocity model V( ) 225 is depicted with each sub-block 28 representing a sub-set of the model data (e.g., sub-domain) corresponding to a single shot that is stored in a corresponding disk. For calculating an image, the entire model is not needed, e.g., a sub-set of the model is used for a selected shot. A master processing node 30 in communication with a memory storage device, e.g., database 235, is programmed to allocate a RTM compute cluster 240 including a plurality of slave nodes 245, each node 245 having a corresponding memory storage device 250. Each slave node 245 is in communication with the master node 230 and is configured to obtain and process the single shot data 228a. When a (forward or reverse) pressure wave and resulting image is calculated at a respective node, it is stored to the main memory storage disk 250. Memory storage devices 250 that store intermediate results data in calculating the forward or reverse pressure wave which values are used to calculate resulting image are referred to as “Scratch” disks. RTM compute clusters 40 have significant per-node scratch disk requirements for saving snapshot data, which for a 10243 model and 10,000 time steps would require 40 TB of snapshot storage—per shot. In practice, snapshot sub-sampling is used to reduce both disk requirements and disk I/O bottlenecks; however sub-sampling results in image degradation and must be balanced with performance. Compression can be used to trade computation for disk I/O, however, lossless compression may not obtain required compression ratio and lossy compression will also degrade image quality.
The velocity model 225 is the representation of the velocity of the wave traveling at an x, y, z coordinate in the sub-surface of the earth, and, for purposes of description, is referred to as V2. As velocity model 225 is not known in the RTM process, an imaging condition enables the snapshot of the wave at a particular point in time after calculating forward wave and backward waves. For every value of t there is produced an image from an average of many shots. When averaging many shots, the image at a coordinate at a time t, i.e. I(x, y, z) is obtained via the imaging condition according to equation 3) above.
For the forward motion calculation, the velocity model v(x,y,z) is loaded, the pressure wave is loaded at time t and the previous pressure wave at time t−1, i.e., P(x,y,z,t−1), and the next pressure wave P(x,y,z,t+1) is computed (bootstrap) and is stored. This process is performed at each iteration (time step t). The wave field is a 3-D object and is very large (e.g., on the order of 10243 elements). Thus, for example, at four bytes of data for each calculation, this may amount to 4 GBytes of data for 1 time step of the algorithm for a single shot) and there may be thousands of time steps amounting to over 24 TB of data for a single shot which, when sent out to disk, is time consuming. Thus in practice, in an effort to reduce the data, the data is not stored or calculated at every time step, i.e., the calculated PS(x,y,z,t) is either stored or loaded to disk every Nth iteration.
Due to the advances in the compute systems, industries have started to adopt more and more computational intensive seismic imaging algorithms, such as Reverse-Time Migration (RTM) and Full Waveform Inversion (FWI), for oil exploring. These algorithms require thousands or even tens of thousands of iterations on large 3D models, typically 1000×1000×1000, to propagate the waveform in the forward time stepping loop. During the forward loop, each iteration or every a few iterations, the current intermediate wavefield (also called snapshot) needs to be saved. Then, in the backward time stepping loop, the corresponding saved wavefield will be read back to correlate with the current wavefield for imaging condition. Because of the large amount of the data and number of the snapshot wavefields to be saved, they typically are saved to disk. However, thousands of times of disk writing and reading of large data set time consume and the I/O of these snapshots becomes a performance bottleneck and makes the seismic imaging computation very slow.
Some prior art solutions additionally save only the boundaries of the wavefields. However, this solution requires recalculation of the entire wavefield when it is needed in imaging condition. It costs additional computation time roughly as the forward loop.
Another solution is to compress the data before saving it to the disk. Such solution requires compression and decompression time. Also, the lossless compression only has small compression ratio and lossy compression can have high compression ratio but causes distortion.
Therefore, there is needed improved methods for reducing disk I/O time, i.e., time required to write to disk intermediate wavefield compression data in both forward and reverse time paths, and improve the performance over the prior art solutions.