Field of Disclosure
The present disclosure relates generally to a method of obtaining a fully parallelized solution of wave equations. More specifically, the present disclosure relates to a framework for achieving efficient utilization of multi-GPU computer architectures.
Description of Related Art
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Wave propagation plays a central role in many fields such as physics, environmental research, medical imaging, acoustics modeling, solid state physics, seismic imaging and cardiac modeling. Different methods have been proposed for obtaining stable and accurate solutions of the wave equation. However, computational cost remains a major problem for most applications.
The most commonly used methods for solving the wave equation can be divided into finite-element methods, spectral element methods, and explicit and implicit finite difference methods. The explicit finite difference method is especially suitable for graphics processing unit (GPU) acceleration because it divides naturally into independent operations. In such methods, the solution in a current time step depends only on solutions of the previous time step. Hence, all nodes can be computed in parallel. However, the numerical solution of the wave equation is a memory-demanding process, since the desired frequencies, model sizes, and wave velocities impose large grid sizes. Specifically, because current GPUs are equipped with a limited amount of global memory, most large scale applications require multiple GPUs to be deployed.
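The independence of the node updates can be illustrated with a minimal sketch, assuming a one-dimensional constant-velocity model and the common second-order leapfrog scheme. The function name `leapfrog_step` and the parameter `courant` (wave velocity times time step divided by grid spacing) are hypothetical and are not taken from the disclosure. This particular scheme reads the two most recent time levels; the property of interest is that no node of the new time level depends on any other node of the new time level.

```python
def leapfrog_step(u_prev, u_curr, courant):
    """Advance a 1D wave field by one time step; both end nodes are held at zero."""
    c2 = courant * courant
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        # Only already-computed time levels are read, so every iteration of
        # this loop is independent and could be assigned to its own GPU thread.
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


if __name__ == "__main__":
    field = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(leapfrog_step(field, field, courant=1.0))
```

On a GPU, the loop body would typically become the kernel, with one thread per node index.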
In applications such as those in the field of acoustics, where the model size rarely exceeds 100 meters and there is a desire to incorporate a large number of frequencies, a grid size of typically 22e6 nodes is required. Further, in seismic imaging applications, where the model dimensions are often on the order of a few hundred kilometers in lateral and vertical extent, minimal wave velocities of 300 m/s and frequencies of 10 Hz impose a grid size requirement of approximately 16e9 nodes.
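The way velocity, frequency, and model size jointly determine the grid size can be sketched as follows. This is an illustration under stated assumptions only: the sampling rule of ten grid points per shortest wavelength, the two-dimensional 380 km by 380 km model, and the function name `grid_nodes` are hypothetical and are not specified in the disclosure.

```python
import math


def grid_nodes(extents_m, v_min, f_max, points_per_wavelength=10):
    """Return (grid spacing in meters, total node count) for a regular grid.

    The shortest wavelength is v_min / f_max; the spacing is chosen so that
    this wavelength is sampled by `points_per_wavelength` nodes (an assumed rule).
    """
    spacing = v_min / (f_max * points_per_wavelength)
    nodes = 1
    for extent in extents_m:
        nodes *= math.ceil(extent / spacing)
    return spacing, nodes


if __name__ == "__main__":
    # Assumed seismic example: 300 m/s minimum velocity, 10 Hz, 380 km x 380 km.
    spacing, nodes = grid_nodes([380e3, 380e3], v_min=300.0, f_max=10.0)
    print(f"spacing = {spacing} m, nodes = {nodes:.2e}")
```

Under these assumptions the spacing is 3 m and the node count is about 1.6e10, which is the same order of magnitude as the seismic figure quoted above. Other combinations of dimensionality and sampling density would give different counts.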
However, current GPUs have a maximum global memory of six gigabytes, and can therefore store around 1.6e9 single precision floating point numbers. Furthermore, the global memory of the GPU must store more than just the resulting array of frequencies. A typical solution to this problem is to distribute the workload and data across different GPUs. Specifically, each GPU is assigned to one specific sub-domain. This approach, however, tends to be inefficient, as most GPUs remain idle during a large part of the computation. Accordingly, there is a need for a framework that achieves efficient utilization of multi-GPU architectures.
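The memory arithmetic behind these figures can be checked with a short sketch. It assumes that six gigabytes means 6 × 2^30 bytes and that a single precision number occupies four bytes. The number of arrays stored per node (three here, for example two time levels and a velocity model) is a hypothetical value chosen for illustration, as are the function names.

```python
BYTES_PER_FLOAT32 = 4
GPU_MEMORY_BYTES = 6 * 2**30  # assumed meaning of "six gigabytes"


def floats_in_memory(mem_bytes):
    """Number of single precision values that fit in mem_bytes."""
    return mem_bytes // BYTES_PER_FLOAT32


def gpus_needed(nodes, arrays_per_node, mem_bytes=GPU_MEMORY_BYTES):
    """Minimum GPU count whose combined global memory holds all arrays."""
    total_bytes = nodes * arrays_per_node * BYTES_PER_FLOAT32
    return -(-total_bytes // mem_bytes)  # integer ceiling division


if __name__ == "__main__":
    print(floats_in_memory(GPU_MEMORY_BYTES))                  # about 1.6e9
    print(gpus_needed(16_000_000_000, arrays_per_node=3))      # assumed 3 arrays
```

With these assumptions, a grid of 16e9 nodes needs on the order of thirty such GPUs for storage alone, before any halo regions or work buffers are counted.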