Imaging systems, such as seismic imaging, magnetic resonance imaging (MRI), computed axial tomography (CAT), and X-ray tomography imaging systems and the like, typically involve the acquisition, analysis, and interpretation of massive amounts of data. Generally, one or more sensors (sometimes thousands) collect raw imaging data that represent certain characteristics of an object. The collected imaging data are provided as input to imaging algorithms that reduce the massive amounts of imaging data to a much smaller representation of a physical object, referred to as an output data set. The output data set is typically a 2- or 3-dimensional gridded data set wherein each grid point represents characteristics of the object at a specific location in 2- or 3-dimensional space, respectively. The grid points are generally positioned at predetermined intervals, such as one-meter intervals, one-centimeter intervals, or the like.
For example, seismic imaging systems, such as a Prestack Depth Migration system, generally collect data regarding energy waves generated from an energy source that are reflected by various geological structures. The data collected by the sensors vary as a function of time and the positions of the energy source and the sensor collecting the data. Imaging algorithms operate on the collected data and generate a 2- or 3-dimensional representation of the geological structure.
Due to the large amounts of data, many imaging systems utilize parallel processing techniques in an attempt to reduce the time required to process the collected imaging data and to create the output data set. Generally, parallel processing techniques utilize a plurality of processing elements (PEs) operating on the collected imaging data. Each PE calculates a portion of the output data set, i.e., each PE calculates specific grid points of the output data set. After the PEs have calculated their portions of the output data set, the outputs of all of the PEs are combined to create the complete output data set.
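The partitioning scheme described above can be sketched as follows. This is a minimal illustration, not drawn from any particular imaging system: the output grid is divided into contiguous row bands, each "PE" computes the grid points in its band, and the partial outputs are reassembled into one data set. All names (`partition_rows`, `pe_compute`, `run_job`) and the stand-in imaging kernel are hypothetical.

```python
def partition_rows(n_rows, n_pes):
    """Assign grid rows to PEs in contiguous bands, as evenly as possible."""
    base, extra = divmod(n_rows, n_pes)
    bounds, start = [], 0
    for pe in range(n_pes):
        stop = start + base + (1 if pe < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def pe_compute(rows, n_cols):
    """Stand-in for an imaging kernel: each grid point gets a placeholder
    value derived from its position (a real kernel would migrate or
    back-project the collected sensor data here)."""
    return {(r, c): r * n_cols + c for r in rows for c in range(n_cols)}

def run_job(n_rows, n_cols, n_pes):
    """Run every PE's portion and combine the partial outputs into the
    complete gridded output data set."""
    output = {}
    for start, stop in partition_rows(n_rows, n_pes):
        output.update(pe_compute(range(start, stop), n_cols))
    return output
```

Each PE's band is independent of the others, which is what makes the decomposition embarrassingly parallel — but also what leaves "holes" in the combined output if any one PE fails.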
The output data set produced by utilizing parallel processing techniques with a plurality of PEs, however, may not be reliable if one or more of the PEs fail while the output data are being calculated. Specifically, if a PE fails during the calculation of the output data, then “holes,” or missing data, will result when the output data set is reassembled. Recovery from the problem of missing data is generally accomplished by: (1) re-performing the entire analysis of the imaging data; (2) performing a subsequent task to recompute the missing portions of the output; or (3) re-configuring the job on the fly, continually looking for PEs that become available after others have failed. Method (1) is the worst-case scenario in terms of elapsed processing time, but if elapsed time is not critical, it is the least-effort method and is typically preferred. Method (2) requires configuring a subsequent computer job, which takes some human intervention, but reduces the elapsed time because the small uncomputed portion of the job can be spread over many PEs to minimize the subsequent run time. Method (3) is the most difficult to code and maintain because it requires a monitoring node to constantly evaluate the state of all of the worker nodes and to re-assign tasks and re-apportion job segments. Method (3), however, is the most robust method.
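The on-the-fly re-configuration of method (3) can be sketched as a monitoring loop that hands out job segments to live PEs and re-queues any segment whose PE fails mid-computation. This is a hypothetical illustration under simplified assumptions: failures are injected via a `fails_on` map rather than detected by heartbeats or timeouts, and all names are illustrative.

```python
def run_with_reassignment(segments, pes, fails_on):
    """Dispatch job segments to PEs, re-assigning on failure.

    fails_on: {pe: segment} -- the PE dies while computing that segment
    (simulated failure injection). Returns {segment: pe} once every
    segment has been completed by some surviving PE.
    """
    pending = list(segments)   # segments not yet successfully computed
    alive = list(pes)          # PEs the monitor believes are healthy
    done = {}
    i = 0
    while pending:
        if not alive:
            raise RuntimeError("all PEs failed; job cannot complete")
        pe = alive[i % len(alive)]       # next available PE
        seg = pending.pop(0)
        if fails_on.get(pe) == seg:
            alive.remove(pe)             # monitor detects the failure...
            pending.append(seg)          # ...and re-queues the segment
        else:
            done[seg] = pe               # segment completed successfully
            i += 1
    return done
```

For example, if `pe1` dies while computing segment `s1`, the monitor removes `pe1` from the pool and `s1` is eventually recomputed by a surviving PE, so the reassembled output has no holes. The cost, as noted above, is the extra monitoring and re-apportionment logic that must run for the life of the job.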
Therefore, there is a need to provide a method and an apparatus to efficiently perform fault-tolerant parallel processing.