Many data processing tasks take a considerable amount of time and data processing resources to complete. In many instances, such processing may take hours, days or even weeks. Often a cluster of data processing engines undertakes such a task, with each engine working on an atomic data processing task to produce corresponding results. Additionally, or alternatively, a massively parallel computer system can be used to perform such tasks, with each processor within the system being assigned a respective atomic data processing task.
It can be appreciated that it would be undesirable if a machine performing such a task, or part of such a task, failed. It is therefore known within the art to implement a technique known as check pointing, in which the data processing task is frozen or temporarily suspended while a complete back-up copy is created of the results, or partial results, produced thus far. If a fault occurs, the previously saved results or partial results, also known as a memory image or picture, are loaded and processing recommences from those results or partial results, rather than the whole job or data processing task having to be performed from scratch.
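By way of illustration only, the check pointing technique described above might be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular prior art implementation: the function names (save_checkpoint, load_checkpoint, run_task), the checkpoint_every parameter, the use of a single file as the back-up copy and the summing of a list as the data processing task are all hypothetical choices made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write a complete copy of the partial results to disk.

    Hypothetical helper: the copy is written to a temporary file and then
    renamed, so a failure during the write leaves the previous copy intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the previously saved partial results, or None if there are none."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_task(items, path, checkpoint_every=100):
    """Sum `items`, resuming from the last saved partial results if present."""
    state = load_checkpoint(path) or {"next_index": 0, "total": 0}
    for i in range(state["next_index"], len(items)):
        state["total"] += items[i]
        state["next_index"] = i + 1
        if state["next_index"] % checkpoint_every == 0:
            # Processing is suspended here until the back-up copy is complete.
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

In this sketch, if the process fails part-way through, calling run_task again with the same path recommences from the most recently saved partial results rather than from the first item. The call to save_checkpoint inside the loop is synchronous, so no further items are processed while the copy is being written.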
While the above described prior art technique is able to accommodate hardware or software failures that would, but for the technique, lead to a loss of data processing results and a need to restart the data processing task from scratch, it is undesirable to have to suspend the data processing operation while such a memory dump is performed. This is particularly so where the data processing task is distributed across a number of machines. In such a situation, some of the machines, namely those interacting with the machine currently undertaking check pointing, or, in the worst case, all of the machines, also have to suspend their operations. Check pointing undertaken by one machine might therefore entail suspending operations on other machines.
It is an object of embodiments of the present invention at least to mitigate some of the problems of the prior art.