Computing jobs can be processed simultaneously on several processors in a distributed processing system. Distributing the processing load of a computing job among several processors generally shortens the job's processing time compared with processing it on fewer processors. Typically, a computing job is divided into segments, and each segment is processed separately. The segments may be processed by different processors running on different machines.
There are various reasons for pausing computing jobs and resuming them at a later time. For example, some computing jobs require lengthy processing times, even in a distributed processing system, and pausing such a job frees computational resources for other jobs with higher priorities. Some computing jobs are paused because the machines running them need to be taken offline. Other computing jobs are paused so that they can be resumed on machines with faster processors.
For certain applications, it may not be possible to pause a computing job and then resume it at precisely the point at which it was paused. The extent to which a paused computing job can be resumed without losing work depends on the nature of each client application running on the distributed processing system. Some applications can pause and resume a distributed computing job without losing any work, whereas others require the whole computing job to be restarted. Still other applications lose only the work on partially processed segments, while retaining all work on segments that have been fully processed.
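To make the three resume behaviors concrete, the following Python sketch tallies how much completed work is discarded under each one. The policy names, the `work_lost` function, and the assumption that every segment carries one equal unit of work are hypothetical illustrations chosen for this sketch, not features of any particular system.

```python
from enum import Enum


class ResumePolicy(Enum):
    """Hypothetical labels for the three resume behaviors described above."""
    EXACT = "exact"      # resumes precisely where it paused; no work lost
    SEGMENT = "segment"  # keeps fully processed segments; partial ones restart
    RESTART = "restart"  # the whole job must be restarted


def work_lost(progress: list[float], policy: ResumePolicy) -> float:
    """Return the amount of completed work discarded when a paused job resumes.

    `progress` holds, for each segment, the fraction of that segment already
    processed at the moment of pausing (0.0 = not started, 1.0 = done).
    Assumption: each segment carries exactly one unit of work.
    """
    if any(not 0.0 <= p <= 1.0 for p in progress):
        raise ValueError("segment progress must lie in [0, 1]")
    if policy is ResumePolicy.EXACT:
        # Work resumes at the exact pause point, so nothing is discarded.
        return 0.0
    if policy is ResumePolicy.SEGMENT:
        # Only the progress on partially processed segments is discarded.
        return sum(p for p in progress if p < 1.0)
    # RESTART: all work done so far is discarded.
    return sum(progress)


if __name__ == "__main__":
    # Two finished segments, one half done, one not started.
    progress = [1.0, 1.0, 0.5, 0.0]
    for policy in ResumePolicy:
        print(policy.value, work_lost(progress, policy))
```

For this example job, the exact-resume behavior discards no work, the segment-level behavior discards only the half-finished segment's 0.5 units, and the full-restart behavior discards all 2.5 units completed before the pause.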
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.