The present invention relates to failure recovery processing for data processing systems, and more specifically to failure recovery processing for supercomputers such as vector pipeline processors.
Supercomputers are capable of processing data at speeds one order of magnitude higher than those of general-purpose computers and have been widely used for scientific calculations in research and development projects. Supercomputers are usually implemented with two basic considerations in mind. The first is reducing the processing time for a given amount of input data, which is accomplished specifically by shortening the clock cycle. The clock cycle of supercomputers has been reduced year by year relative to that of general-purpose computers, and a recent supercomputer provides as many as sixteen 64-bit registers for executing floating-point calculations at 2.9-nanosecond clock intervals. This clock rate is one order of magnitude higher than that of general-purpose computers.
The second consideration is reducing the number of accesses to the main memory. Since a large volume of data is handled by a supercomputer during each process, frequent accesses to the main memory limit its operating performance. To this end, registers are provided in massive quantities to hold data, because a register can be accessed in significantly less time than the main memory. For efficient utilization, these registers are made to act as "software visible" registers, which can be programmed.
Since the supercomputer has a short machine cycle and a massive quantity of software visible data, it is impractical to retain the data stored in the software visible registers in order to effect instruction retry and processor relief actions in a manner similar to the fault recovery procedures of general-purpose computers. Therefore, a failure in any part of the supercomputer might result in a total system breakdown, causing all jobs in progress as well as succeeding jobs to be aborted.