1. Field of the Invention
This invention relates to an error recovery system using parity to detect errors and to recover from such errors.
2. Description of the Related Art
Many computer systems that use the instruction set set forth in International Business Machines Corporation's ESA/390 Principles of Operation employ pipeline processing of instructions. To carry out each instruction in the instruction set, the computing system processes a series of control words. In a computer system using pipelined instruction processing, each instruction is broken down into a series of FLOWs, where each FLOW contains a series of cycles and each cycle processes one control word. In some computing systems each FLOW is broken down into six cycles: a decode operation code cycle D, an address presentation cycle A, a translation cycle T, a buffer access cycle B, an execution cycle X and, finally, a write or store cycle W. To process instructions faster, the FLOWs overlap, so that different cycles of different FLOWs are processed at the same time. Without this overlap, processing would be sequential, with each FLOW completed before the next FLOW started.
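The overlap of FLOWs can be illustrated with a small scheduling sketch. It assumes, hypothetically, that a new FLOW enters the D cycle on every clock and that nothing stalls; real pipelines may issue FLOWs differently. The function names `pipeline_schedule` and `sequential_cycles` are illustrative only.

```python
# Cycle order within one FLOW, as listed in the text: D, A, T, B, X, W.
STAGES = ["D", "A", "T", "B", "X", "W"]


def pipeline_schedule(num_flows: int) -> list[dict[int, str]]:
    """For each clock, map FLOW number -> cycle that FLOW is in.

    Hypothetical assumption: FLOW n enters the D cycle n clocks after
    FLOW 0, and no stalls occur.
    """
    total_clocks = num_flows + len(STAGES) - 1
    schedule = []
    for clock in range(total_clocks):
        active = {}
        for flow in range(num_flows):
            stage_index = clock - flow  # how far this FLOW has progressed
            if 0 <= stage_index < len(STAGES):
                active[flow] = STAGES[stage_index]
        schedule.append(active)
    return schedule


def sequential_cycles(num_flows: int) -> int:
    """Clocks needed if each FLOW finished before the next one started."""
    return num_flows * len(STAGES)


if __name__ == "__main__":
    sched = pipeline_schedule(3)
    for clock, active in enumerate(sched):
        row = "  ".join(f"F{f}:{s}" for f, s in sorted(active.items()))
        print(f"clock {clock}: {row}")
    print("overlapped:", len(sched), "clocks")
    print("sequential:", sequential_cycles(3), "clocks")
```

Under these assumptions, three FLOWs finish in 8 clocks when overlapped instead of 18 when run one after another, and at clock 2 the first FLOW is in its T cycle while the second is in A and the third is in D.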
Each instruction has an associated number of FLOWs required to complete it. A D store is provided to hold the control word for each D cycle in each FLOW of each instruction in the instruction set. In some systems, because of time constraints, the first D cycle of the first FLOW of each instruction is implemented in logic, so its control word is not stored in the D store. An A store is provided to hold a control word that controls the processing of the A, T, B, X and W cycles in each FLOW of each instruction.
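The bookkeeping of the two stores can be sketched as follows. This is a hypothetical model, not a description of any actual hardware layout: entries are keyed by an illustrative `(opcode, flow_index)` pair, the control words are placeholder strings, and the opcodes `OPA` and `OPB` and their FLOW counts are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ControlStores:
    """Hypothetical model of the D and A stores.

    Keys are (opcode, flow_index); values are placeholder control words.
    """
    first_d_in_logic: bool = True  # first D cycle of FLOW 0 handled by logic
    d_store: dict[tuple[str, int], str] = field(default_factory=dict)
    a_store: dict[tuple[str, int], str] = field(default_factory=dict)

    def load(self, flows_per_instruction: dict[str, int]) -> None:
        for opcode, num_flows in flows_per_instruction.items():
            for flow in range(num_flows):
                # The D cycle of the first FLOW may be implemented in logic,
                # in which case no D-store control word is kept for it.
                if not (flow == 0 and self.first_d_in_logic):
                    self.d_store[(opcode, flow)] = f"D-word {opcode}/{flow}"
                # One A-store word per FLOW controls its A, T, B, X, W cycles.
                self.a_store[(opcode, flow)] = f"A-word {opcode}/{flow}"


if __name__ == "__main__":
    stores = ControlStores()
    stores.load({"OPA": 1, "OPB": 3})  # hypothetical FLOW counts
    print("D store entries:", len(stores.d_store))
    print("A store entries:", len(stores.a_store))
```

With the first D cycle in logic, an instruction of n FLOWs needs n - 1 D-store words and n A-store words, so the example loads 2 D-store words and 4 A-store words.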
A parity error in a control word read out of the D or A store must be addressed immediately. One approach to recovering from the parity error is for the system to cancel the instruction being processed, repair the erroneous control word in the store and then perform a command retry of the instruction. This recovery procedure generally succeeds approximately 80% of the time; in the remaining 20% of cases the instruction is not processed successfully on retry. Such a failure gives rise to a machine check which, depending on where the failure occurred, could cause the computer to halt operations. Where a 20% failure rate is not acceptable, error correction codes have been used to detect and correct the error. This approach is costly, both in the storage needed to hold the error correction data and in the time needed to detect errors and to correct them when they are detected.
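A minimal sketch of parity checking on a control word follows. It assumes odd parity and a hypothetical 32-bit word width; the text does not specify either. It also shows why parity alone cannot correct an error: it flags any odd number of flipped bits but does not say which bit flipped, and it misses an even number of flips.

```python
WORD_BITS = 32  # hypothetical control-word width


def parity_bit(word: int) -> int:
    """Odd parity: the bit that makes word + parity hold an odd count of 1s."""
    ones = bin(word & ((1 << WORD_BITS) - 1)).count("1")
    return 0 if ones % 2 == 1 else 1


def store_word(word: int) -> tuple[int, int]:
    """Return the word together with the parity bit written alongside it."""
    return word, parity_bit(word)


def parity_error(stored: tuple[int, int]) -> bool:
    """True if the word read back no longer agrees with its parity bit."""
    word, p = stored
    return parity_bit(word) != p


if __name__ == "__main__":
    entry = store_word(0x12345678)
    one_flip = (entry[0] ^ (1 << 7), entry[1])  # single-bit error
    two_flips = (entry[0] ^ 0b11, entry[1])     # double-bit error
    print(parity_error(entry), parity_error(one_flip), parity_error(two_flips))
```

Running it prints `False True False`: the single-bit error is detected, while the double-bit error passes the check unnoticed.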
One characteristic of the A and D stores is that the control words stored in them are infrequently modified. Back-up stores have been widely used alongside a main store so that, when an error occurs in the main store, the back-up store may be used in place of the main store or to refresh the data in it. These systems have generally been used to maintain the integrity of data that is being manipulated by the computing system. In these systems, whenever data is modified, the modified data is written to both the main and back-up stores so that the back-up store holds an accurate copy of the data. Much attention has been given to methods for maintaining an accurate back-up copy of the data within a computer system.
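The general back-up store scheme can be sketched as a mirrored store. The class `MirroredStore` is hypothetical, and the use of a per-entry odd-parity bit to detect the main-store error is an assumption made for illustration; the text does not say how such systems detect errors.

```python
def _odd_parity(word: int) -> int:
    """Odd-parity bit for a non-negative word (assumed detection scheme)."""
    return 0 if bin(word).count("1") % 2 == 1 else 1


class MirroredStore:
    """Hypothetical main store with a back-up copy kept in step on every write."""

    def __init__(self, size: int) -> None:
        blank = (0, _odd_parity(0))
        self.main = [blank] * size    # tuples are immutable, so sharing is safe
        self.backup = [blank] * size

    def write(self, addr: int, word: int) -> None:
        # Every modification goes to both stores so the back-up stays accurate.
        entry = (word, _odd_parity(word))
        self.main[addr] = entry
        self.backup[addr] = entry

    def read(self, addr: int) -> int:
        word, p = self.main[addr]
        if _odd_parity(word) != p:
            # Error detected in the main store: refresh it from the back-up.
            self.main[addr] = self.backup[addr]
            word, _ = self.main[addr]
        return word


if __name__ == "__main__":
    store = MirroredStore(4)
    store.write(2, 0xABCD)
    w, p = store.main[2]
    store.main[2] = (w ^ 1, p)  # simulate a single-bit error in the main store
    print(hex(store.read(2)))
```

The simulated error is caught on read and the main-store entry is refreshed from the back-up, so the example prints `0xabcd`. The cost of this scheme is the doubled write on every modification, which matters less for stores such as the A and D stores whose contents rarely change.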