In large data processing machines, data is moved in and out of a set of memory arrays so that the machine can perform operations on the data. When the data in these memory arrays contains an error (which can occur for a variety of reasons, for instance, intermittent failures in memory devices), that error must be detected and, if possible, corrected. Therefore, data processing machines often have error checking and correcting logic associated with the memory arrays in the machine.
Often the memory arrays in data processing machines are arranged into two major systems: a large-capacity main storage array, or main store, and a smaller fast-access storage array called the buffer. The data processing machine operates by accessing data from the buffer, which in turn retrieves currently active data from the main store. By using a fast-access buffer storage array, the machine reduces the apparent main store access time for retrieving data.
Because the buffer is usually much smaller than the main store, the machine moves data out of the main store and into the buffer when the requested data is not resident in the buffer. Likewise, when the buffer is full, it transfers inactive data out and, if that data has been modified, moves it into the main store in order to make room for currently active data being moved into the buffer.
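The move-in and move-out behavior described above can be sketched as follows. This is a minimal illustration only, not the machine's actual design: the class name `Buffer`, its methods, and the choice of least-recently-used replacement to decide which data counts as "inactive" are all assumptions made for the example.

```python
from collections import OrderedDict


class Buffer:
    """Hypothetical model of a small fast-access buffer in front of a main store."""

    def __init__(self, main_store, capacity):
        self.main_store = main_store   # dict: address -> value (the large array)
        self.capacity = capacity       # number of entries the buffer can hold
        self.lines = OrderedDict()     # address -> (value, modified flag)

    def _move_out(self):
        # Assumption: the least recently used entry is treated as inactive.
        address, (value, modified) = self.lines.popitem(last=False)
        if modified:
            # Only modified data needs to be written back to the main store.
            self.main_store[address] = value

    def _move_in(self, address):
        if len(self.lines) >= self.capacity:
            self._move_out()           # make room for the currently active data
        self.lines[address] = (self.main_store[address], False)

    def read(self, address):
        if address not in self.lines:  # requested data is not buffer-resident
            self._move_in(address)
        self.lines.move_to_end(address)
        return self.lines[address][0]

    def write(self, address, value):
        if address not in self.lines:
            self._move_in(address)
        self.lines[address] = (value, True)
        self.lines.move_to_end(address)
```

Note that in this sketch a move-out is triggered by whichever access needs the room, so the data being moved out need not belong to the program that caused the move-out.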
Errors can occur while data is stored in the buffer, while it is being transferred to and from the main store, and while it is stored in the main store. Thus, data processing machines of this type have what is known as error checking and correcting logic associated with the data that is moved in and out of the memory arrays. Typically, the error checking and correcting logic is implemented by creating an error checking and correcting code, or ECC code, when data is moved into a memory array and then storing that code in the memory array. When the data is moved out of the array, the ECC code is recomputed and compared with the code that has been stored. If the codes match, then the data is correct and the move-out proceeds. If the codes do not match, an error analysis is performed. If the error is correctable, then it is corrected. If the error is uncorrectable, then the active process is normally interrupted by a machine check interrupt. During the machine check processing that occurs after the interrupt, it is desirable to be able to identify the damaged process. If it can be determined that the damage was confined to the process active at the time of the machine check, then only the damaged process needs to be terminated. Otherwise, a more catastrophic damage condition must be claimed.
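The generate, recompute, and compare sequence described above can be illustrated with a small single-error-correcting, double-error-detecting code. The sketch below is only an assumption for illustration: it uses an extended Hamming code over a 4-bit data value, and the names `make_ecc`, `check_on_move_out`, and `MachineCheck` are hypothetical. The text does not specify which code or word size a particular machine uses.

```python
class MachineCheck(Exception):
    """Stand-in for the machine check interrupt raised on an uncorrectable error."""


# Hamming positions (1-based) of the four data bits; positions 1, 2 and 4
# are occupied by the three Hamming check bits.
DATA_POSITIONS = (3, 5, 6, 7)


def _parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def make_ecc(data):
    """Compute a 4-bit ECC code for a 4-bit data value (done on move-in)."""
    check = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            check ^= pos               # XOR of positions yields the 3 check bits
    overall = _parity(data) ^ _parity(check)
    return check | (overall << 3)      # bit 3 is an overall parity bit


def check_on_move_out(data, stored_code):
    """Recompute the code, compare with the stored one, correct if possible."""
    recomputed = make_ecc(data)
    if recomputed == stored_code:
        return data                    # codes match: data is correct
    # Codes differ: perform the error analysis.
    syndrome = (recomputed ^ stored_code) & 0b111
    odd_error = _parity(data) ^ _parity(stored_code)
    if not odd_error:
        # An even number of bits changed: detectable but not correctable.
        raise MachineCheck("uncorrectable error")
    if syndrome in DATA_POSITIONS:
        # Single-bit error in the data: flip the bit the syndrome points at.
        return data ^ (1 << DATA_POSITIONS.index(syndrome))
    return data                        # single-bit error in the stored code only
```

In this sketch a single flipped bit, whether in the data or in the stored code, is repaired silently, while two flipped bits raise `MachineCheck`, mirroring the correctable and uncorrectable cases in the text.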
In a multiprogramming environment, data moved into the buffer by the currently active process may displace buffer-resident data that belongs to another process. If the error checking and correcting logic detects an uncorrectable error on the move-out of the old data from the buffer and generates a machine check at that time, there has been no easy way to identify the program with which the old data was associated. Therefore, the machine check must claim system damage, leading to a more catastrophic interruption than would be necessary if the damage were confined to the harmed process.
Thus, there is a need for an apparatus that defers the machine check that would normally occur when an uncorrectable error is detected upon the move-out of data from the buffer, until that data is accessed from the main store by a program with which it is associated, so that a lesser error condition can be generated by the machine.