In a computer system, various approaches may be used to ensure that data stored in the memory is accurately retrieved from the memory. One approach which has been in wide commercial use for many years is the use of parity. In particular, as the data is stored in the memory, one or more parity bits are generated in a known fashion as a function of the data, and are stored with the data. Then, when the data is subsequently retrieved from the memory, the data is used to regenerate one or more parity bits according to the same function, such parity bit(s) then being compared to the parity bit(s) stored with the data. If the compared bits are identical, it is assumed that the data has been accurately retrieved. On the other hand, if the compared bits are not identical, an error has occurred.
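The store-then-check sequence described above might be sketched as follows. This is only an illustration under stated assumptions: a single even-parity bit per 8-bit value is assumed (the text says only "one or more parity bits ... generated in a known fashion"), and the names `parity_bit`, `store`, and `load`, as well as the dictionary standing in for the memory, are hypothetical.

```python
def parity_bit(data: int) -> int:
    """Even-parity bit for an 8-bit value: 1 when the count of set bits is odd."""
    return bin(data & 0xFF).count("1") & 1


def store(memory: dict, address: int, data: int) -> None:
    """Store the data together with a parity bit generated from it."""
    data &= 0xFF
    memory[address] = (data, parity_bit(data))


def load(memory: dict, address: int):
    """Retrieve the data, regenerate the parity bit, and compare it to the stored one.

    Returns (data, ok), where ok is False when the compared bits differ.
    """
    data, stored_parity = memory[address]
    return data, parity_bit(data) == stored_parity


# Hypothetical usage: a dict plays the role of the memory.
memory = {}
store(memory, 0x10, 0b10110010)
value, ok = load(memory, 0x10)
```

Note that a single parity bit of this kind detects any odd number of flipped bits but not an even number, which is why a match is only "assumed" to mean that the data was accurately retrieved.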
Memory errors fall into two categories: hard errors and transient errors. Hard errors are those which are permanently present, and can be easily detected. Most computers are designed to perform a brief check of random access memory when they are turned on, and they will typically detect hard errors during this test and subsequently avoid using the portions of memory which produced errors. Thus, hard errors do not usually present a serious problem.
Transient errors, on the other hand, are errors which are marginal, and may come and go. They may, for example, occur only when certain specific patterns of data are stored in the memory. Transient errors frequently are not picked up by the tests which the computer does when it is turned on, and thus the portions of memory having these transient problems may be utilized by the computer with no notice that errors may occur. Of course, the parity detection schemes discussed above are usually capable of detecting the transient error when it occurs.
In conventional systems, the output line from the parity detection circuit is frequently coupled to an interrupt input of the central processing unit, so that the processing unit is interrupted in response to the occurrence of an error and does not continue its processing using incorrect data. Traditionally, the interrupt promptly notified the processing unit of the error before the processing unit could lose track of the memory location it was accessing at the time the error occurred. However, advances in technology have significantly increased the speeds of processing units. High speed processing units like the Intel 80386 microprocessor frequently prefetch instructions and place them in a queue. By the time a memory error is detected and interrupts a processing unit operating in a prefetch and queuing mode, the processing unit may have forged ahead sufficiently during the interim that it is no longer clear what address was being accessed at the time the error occurred. Thus, a problem is that it is difficult to determine which portion of memory caused the error, particularly since transient errors frequently cannot be easily made to repeat themselves.
An object of the present invention is to provide an arrangement which facilitates accurate detection of the specific portion of memory which produced an error even in an environment where a processing unit is operating in a prefetch and queuing mode.
Objects and purposes of the invention are met by providing an apparatus which includes a memory having a plurality of selectively addressable locations, a processing unit which is operatively coupled to the memory and can selectively address and read storage locations therein, an error detecting arrangement for detecting errors in the data being read from the storage locations, and a recording arrangement responsive to detection of an error by the detecting arrangement for recording address information corresponding to the storage location from which the erroneous data was read.
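The interplay of the memory, the error detecting arrangement, and the recording arrangement can be illustrated with a small software model. This is a rough sketch rather than the disclosed circuit: the class `ParityMemory`, its `error_address` attribute (standing in for the recording arrangement), the `corrupt` helper, and the `prefetch` function are all hypothetical, as is the choice to record only the first error until the record is cleared.

```python
def parity_bit(data: int) -> int:
    """Even-parity bit for an 8-bit value (assumed parity scheme)."""
    return bin(data & 0xFF).count("1") & 1


class ParityMemory:
    """Memory model with parity checking and a recorded error address."""

    def __init__(self):
        self.cells = {}            # address -> [data, stored parity bit]
        self.error_address = None  # hypothetical record of the failing address

    def write(self, address: int, data: int) -> None:
        data &= 0xFF
        self.cells[address] = [data, parity_bit(data)]

    def corrupt(self, address: int, mask: int = 0x01) -> None:
        """Simulate a memory error by flipping data bits without updating parity."""
        self.cells[address][0] ^= mask

    def read(self, address: int) -> int:
        data, stored = self.cells[address]
        # Error detection: regenerate parity and compare with the stored bit.
        if parity_bit(data) != stored and self.error_address is None:
            # Recording: capture the address at the moment the error is detected.
            self.error_address = address
        return data


def prefetch(memory: ParityMemory, start: int, count: int):
    """Model a processing unit that reads ahead and queues what it fetched.

    Returns the last address accessed and the queue of fetched values.
    """
    queue = [memory.read(address) for address in range(start, start + count)]
    return start + count - 1, queue


# Hypothetical scenario: an error at address 3, with the processor reading on to 7.
mem = ParityMemory()
for address in range(8):
    mem.write(address, address * 17)
mem.corrupt(3)
last_accessed, _ = prefetch(mem, 0, 8)
```

In this scenario the processing unit's own notion of the current address (`last_accessed`) has moved past the failing location by the time the error is handled, while `mem.error_address` still identifies the location from which the erroneous data was read.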