In prior art computer systems, data stored in memory is encoded with error correction codes and is checked by error detection and correction circuits, which detect errors in the stored information and correct them where possible. Some errors are correctable; others are not. Detection of either kind of error indicates that the memory contains defective memory elements and must be repaired. A non-correctable error cannot be corrected and causes the system to stop using the defective memory. A correctable error is corrected, and the defective memory may remain in use, but an indication of the defect is given to the system processor. The defective memory must then be repaired or replaced as soon as possible. In a large memory spread across several printed circuit boards, however, these indications do not reveal where in the memory the defect is located. To locate the defective memory board, the system must be shut down and tested to identify the circuit board carrying the defective integrated circuit memory chip. The board with the defective chip is then replaced. This procedure keeps the processor out of service for a period that is often unacceptable but must nonetheless be tolerated.
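To make the distinction between correctable and non-correctable errors concrete, the following is a minimal sketch assuming a single-error-correcting, double-error-detecting (SECDED) extended Hamming code over 4 data bits. The text above does not say which error correction code is used; the code choice and the function names encode and check are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# Assumed code: extended Hamming (8,4) SECDED. Not taken from the source text.
DATA_POSITIONS = (3, 5, 6, 7)  # non-power-of-two positions hold data bits


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (hypothetical helper)."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected exactly 4 bits")
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    # Hamming parity bit p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    # Position 0 holds overall parity, which separates 1-bit from 2-bit errors.
    code[0] = sum(code[1:]) % 2
    return code


def check(code: List[int]) -> Tuple[str, List[int]]:
    """Classify a stored word as 'ok', 'correctable' or 'uncorrectable'.

    Returns the status and the (possibly corrected) data bits. Three or more
    flipped bits are beyond what SECDED guarantees and may be misclassified.
    """
    word = list(code)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i  # XOR of the indices of all set bits
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as one error at index `syndrome`
        # (index 0 means the overall parity bit itself was flipped).
        word[syndrome] ^= 1
        status = "correctable"
    else:
        # Even number of flips with nonzero syndrome: detected, not correctable.
        status = "uncorrectable"
    return status, [word[i] for i in DATA_POSITIONS]


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    one_bad = stored.copy()
    one_bad[5] ^= 1
    two_bad = stored.copy()
    two_bad[5] ^= 1
    two_bad[6] ^= 1
    print(check(stored))
    print(check(one_bad))
    print(check(two_bad)[0])
```

In this sketch a single flipped bit is located by the syndrome and repaired, so the memory can stay in use, while two flipped bits are only detected. Note that, as in the prior art described above, nothing in the check result says which board or chip the bad word came from. Practical memories commonly protect wider words (for example 64 data bits with 8 check bits), but the classification logic follows the same pattern.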
One prior art approach to keeping track of correctable errors in a large memory has been to maintain a counter in the memory itself. The counter is incremented each time a microprocessor detects a correctable error. However, if another circuit element accesses the memory and a correctable error is detected, the counter is not incremented. Moreover, although this technique reports memory errors, it gives no information about where in the memory the errors originate, so the error count is of limited value. The computer system must still be taken off line and extensively tested to locate the errors in the memory so that the appropriate memory boards can be repaired or replaced. Finally, if the defective location is the one where the error counter itself is maintained, the problems are compounded, since the counter may then be corrupted.
Accordingly, there is a need in the art for apparatus that minimizes system downtime when a memory is defective. There is also a need for apparatus that detects which printed circuit board in a multi-board memory is defective and identifies that board.