1. Field of the Invention
This invention relates generally to error correction of data stored in a computer memory module and especially to error correction on memory modules for correcting soft errors, and more particularly to the use of digital processing elements on memory modules to control scrubbing of soft errors in memory between read/write cycles of the CPU from/to the memory module.
2. Background Information
The use of error correction code (ECC) is becoming more and more prevalent in computers as the size of memory and its sensitivity to errors increase. Error correction is accomplished by using an error correction code which generates check bits from the data written to memory and stores the check bits together with the data bits in memory. When the data bits and check bits are read from memory, a new set of check bits is generated from the stored data bits and a comparison is made between the newly generated check bits and the stored check bits. For a single error correct (SEC) ECC, any single bit error that is detected is corrected, and in most cases all double bit errors can be detected but not corrected by the ECC algorithm. (Indeed, some error correction code algorithms can correct errors of two or more bits, but single bit error correction is much more prevalent.) Moreover, add-on memory cards such as SIMMs or DIMMs are often structured so that they are capable of storing the data bits together with the check bits. In many instances where error correction is not native to the CPU, error correction capabilities are provided on-board the SIMM or DIMM, so that a computer which does not have native error correction nevertheless can have SIMM or DIMM add-on cards which perform error correction of the data stored thereon.
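By way of illustration only, the generate, store, regenerate and compare sequence described above may be sketched as follows. The sketch assumes an extended Hamming code over a 4-bit data value (three check bits plus one overall parity bit), chosen purely for brevity; the text does not specify a particular code or word width, and the names `check_bits`, `encode` and `read` are hypothetical.

```python
# Illustrative sketch only: extended Hamming (8,4) code, assumed for brevity.
# Word layout (list of 8 bits): index 0 = overall parity, indices 1, 2, 4 =
# check bits, indices 3, 5, 6, 7 = data bits.
DATA_POS = (3, 5, 6, 7)
CHECK_POS = (1, 2, 4)


def check_bits(word):
    """Generate the three check bits from the data bits held in `word`."""
    return {
        1: word[3] ^ word[5] ^ word[7],
        2: word[3] ^ word[6] ^ word[7],
        4: word[5] ^ word[6] ^ word[7],
    }


def encode(nibble):
    """Write path: store data bits together with their check bits."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POS):
        word[pos] = (nibble >> i) & 1
    for pos, bit in check_bits(word).items():
        word[pos] = bit
    word[0] = sum(word[1:]) % 2  # overall parity makes the total even
    return word


def read(word):
    """Read path: regenerate check bits, compare with the stored ones."""
    word = list(word)
    fresh = check_bits(word)
    # The positions of the mismatching check bits add up to the index of
    # the failing bit (0 if only the overall parity bit is wrong).
    syndrome = sum(pos for pos in CHECK_POS if fresh[pos] != word[pos])
    parity_ok = sum(word) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        word[syndrome] ^= 1  # single bit error: correct it
        status = "corrected"
    else:
        status = "uncorrectable"  # double bit error: detected only
    nibble = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return status, nibble, word
```

With this assumed code, flipping any one bit of an encoded word causes `read` to report "corrected" and return the original data, whereas flipping any two bits causes it to report "uncorrectable".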
In many cases the only time that error correction of the stored data bits takes place is when a read cycle is performed by the CPU. During the read cycle, the error correction code is utilized to correct single bit errors.
This technique, while generally effective, does have certain drawbacks. These drawbacks are encountered particularly when the memory modules develop hard errors which align with soft errors that have been induced in memory. A hard error is a permanent error that cannot be fixed. A soft error is a temporary error which is fixed as soon as new data is written into the affected storage location. Hard errors result from manufacturing defects which manifest themselves in some function of the DRAM storage devices only after many cycles of operation. Thus, memory which tests good after manufacturing may develop hard errors after installation and many cycles of operation. Such errors appear on every read cycle of a particular bit value, sometimes at many addresses. Such errors, as long as they are single bit errors, can be corrected on each read cycle to the affected address(es). However, if during storage a soft error should occur in some data bit or check bit and subsequently a hard error manifests itself which is aligned with the soft error, the result is a two bit error which in many cases cannot be corrected and thus causes an error signal. (Soft errors can occur due to several causes, one of which is stray radiation which can cause a bit to "flip".) Thus the combination of soft errors, which are random errors that can be corrected, aligned with hard errors will cause the computer to either crash or malfunction.
In order to overcome this problem it is possible to "scrub" the data stored in the DRAMs periodically, i.e., the soft errors can be corrected periodically. Thus, if subsequent hard errors occur, these will be only one bit errors which can be subsequently corrected on a read cycle, since the soft errors have been "scrubbed" or "fixed" and thus cannot align with the hard errors. However, while some error correction native to computers or CPUs has such scrubbing capability, this is not the case with all CPU native error correction.
According to the present invention, a memory module for attachment to a computer system having a memory bus, and a method of using the memory module for error correction by scrubbing soft errors on-board the module, are provided. The module includes a printed circuit card with memory storage chips on the card to store data bits and associated ECC check bits. Tabs are provided on the circuit card to connect the circuit card to the system memory bus. Logic circuitry is provided to selectively operatively connect and disconnect the memory chips and the memory bus. A signal processing element is connected in circuit relationship with the memory chips. The logic circuitry will selectively permit the signal processor to read the stored data bits and associated check bits from the memory chips, recalculate the check bits from the read stored data bits, compare the recalculated check bits with the stored check bits, correct all one bit errors in the stored data bits and stored associated check bits, and re-store the correct data bits and associated check bits in the memory chips when the memory chips and the memory bus are disconnected. This will allow single bit soft errors occurring during storage of the data bits and check bits to be corrected periodically before the data is read from the memory chips to the data bus on a read operation, and thus reduce the chance of a subsequently occurring hard error aligning with a soft error.
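The scrubbing operation summarized above may be modeled, for illustration only, by the following sketch. It is a software simulation under stated assumptions, not the circuitry of the module: the class `ScrubbingModule`, its `bus_connected` flag, the stuck-at-zero model of a hard error, and the 4-bit extended Hamming code used for the check bits are all hypothetical choices made to keep the example small.

```python
# Illustrative simulation only; names and the (8,4) code are assumptions.
def _parity(x):
    return bin(x).count("1") & 1


def encode(nibble):
    """Pack 4 data bits (positions 3, 5, 6, 7) with check bits 1, 2, 4 and
    an overall parity bit at position 0 into one 8-bit integer."""
    w = 0
    for i, pos in enumerate((3, 5, 6, 7)):
        w |= ((nibble >> i) & 1) << pos
    for p in (1, 2, 4):
        mask = sum(1 << q for q in range(1, 8) if q & p)
        w |= _parity(w & mask) << p
    return w | _parity(w)


def correct(w):
    """Return (word, status); one bit errors are corrected, two bit errors
    are only detected."""
    syndrome = 0
    for q in range(1, 8):
        if (w >> q) & 1:
            syndrome ^= q
    odd = _parity(w)
    if syndrome == 0 and not odd:
        return w, "ok"
    if odd:
        return w ^ (1 << syndrome), "corrected"
    return w, "uncorrectable"


class ScrubbingModule:
    def __init__(self, nibbles):
        self.cells = [encode(n) for n in nibbles]
        self.bus_connected = True
        self.stuck_at_zero = {}  # address -> mask of hard-failed bits

    def _raw_read(self, addr):
        # A hard error is modeled as a bit that always reads back as 0.
        return self.cells[addr] & ~self.stuck_at_zero.get(addr, 0)

    def cpu_read(self, addr):
        """Read cycle from the bus side; returns the ECC status."""
        return correct(self._raw_read(addr))[1]

    def scrub(self):
        """Correct and re-store one bit errors while off the bus."""
        if self.bus_connected:
            raise RuntimeError("scrub only while disconnected from the bus")
        fixed = 0
        for addr in range(len(self.cells)):
            word, status = correct(self._raw_read(addr))
            if status == "corrected":
                self.cells[addr] = word
                fixed += 1
        return fixed


def demo(scrub_first):
    m = ScrubbingModule([0b1011])
    m.cells[0] ^= 1 << 6          # soft error during storage
    if scrub_first:
        m.bus_connected = False   # between CPU read/write cycles
        m.scrub()
        m.bus_connected = True
    m.stuck_at_zero[0] = 1 << 3   # hard error appears later
    return m.cpu_read(0)
```

In this model, `demo(True)` returns "corrected" because the soft error was removed before the hard error appeared, leaving only a one bit error on the read cycle, while `demo(False)` returns "uncorrectable" because the two errors align in the same word.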