Modern dynamic random access memory (DRAM) is used in most personal computer systems and servers today due to its low cost, high density, and fast random access times. DRAMs are based on small memory cells that store charge in capacitors to indicate the state of the memory cell. Capacitive storage is dynamic, and the capacitors lose their charge over time. Thus, the memory cells have to be periodically refreshed. In addition, a read operation is destructive because it drains the charge on the capacitor. Before an access to a memory location on a particular row, the row is “activated” by storing the contents of the row in a large page buffer that may be, for example, eight kilobytes (8 kB) in size. Before reading or writing a memory location in another row, the memory row currently in the page buffer must be “precharged” by rewriting the page buffer contents back into the memory cells along the row, which charges the capacitors back to their original states.
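The activate/precharge sequence can be illustrated with a toy model of a single bank. This is a minimal sketch, not a description of any real device: the class `DramBank` and its methods are hypothetical, and the assumption that activation leaves the row's cells drained (zeroed) until precharge is a simplification of the destructive read described above. Refresh is not modeled.

```python
from typing import List, Optional


class DramBank:
    """Toy model of one DRAM bank with a single open-row page buffer.

    Simplifying assumption: activation is a destructive read that moves the
    row into the page buffer and leaves the cells drained (zeroed) until
    precharge writes the buffer contents back.
    """

    def __init__(self, num_rows: int, row_bytes: int) -> None:
        self.cells: List[bytearray] = [bytearray(row_bytes) for _ in range(num_rows)]
        self.page_buffer: Optional[bytearray] = None
        self.open_row: Optional[int] = None

    def activate(self, row: int) -> None:
        if self.open_row is not None:
            raise RuntimeError(f"row {self.open_row} is open; precharge it first")
        self.page_buffer = bytearray(self.cells[row])       # sense the row
        self.cells[row] = bytearray(len(self.page_buffer))  # charge drained by the read

    def _finish_activate(self, row: int) -> None:
        self.open_row = row

    def precharge(self) -> None:
        if self.open_row is None:
            return                                               # no row is open
        self.cells[self.open_row] = bytearray(self.page_buffer)  # restore capacitor charge
        self.page_buffer = None
        self.open_row = None

    def _open(self, row: int) -> bytearray:
        if self.open_row != row:
            self.precharge()        # close the previously open row, if any
            self.activate(row)
            self._finish_activate(row)
        return self.page_buffer

    def read(self, row: int, col: int) -> int:
        return self._open(row)[col]

    def write(self, row: int, col: int, value: int) -> None:
        self._open(row)[col] = value


bank = DramBank(num_rows=4, row_bytes=8)
bank.write(0, 3, 0xAB)   # activates row 0 and writes into the page buffer
print(bank.cells[0][3])  # 0: the cells are drained while row 0 is open
print(bank.read(1, 0))   # 0: precharges row 0, then activates row 1
print(bank.cells[0][3])  # 171 (0xAB): restored by the precharge
```

Accessing a different row always forces a precharge of the open row first, which is why row changes cost more than repeated accesses to the same open row.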
Because of their small size, DRAM memory cells are susceptible to soft errors. A soft error is a data error caused by the occurrence of a random electrical event, such as an alpha particle passing through the capacitor, electromagnetic interference, etc. Thus, a soft error does not reflect any fundamental error or defect in the circuitry. In order to correct soft errors, memory manufacturers have adopted so-called error correcting codes (ECCs), usually by including one extra DRAM chip for each set of eight DRAM chips. ECCs are extra bits stored with the data that can allow, for example, the correction of a single-bit error out of a group of bits, and the detection, but not correction, of a multiple-bit error. ECC allows correction of a single-bit error because the ECC bits contain enough information to identify the location of the failing bit, so that its logic state can be inverted before the bit is rewritten to the memory array during a subsequent precharge operation.
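How ECC bits identify the location of a failing bit can be shown with a scaled-down single-error-correcting, double-error-detecting (SEC-DED) code. The sketch below assumes an extended Hamming (8,4) code chosen purely for illustration: 4 data bits, 3 Hamming check bits, and 1 overall parity bit. The same construction with 64 data bits needs 7 Hamming check bits plus 1 overall parity bit, i.e. 72 bits in total, which matches the one-extra-chip-per-eight ratio mentioned above. The function names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry data bits
CHECK_POSITIONS = (1, 2, 4)     # Hamming check bits; position 0 is overall parity


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    # Choose check bits so that the XOR of the positions of all set bits is zero.
    s = 0
    for pos in DATA_POSITIONS:
        if bits[pos]:
            s ^= pos
    for pos in CHECK_POSITIONS:
        bits[pos] = 1 if s & pos else 0
    bits[0] = sum(bits[1:]) % 2   # overall parity enables double-error detection
    return bits


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos           # nonzero syndrome = position of a single flipped bit
    parity_error = sum(bits) % 2
    if syndrome == 0 and parity_error == 0:
        status = "ok"
    elif parity_error:
        bits[syndrome] ^= 1           # single-bit error: invert the failing bit
        status = "corrected"
    else:
        return None, "uncorrectable"  # two bits flipped: detected but not correctable
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status


word = encode(0b1011)
word[6] ^= 1                    # simulate a soft error in one bit
print(decode(word))             # (11, 'corrected')
word[2] ^= 1                    # a second flipped bit in the same word
print(decode(word))             # (None, 'uncorrectable')
```

The syndrome directly names the failing bit position, which is what allows the controller to invert that bit before writing the data back. Beyond two flipped bits, this particular code gives no guarantee of detection.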
Detection of soft errors using ECC bits is difficult in real time during read or write accesses. Thus, memory controllers sometimes use “scrubbers” to perform background inspection of memory cells for soft errors. A scrubber periodically reads a line of memory and checks it for ECC errors. If the scrubber finds a correctable error, it corrects the error, thereby decreasing the probability that the error will be encountered during an actual read or write access. The scrubber checks all memory locations in the entire physical memory space for such errors on a periodic basis, such as once per day.
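A scrubber pass and its timing budget can be sketched as follows. To keep the example short and self-contained, it swaps in a stand-in check in place of real DRAM ECC: each line is assumed to be stored as three copies of a data word and checked by majority vote. The names `scrub_pass`, `majority_decode`, and `line_interval_seconds`, and the 16 GiB memory with 64-byte lines, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Line = Tuple[int, int, int]   # stand-in: three stored copies of one data word


def majority_decode(line: Line) -> Tuple[Optional[int], str]:
    """Stand-in for an ECC check: majority vote over three copies (not DRAM ECC)."""
    value, votes = Counter(line).most_common(1)[0]
    if votes == 3:
        return value, "ok"
    if votes == 2:
        return value, "corrected"        # one copy disagrees: repairable
    return None, "uncorrectable"         # all copies differ: detect only


def scrub_pass(memory: List[Line]) -> Dict[str, object]:
    """Inspect every line once, writing back any correctable error."""
    ok, corrected, uncorrectable = 0, 0, []
    for addr, line in enumerate(memory):
        value, status = majority_decode(line)
        if status == "corrected":
            memory[addr] = (value, value, value)   # rewrite the corrected line
            corrected += 1
        elif status == "uncorrectable":
            uncorrectable.append(addr)             # report; data cannot be recovered
        else:
            ok += 1
    return {"ok": ok, "corrected": corrected, "uncorrectable": uncorrectable}


def line_interval_seconds(mem_bytes: int, line_bytes: int, pass_seconds: float) -> float:
    """Time budget per line so that one full pass takes pass_seconds."""
    return pass_seconds / (mem_bytes // line_bytes)


memory = [(5, 5, 5), (7, 7, 9), (1, 2, 3)]
print(scrub_pass(memory))   # {'ok': 1, 'corrected': 1, 'uncorrectable': [2]}
print(memory[1])            # (7, 7, 7)
# 16 GiB of 64-byte lines scrubbed once per day: about 322 microseconds per line
print(line_interval_seconds(16 * 2**30, 64, 24 * 3600))
```

The timing helper shows why a once-per-day pass is cheap: with 2^28 lines, the scrubber only needs to inspect one line roughly every 322 microseconds.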
On the other hand, memory cells occasionally experience circuit defects, or “hard” errors, that get worse over time until the memory cell or a set of adjacent memory cells fails. Conventionally, DRAMs are tested at the factory to detect hard errors, which are corrected by substituting redundant rows or columns for the failing rows or columns. However, detecting and correcting memory cells that become defective after manufacturing is more difficult. Typically, memory is tested for hard errors at startup, and the portion of memory that has experienced a hard error is removed from the system memory map. However, if a hard error occurs after startup, running programs may crash, causing inconvenience or loss of data for the user. Moreover, there are no known strategies to detect and correct hard errors that develop slowly over time and that, because ECC bits are available to correct them, do not yet result in program failure or lost data.
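Factory repair by row substitution amounts to an address remapping: accesses to a failing row are steered to a spare row. The sketch below models only that remapping for a single bank. The class `RowRedundancy`, the numbering of spare rows after the normal rows, and the first-free spare allocation are hypothetical; real devices implement the substitution in hardware, for example with programmed fuses, and column substitution is not modeled.

```python
from typing import Dict, List


class RowRedundancy:
    """Hypothetical model of one bank's redundant rows.

    Rows 0..num_rows-1 are normal rows; spare rows are numbered after them.
    A repair substitutes the next free spare for a failing row.
    """

    def __init__(self, num_rows: int, num_spares: int) -> None:
        self.free_spares: List[int] = list(range(num_rows, num_rows + num_spares))
        self.remap: Dict[int, int] = {}

    def repair(self, failing_row: int) -> int:
        if failing_row in self.remap:
            return self.remap[failing_row]            # already repaired
        if not self.free_spares:
            raise RuntimeError("no spare rows left")  # device cannot be repaired
        spare = self.free_spares.pop(0)
        self.remap[failing_row] = spare
        return spare

    def physical_row(self, row: int) -> int:
        """Row actually accessed for a requested row address."""
        return self.remap.get(row, row)


bank = RowRedundancy(num_rows=1024, num_spares=2)
bank.repair(17)                 # factory test found row 17 defective
print(bank.physical_row(17))    # 1024: accesses are steered to the spare row
print(bank.physical_row(18))    # 18: healthy rows are unaffected
```

Because the pool of spares is small and fixed, each repair consumes a scarce resource, so a device that develops more failing rows than it has spares can no longer be repaired this way.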
In order to correct hard errors that arise after factory test, the double data rate version four (DDR4) memory standard published by the Joint Electron Device Engineering Council (JEDEC) has adopted a feature known as post-package repair. However, there are no known systems able to use the post-package repair feature to correct hard errors simply, efficiently, and before program failure or loss of data occurs.
In the following description, the use of the same reference numerals in different drawings indicates similar or identical items. Unless otherwise noted, the word “coupled” and its associated verb forms include both direct connection and indirect electrical connection by means known in the art, and unless otherwise noted any description of direct connection implies alternate embodiments using suitable forms of indirect electrical connection as well.