The present invention relates to the correction and reporting of errors in computer memories.
A solid-state computer memory is typically implemented as a relatively large array of memory chips. By way of numerical example, a 32-megabyte memory can be implemented using commercially available 1-megabit dynamic RAM (DRAM) chips. The DRAM chips are typically organized in a 1M-by-1-bit configuration, so that each chip supplies one bit of a word. In terms of 4-byte (32-bit) words, the memory could therefore be organized as eight banks of 32 chips each, for 256 chips in all.
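The arithmetic behind this numerical example can be checked with a short sketch. The function and field names below are illustrative only and do not come from the original text; the sketch assumes one-bit-wide chips, so that a bank needs one chip per bit of the word, and it counts data chips only, excluding any chips that hold check bits.

```python
def memory_organization(memory_bytes, chip_bits, word_bits):
    """Derive the chip and bank counts for an array of 1-bit-wide chips.

    Hypothetical helper: assumes every chip contributes exactly one bit
    to each word, and ignores chips used for check bits.
    """
    total_bits = memory_bytes * 8
    total_chips = total_bits // chip_bits   # chips needed to hold all data bits
    chips_per_bank = word_bits              # one chip per bit of the word
    banks = total_chips // chips_per_bank
    words_per_bank = chip_bits              # one word per chip address
    return {
        "total_chips": total_chips,
        "chips_per_bank": chips_per_bank,
        "banks": banks,
        "words_per_bank": words_per_bank,
    }


# The example from the text: 32 megabytes, 1-megabit chips, 32-bit words.
org = memory_organization(32 * 2**20, 2**20, 32)
```

For the figures given in the text, this yields 256 chips arranged as eight banks of 32 chips, each bank holding about one million words.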
It is known to provide each word with a check field for an error correction code (ECC). If each 32-bit word has associated with it a 7-bit ECC, it is possible to determine unambiguously whether a single bit of the resulting 39 bits is in error, and which bit that is. Thus, single-bit errors are correctable. The ECC also contains enough information to indicate that a 2-bit error has occurred, but not enough to correct such an error. Additionally, the ECC allows the detection of a 4-bit dropout, but cannot correct it.
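The single-error-correcting, double-error-detecting behavior described above can be illustrated with a minimal sketch. It uses a textbook extended Hamming code (six position-parity bits plus one overall parity bit, giving 7 check bits for 32 data bits), which is an assumption: the text does not say which code is used, and this sketch does not attempt to model the 4-bit dropout detection. All names are hypothetical.

```python
CODE_LEN = 39                          # positions 0..38; position 0 is overall parity
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32)
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p not in PARITY_POSITIONS]


def _syndrome(bits):
    """XOR together the positions (1..38) of all bits that are set."""
    s = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            s ^= pos
    return s


def encode(word):
    """Return a 39-bit codeword (list of 0/1) for a 32-bit data word."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (word >> i) & 1
    s = _syndrome(bits)
    for p in PARITY_POSITIONS:         # choose parity bits so the syndrome becomes 0
        if s & p:
            bits[p] = 1
    bits[0] = sum(bits) & 1            # overall parity makes the total weight even
    return bits


def decode(bits):
    """Return (word, status); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    bits = list(bits)
    s = _syndrome(bits)
    odd = sum(bits) & 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < CODE_LEN:
        bits[s] ^= 1                   # single-bit error at position s (0 = overall parity)
        status = "corrected"
    elif odd:
        status = "uncorrectable"       # syndrome points outside the codeword
    else:
        status = "double"              # even number of flips, nonzero syndrome
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        word |= bits[pos] << i
    return word, status


# Usage: flip one stored bit and let the decoder repair it.
stored = encode(0xDEADBEEF)
stored[17] ^= 1
recovered, status = decode(stored)
```

In this sketch a single flipped bit anywhere in the 39 bits is located and repaired, while any two flipped bits are reported as a double error and left uncorrected.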
Although it is possible to correct single-bit errors at the memory unit, it is desirable to have the operating system maintain records of which memory chips have needed correction. One approach is to generate an interrupt each time an error is detected. This is a simple scheme, but it is cumbersome, since a hard error on one chip would cause an interrupt at every access to that chip. As a practical matter, the operating system, when faced with the onslaught of interrupts, would typically turn off the interrupt entirely. An alternative approach provides an extra memory chip that holds a table with one bit dedicated to each memory chip in the array. When a bit on one of the memory chips fails, the corresponding entry in the table is set. The operating system can then determine the status by reading the table. This is a practical scheme, but it adds an extra level of complication and expense.
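The table-based alternative can be sketched as follows, purely as an illustration of the bookkeeping involved. The class and method names are hypothetical, and the default of 39 chips per bank (32 data chips plus 7 check-bit chips) is an assumption that goes beyond the 32 data chips mentioned earlier.

```python
class ChipErrorTable:
    """One status bit per memory chip, set when that chip needed correction.

    Hypothetical model of the extra-chip scheme: the memory unit calls
    record() on each corrected error, and the operating system polls
    failed_chips() at its leisure instead of taking an interrupt.
    """

    def __init__(self, banks=8, chips_per_bank=39):
        self.banks = banks
        self.chips_per_bank = chips_per_bank
        # Packed bit array, rounded up to a whole number of bytes.
        self._bits = bytearray((banks * chips_per_bank + 7) // 8)

    def _index(self, bank, chip):
        if not (0 <= bank < self.banks and 0 <= chip < self.chips_per_bank):
            raise IndexError("no such chip")
        return bank * self.chips_per_bank + chip

    def record(self, bank, chip):
        """Mark a chip as having produced an error (idempotent)."""
        i = self._index(bank, chip)
        self._bits[i // 8] |= 1 << (i % 8)

    def failed_chips(self):
        """Return (bank, chip) pairs whose status bit is set, in order."""
        found = []
        for i in range(self.banks * self.chips_per_bank):
            if self._bits[i // 8] & (1 << (i % 8)):
                found.append(divmod(i, self.chips_per_bank))
        return found


# A hard error on one chip is hit on every access, yet leaves one entry.
table = ChipErrorTable()
for _ in range(1000):
    table.record(3, 17)
```

Because setting a bit is idempotent, repeated errors on the same chip produce no additional work for the operating system, which is the contrast with the interrupt-per-error approach.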