1. Technical Field
The present invention relates in general to the field of computers, and in particular to memory devices. Still more particularly, the present invention relates to a method and system for self-healing a memory module, which has multiple memory sub-modules, by removing a portion of one of the multiple memory sub-modules from service.
2. Description of the Related Art
A key feature of modern computers is modularity. That is, with few or no tools, a computer owner can replace different components inside the computer's enclosure. By simply snapping new components into existing sockets and/or attaching them to existing cables, a non-expert user can install and/or replace the computer's hard drive, wireless modem, and even main processor. One of the most commonly replaced components, usually due to failure, is the computer's system memory.
Early computers primarily used Static Random Access Memory (SRAM) for system memory. While SRAMs are still in common use, particularly where memory speed is essential, they have some drawbacks. For example, SRAM draws a significant amount of power when in stand-by mode, and thus is not very useful in a battery powered device such as a laptop computer or a Personal Digital Assistant (PDA). Similarly, SRAMs are physically large, thus making them difficult to use in small computing devices such as PDAs, as well as in any other computer system, including servers, in which space is limited. Finally, SRAMs are relatively expensive, especially in comparison to Dynamic Random Access Memory (DRAM).
DRAMs use a network of storage cells, each made up of a capacitor that holds the stored bit as a charge and a transistor that controls access to that capacitor. Since capacitors tend to lose their charge quickly, DRAMs must refresh the storage cells (replenish the charge to the capacitors) every few milliseconds. Nonetheless, DRAMs draw less operational current than SRAMs.
As noted above, in modern computers, system memory is packaged to be easily installed and/or replaced. A common type of easily installed system memory comes as a package known as a Single In-line Memory Module (SIMM). Within the SIMM are multiple memory sub-modules of Dynamic Random Access Memory (DRAM). Each memory sub-module is typically referred to simply as a DRAM.
Another popular type of replaceable system memory is a Dual In-line Memory Module (DIMM). A DIMM is similar to a SIMM except that a DIMM has DRAMs on two sides of an interior of the DIMM, rather than on just one side (as is found in the SIMM). By having memory on both sides of its interior, the DIMM can hold more DRAMs, and thus more memory is available to the computer.
A significant problem with DRAMs is that they are somewhat prone to failure. For example, consider a DIMM 100 shown in FIG. 1a. DIMM 100 contains n-number of DRAMs 102. If one or more of the DRAMs 102 fail, then the entire DIMM 100 can be snapped out and replaced. Alternatively, if only one of the DRAMs 102 (e.g., DRAM 102-1) should fail, then the failed DRAM 102-1 can be taken out of service, and DIMM 100 is able to continue to function at a reduced level. While taking only the failed DRAM 102-1 out of service, rather than the entire DIMM 100, saves the remaining DRAMs 102, removing the entire DRAM 102-1 from service is still wasteful, particularly if only a portion of the DRAM 102-1 is actually defective. For example, assume that, as shown in FIG. 1b, only one of the columns of storage cells (column 1) is defective. If DRAM 102-1 is taken out of service, then the rest of the columns, which are still good (columns 0 and 2-7), are wasted.
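The waste described above can be made concrete with a little arithmetic. The following is only an illustrative sketch, not a method taken from this description: it assumes the eight columns (0 through 7) of FIG. 1b with column 1 defective, and the function name, the constants, and the `retire_whole_dram` flag are hypothetical names chosen for the example.

```python
# Hypothetical model of DRAM 102-1 in FIG. 1b: columns 0-7, with column 1 defective.
TOTAL_COLUMNS = 8
DEFECTIVE_COLUMNS = {1}


def usable_columns(total, defective, retire_whole_dram):
    """Return the column indices still in service under the chosen policy."""
    if defective and retire_whole_dram:
        # Any defect takes the entire DRAM out of service.
        return []
    # Only the defective columns are taken out of service.
    return [col for col in range(total) if col not in defective]


whole = usable_columns(TOTAL_COLUMNS, DEFECTIVE_COLUMNS, retire_whole_dram=True)
partial = usable_columns(TOTAL_COLUMNS, DEFECTIVE_COLUMNS, retire_whole_dram=False)

print(len(whole) / TOTAL_COLUMNS)    # fraction of the DRAM kept when it is fully retired
print(len(partial) / TOTAL_COLUMNS)  # fraction kept when only column 1 is retired
```

Under these assumptions, retiring the whole DRAM keeps none of its columns, whereas retiring only column 1 keeps seven of the eight.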
What is needed, therefore, is a method and system for reclaiming the use of a second portion of a memory sub-module (such as a DRAM in a DIMM) when a first portion of the memory sub-module fails. Preferably, such a method and system will be automatic, allowing the DIMM to be self-healing.