This invention relates generally to a high-density, high-reliability memory module with a fault-tolerant address and command bus for use as a main memory that will achieve the degree of fault tolerance and self-healing necessary for autonomic computing systems.
Memory modules are well known in the prior art and have been, and presently are, used in practical applications such as computers and other equipment using solid state memories.
Broadly speaking, currently available main memories offer bandwidths in the range of 1.6 to 2.6 GB/s, and although some memories provide for limited data path error correction, most offer no means of error correction targeting the interface between the memory controller and the memory subsystem. Furthermore, memory modules for server products and other higher-end computing systems usually include re-drive logic for address and command inputs, and clock re-synchronization and re-drive circuitry associated with the memory subsystems, to permit these modules to contain higher memory device counts and to ensure accurate clock timings at each device on the memory assembly. Although these solutions provide systems with the ability to achieve the specified bandwidth objectives, the overall quantity and types of failures in the memory subsystem, outside the data path itself, have actually increased due to the added circuitry associated with each memory device. Simultaneously, as these computing systems are more widely utilized in business, many applications simply cannot accept periodic unplanned system outages caused by failed memory modules. Thus the emphasis on and need for improved overall system reliability are increasing dramatically and require a comprehensive system solution that includes both a high degree of fault tolerance and overall reliability. Further, greater system memory density is also required to achieve the system performance and operation throughput required in modern business applications, as well as to maximize the return on investment by extending the utility of the system through memory density improvements.
The present invention provides such a comprehensive system solution, one that includes the capability of high memory density, a high degree of fault tolerance, and the overall differentiated system reliability long desired in the server market.
Existing solutions have memory module density that is typically limited to 18 or 36 devices for each memory module, with this limit based on such elements as the memory device package size, the memory module physical dimensions, the re-drive capability of the buffer, re-drive or register device, and the power dissipation of the completed memory subsystem and/or module. Other possible fault-tolerant improvement methods, such as memory mirroring, symbol slicing and extensive forms of fault rejection and redundancy, provide enhanced memory subsystem reliability but, due to negative impacts such as increased cost, increased power and reduced performance, have been considered only for niche applications where price is not of high importance, as these subsystem quality enhancements are very expensive to implement. Therefore, solutions suitable for the low or midrange server markets have not been available.
Consequently, the industry has long sought a simple, relatively inexpensive and reliable solution that provides high memory density with differentiated product quality, that provides an adequate level of asset protection without endangering the reliability of the system through the use of reduced-function memory assemblies, and that is cost competitive.