The present invention relates to memory structures for computers, and more particularly, to error correction in computer memories.
Semiconductor memory systems are subject to errors. That is, data retrieved from the memory does not always match data that was originally written to the memory. Such errors can be caused by stray alpha particles, damage to the memory devices or by a variety of operating conditions, such as power supply fluctuations, noise, etc. Regardless of the source, such errors are clearly undesirable. Consequently, most modern memory systems include error detection and/or error correction capabilities.
Typical approaches to detecting and correcting errors in memory rely upon some form of error correction code to identify and correct such data errors. Such error correction codes typically include a mathematical algorithm that is applied to the data to be checked and corrected, and additional error correction code (“ECC”) bits. Usually, the ECC bits are stored in a separate memory dedicated to the ECC bits. The amount of memory dedicated to storing the ECC bits can be significant. For example, the memory overhead for the ECC bits can often exceed 10%.
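The overhead figure above can be illustrated with a short calculation. The sketch below assumes a Hamming single-error-correcting, double-error-detecting (SEC-DED) code, which the present description does not name; the function `secded_check_bits` is a hypothetical helper introduced only for this illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Number of check bits for a Hamming SEC-DED code (assumed code type)."""
    r = 0
    # Single-error correction requires 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional overall parity bit provides double-error detection.
    return r + 1


for width in (8, 16, 32, 64):
    ecc = secded_check_bits(width)
    print(f"{width} data bits -> {ecc} ECC bits ({ecc / width:.1%} overhead)")
```

Under this assumption, a 64-bit data word carries 8 ECC bits, an overhead of 12.5% relative to the data, which is consistent with the statement that the overhead can exceed 10%.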
The number of ECC bits required can depend upon the type of error correction code being utilized. In some applications, very little or no error correction is desired. For example, in video games, occasional image data errors are unlikely to significantly affect the images perceived by a user. Rather than devote processor power to error correction calculations and memory to ECC bits, such applications largely ignore image data errors to increase the speed of play. Such applications will be referred to herein as error tolerant applications. Error tolerant applications typically use no error correction calculations or limited error correction algorithms that require little or no ECC memory.
Other applications can tolerate few or no data errors. For example, data errors can be extremely undesirable in accounting programs. Such applications will be referred to herein as error intolerant applications. Error intolerant applications usually utilize robust error correction algorithms requiring a substantial amount of ECC memory.
Typically, memory devices for storing ECC bits are segregated from memory devices for conventional data. For example, a 144-pin 4-MB×64 dual in-line memory module (“DIMM”) not used to store ECC bits could be implemented using 16 4-MB×4 dynamic random access memories (“DRAMs”). However, the same data storage capacity plus the capacity to store ECC bits would require a 4-MB×72 DIMM implemented using 18 4-MB×4 DRAMs. Thus, implementing ECC requires two additional DRAMs.
One problem with such memory architectures is that they do not fully utilize the available memory capacity. For example, error tolerant applications neither need nor use the extra memory provided to store ECC bits. Thus, valuable memory capacity is left unused. In the above example, 11% of the DRAMs on the DIMM (2 of 18) are wasted when the DIMM is not used to store ECC bits.
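The device counts in the example above follow from dividing the module width by the width of each DRAM. A minimal sketch of that arithmetic is given here; the helper name `drams_needed` is hypothetical, and the widths are those of the example.

```python
def drams_needed(module_width_bits: int, dram_width_bits: int = 4) -> int:
    """DRAMs required to build a module of the given width (ceiling division)."""
    return -(-module_width_bits // dram_width_bits)


plain = drams_needed(64)      # x64 module without ECC
with_ecc = drams_needed(72)   # x72 module with ECC
extra = with_ecc - plain      # DRAMs devoted to ECC bits
unused_fraction = extra / with_ecc

print(plain, with_ecc, extra, f"{unused_fraction:.1%}")
```

With ×4 devices, the ×64 module needs 16 DRAMs and the ×72 module needs 18, so 2 of 18 devices (about 11%) sit idle when ECC is not used.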
On the other hand, error intolerant applications require more memory and are often limited by the amount of available ECC memory. Consequently, the speed with which the application runs can be increased by increasing the amount of available ECC memory. Adding such memory can be costly. Moreover, adding such memory capacity increases the amount of unused memory in error tolerant applications.
A software- or hardware-controlled reconfigurable memory system includes an auxiliary section of one or more data banks that can be selectively utilized as conventional memory or ECC memory, depending upon the particular application. In one embodiment, the auxiliary section is part of a memory module that includes a primary section directly coupled to an output data bus for conventional memory uses. A primary multiplexer selectively couples the auxiliary section to either the output data bus or to an error correction circuit, depending upon the selected configuration of the system. If the system runs an error intolerant application employing a robust error correction algorithm, the auxiliary section is coupled to the error correction circuit to store ECC data for ECC calculations. In error tolerant applications not requiring error correction, the auxiliary section is coupled to the output data bus to supplement the conventional memory, thereby providing increased memory capacity and improving speed of the system.
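The routing performed by the primary multiplexer can be modeled in software for illustration. The following is only a behavioral sketch under simplifying assumptions (bits represented as Python lists, a single mode flag); the names `Mode`, `primary_mux`, and `read_word` are hypothetical and do not appear in the description.

```python
from enum import Enum


class Mode(Enum):
    ERROR_TOLERANT = "error_tolerant"
    ERROR_INTOLERANT = "error_intolerant"


def primary_mux(mode, aux_bits):
    """Route auxiliary-section bits; returns (to_data_bus, to_ecc_circuit)."""
    if mode is Mode.ERROR_INTOLERANT:
        # Auxiliary section holds ECC data for the error correction circuit.
        return [], list(aux_bits)
    # Auxiliary section supplements conventional memory on the data bus.
    return list(aux_bits), []


def read_word(mode, primary_bits, aux_bits):
    """The primary section always drives the data bus; the auxiliary section is routed."""
    to_bus, to_ecc = primary_mux(mode, aux_bits)
    return {"data_bus": list(primary_bits) + to_bus, "ecc_circuit": to_ecc}


print(read_word(Mode.ERROR_TOLERANT, [1, 0, 1, 1], [0, 1]))
print(read_word(Mode.ERROR_INTOLERANT, [1, 0, 1, 1], [0, 1]))
```

In the error tolerant mode the data bus receives the bits of both sections, while in the error intolerant mode the auxiliary bits are diverted to the error correction circuit.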
One embodiment of the invention also includes a dedicated ECC memory, which could be located on the motherboard. A secondary multiplexer receives data from the dedicated ECC memory at one input and data from the primary multiplexer at a second input. The primary and secondary multiplexers are controlled by software or hardware to establish the amount of ECC memory being used. For error intolerant applications, the primary multiplexer is activated to couple data from the auxiliary section to one input of the secondary multiplexer. The secondary multiplexer is then activated to couple data from both the primary multiplexer and the dedicated ECC memory to the error correction circuit. Thus, the auxiliary section is used to supplement the dedicated ECC memory in error intolerant applications where additional ECC memory is desirable.
In one embodiment, the second input of the secondary multiplexer is coupled to a set of memory sockets on the motherboard. The secondary multiplexer selectively couples only those sockets containing memory chips to the error correction circuit. Also, the primary and secondary multiplexers are controlled to select an appropriate portion of the auxiliary section to supplement the dedicated ECC memory, according to the ECC data requirements of an application and the amount of available dedicated ECC memory.
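One way to picture how the two multiplexers might be set is to compute the shortfall between the ECC bits an application requires and the ECC bits supplied by populated motherboard sockets, and then borrow that shortfall from the auxiliary section. The sketch below is an assumption-laden illustration, not the control logic of any embodiment; `plan_ecc_sources`, its parameters, and the convention that an empty socket has width 0 are all hypothetical.

```python
def plan_ecc_sources(required_ecc_bits, socket_widths, aux_width):
    """Decide which sockets and how much of the auxiliary section feed the ECC circuit.

    socket_widths: bit width contributed by each motherboard socket (0 if empty).
    aux_width: total bit width of the auxiliary section on the memory module.
    """
    # Secondary multiplexer: couple only populated sockets to the ECC circuit.
    populated = [i for i, width in enumerate(socket_widths) if width > 0]
    dedicated_bits = sum(socket_widths[i] for i in populated)

    # Primary multiplexer: borrow only the shortfall from the auxiliary section.
    shortfall = max(0, required_ecc_bits - dedicated_bits)
    if shortfall > aux_width:
        raise ValueError("not enough ECC capacity for the requested code")

    return {
        "sockets_coupled": populated,
        "aux_bits_for_ecc": shortfall,
        "aux_bits_for_data": aux_width - shortfall,
    }


print(plan_ecc_sources(required_ecc_bits=8, socket_widths=[4, 0], aux_width=8))
```

In this example one of two sockets is populated and supplies 4 bits, so 4 of the 8 auxiliary bits are assigned to ECC and the remaining 4 stay available as conventional memory.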
In one embodiment, the auxiliary section is segmented into two sections. The first section is used to supplement the dedicated ECC memory from the motherboard. The second section is used as a supplement to the conventional memory. To accommodate the difference in word length caused by segmenting of the auxiliary section, the second section is “double-written” and “double-read” so that data is written to and read from the second section in two or more pieces. When reading the data, the two or more pieces are combined to form the complete written data.
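The double-write and double-read operations can be sketched as splitting a word into pieces no wider than the second section and reassembling them on a read. The bit widths, the low-piece-first ordering, and the function names `double_write` and `double_read` below are assumptions made for illustration only.

```python
def double_write(word, word_bits, section_bits):
    """Split a word into pieces that each fit the narrower second section."""
    mask = (1 << section_bits) - 1
    # Lowest-order piece first; one piece per write cycle (assumed ordering).
    return [(word >> shift) & mask for shift in range(0, word_bits, section_bits)]


def double_read(pieces, section_bits):
    """Combine the pieces read from the second section into the original word."""
    word = 0
    for index, piece in enumerate(pieces):
        word |= piece << (index * section_bits)
    return word


pieces = double_write(0xA5, word_bits=8, section_bits=4)
restored = double_read(pieces, section_bits=4)
print([hex(p) for p in pieces], hex(restored))
```

Here an 8-bit word is stored as two 4-bit pieces and recovered intact; a word wider than twice the section width would simply produce more than two pieces.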