The invention relates generally to error checking and correcting ("ECC") memory controllers and, more specifically, to a method of testing the error detection and correction capabilities of an ECC memory controller.
Historically, the most common method of protecting the integrity of memory devices in computers has been through use of parity schemes. While parity schemes provide the ability to detect single bit memory errors, they are not capable of correcting such errors. In contrast, error checking and correcting, or "ECC," technology provides the ability both to detect and to correct single bit memory errors. However, because ECC requires expensive, specialized memory SIMMs, parity technology remained the predominant memory protection method until recently.
ECC employs additional bits, called "check bits," which store the information required to detect and correct single bit errors, as well as to detect, but not correct, double bit errors. The number of check bits required to protect a block of memory varies according to the size of the block. As illustrated in Table I below, for smaller blocks, parity requires far fewer additional bits than ECC; however, with the 64-bit data bus on certain commercially available processors, such as the Pentium Pro, available from Intel Corporation, ECC can be accomplished using the same number of additional bits as would be required for parity.
This explains, at least in part, the current popularity of ECC as a memory protection scheme. An exemplary ECC system is described in U.S. Pat. No. 4,358,848 to Patel, the disclosure of which is hereby incorporated by reference in its entirety.
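The check-bit comparison described above can be reproduced with a short calculation. The following Python sketch is illustrative only and rests on two assumptions that the text does not state: parity uses one bit per 8-bit byte, and ECC uses a Hamming code extended with one overall parity bit (single error correction, double error detection). The function names are hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming code plus one overall parity bit (assumed scheme)."""
    r = 1
    # Hamming bound: r check bits must address data_bits + r positions plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra bit to detect double bit errors


def parity_bits(data_bits):
    """Parity bits assuming one parity bit per 8-bit byte."""
    return data_bits // 8


for width in (8, 16, 32, 64):
    print(width, parity_bits(width), secded_check_bits(width))
```

Under these assumptions, a 64-bit block needs 8 additional bits under either scheme, while an 8-bit block needs 1 parity bit but 5 check bits.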
State of the art memory controllers include ECC logic for generating check bits, or an ECC code, that corresponds to a particular data value being written to memory and is stored in memory along with that data value. When data is subsequently read from memory, an ECC code is calculated for the read data and compared with the ECC code stored therewith by XORing the two codes. The result of the XOR operation, referred to as the "syndrome," indicates, if nonzero, that an error has occurred.
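The generate-and-compare step can be sketched in Python. This is an illustration only: the text does not specify which code the controller uses, so the sketch assumes a Hsiao-style code in which each of the 64 data bits is assigned a distinct odd-weight 8-bit column. The column assignment and the names `build_columns`, `ecc_code`, and `syndrome` are hypothetical.

```python
from itertools import combinations

N_DATA, N_CHECK = 64, 8


def build_columns():
    """Give each data bit a distinct odd-weight column (weight 3 first, then 5)."""
    cols = []
    for weight in (3, 5):
        for bits in combinations(range(N_CHECK), weight):
            cols.append(sum(1 << b for b in bits))
            if len(cols) == N_DATA:
                return cols
    raise ValueError("not enough odd-weight columns")


COLUMNS = build_columns()


def ecc_code(data):
    """XOR together the columns of every data bit that is set."""
    code = 0
    for i, col in enumerate(COLUMNS):
        if (data >> i) & 1:
            code ^= col
    return code


def syndrome(read_data, stored_code):
    """XOR the code recomputed from the read data with the stored code."""
    return ecc_code(read_data) ^ stored_code


value = 0x0123456789ABCDEF
stored = ecc_code(value)                 # generated on the write
print(hex(syndrome(value, stored)))      # 0x0: data read back intact
print(hex(syndrome(value ^ 1, stored)))  # 0x7: bit 0 flipped, nonzero syndrome
```

Because the assumed code is linear, an all-zero data value yields an all-zero ECC code, and a single flipped data bit yields a syndrome equal to that bit's column.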
During normal operation of a computer system, an ECC code is generated or checked on every write to and read from system memory. For this reason, it is extremely important that the ECC logic embedded in the memory controller function properly. When a single bit memory error is detected, the ECC logic corrects and reports the error. When a double bit memory error is detected, the ECC logic reports the error but does not correct it.
Many systems exist which use ECC technology to ensure the integrity of system memory. In contrast, very few systems exist for testing the validity of the ECC logic of the memory controller itself. Those systems that do exist, such as the system described in U.S. Pat. No. 5,502,732 to Arroyo et al., require the memory controller to be modified to include specialized hardware for testing the ECC logic embedded therein. Clearly, such hardware systems are deficient in that they fail to provide a universal system and method for testing the ECC capabilities of unmodified ECC memory controllers. The ability to test the ECC logic itself is important because if the ECC logic is faulty, the integrity of system memory may be incorrectly evaluated.
Therefore, what is needed is an improved method and apparatus for testing the operation of an ECC-capable memory controller that does not require hardware modification of the memory controller.
The present invention, accordingly, provides a system and method for testing the error detection and correction capabilities of an ECC memory controller that reduces or overcomes disadvantages and limitations associated with prior methods and systems. In a departure from the art, the system of the present invention is implemented entirely in software; accordingly, it can be used to verify the operation of nearly any ECC memory controller and requires no special hardware modification.
In a preferred embodiment, the invention takes advantage of the natural state of the bus to induce one- or two-bit memory errors as follows. First, the ECC generation capabilities of the memory controller are disabled, such that ECC check bits will not be generated for data read from and written to system memory. Next, a test data pattern is written to a selected location in system memory. The test data pattern differs by one bit from a data pattern that results in an ECC code equal to the natural state of the bus. For example, assuming that the bus is pulled low in its natural state, a data pattern of 0000000000000000h would result in an ECC code equal to the natural state of the bus (00000000b); therefore, an appropriate test pattern would be 0000000000000001h. It will be recognized that the foregoing induces a one-bit memory error, because the check bits stored with the test pattern are the natural state of the bus rather than the ECC code for that pattern.
ECC generation capabilities are then reenabled, such that ECC codes will again be generated and compared, by XORing the codes, on each read from and write to memory. The memory location to which the test data pattern was previously written is then read and its ECC code generated. A determination is then made whether the memory controller detected and corrected the induced error and, if so, whether the memory controller reported the error.
Double-bit errors may be induced in a similar manner, it being understood that detection and reporting, but not correction, should be expected of a correctly functioning ECC memory controller.
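The procedure above can be modeled in software to show what a correctly functioning controller should do. The following Python sketch is not the claimed implementation: `SimulatedController`, `run_ecc_test`, the Hsiao-style column code, and the assumption that the check-bit lines are pulled low are all hypothetical stand-ins for real hardware and its registers.

```python
from itertools import combinations

BUS_NATURAL_STATE = 0x00  # assumption: check-bit lines are pulled low


def build_columns(n_data=64, n_check=8):
    """Assumed code: one distinct odd-weight column per data bit."""
    cols = []
    for weight in (3, 5):
        for bits in combinations(range(n_check), weight):
            cols.append(sum(1 << b for b in bits))
            if len(cols) == n_data:
                return cols
    raise ValueError("not enough odd-weight columns")


COLUMNS = build_columns()


def ecc_code(data):
    code = 0
    for i, col in enumerate(COLUMNS):
        if (data >> i) & 1:
            code ^= col
    return code


class SimulatedController:
    """Toy model of an ECC memory controller (hypothetical)."""

    def __init__(self):
        self.ecc_enabled = True
        self.memory = {}    # address -> (data, stored check bits)
        self.reported = []  # list of (address, kind) error reports

    def write(self, addr, data):
        # With ECC disabled, the check bits take the natural state of the bus.
        check = ecc_code(data) if self.ecc_enabled else BUS_NATURAL_STATE
        self.memory[addr] = (data, check)

    def read(self, addr):
        data, check = self.memory[addr]
        if not self.ecc_enabled:
            return data
        syn = ecc_code(data) ^ check
        if syn == 0:
            return data                        # no error
        if syn in COLUMNS:                     # single data-bit error
            self.reported.append((addr, "single"))
            return data ^ (1 << COLUMNS.index(syn))
        if bin(syn).count("1") == 1:           # single check-bit error
            self.reported.append((addr, "single"))
            return data
        self.reported.append((addr, "uncorrectable"))
        return data                            # detected but not corrected


def run_ecc_test(ctrl, addr, base_pattern, error_mask):
    """Induce the error in error_mask, then read back with ECC enabled.

    base_pattern must produce an ECC code equal to BUS_NATURAL_STATE.
    """
    ctrl.ecc_enabled = False
    ctrl.write(addr, base_pattern ^ error_mask)
    ctrl.ecc_enabled = True
    ctrl.reported.clear()
    value = ctrl.read(addr)
    return value, list(ctrl.reported)


ctrl = SimulatedController()
print(run_ecc_test(ctrl, 0x1000, 0x0, 0x1))  # one-bit error
print(run_ecc_test(ctrl, 0x1000, 0x0, 0x3))  # two-bit error
```

In this model, the one-bit case returns the corrected all-zero pattern together with a single-bit report, and the two-bit case returns the data uncorrected together with an uncorrectable-error report. A test against real hardware would compare the controller's behavior with these expectations.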
If it is determined that the ECC memory controller is not functioning properly, either the memory controller or the entire motherboard of the computer must be replaced.
A technical advantage achieved with the invention is that it enables the operation of any ECC memory controller to be verified.
A further technical advantage achieved with the invention is that it is implemented entirely in software; therefore, no hardware modification of the memory controller is required.