Volatile main memory for computer systems, referred to for short as a RAM memory module, is of great importance in modern computer systems. Advancing technology and the simultaneous fall in prices permit the use of memory modules fitted with a multiplicity of individual memory chips, up to a total capacity of 2 GB, even in small personal computer systems. Larger storage capacities are continually being developed. In this context, the importance of test sequences and test methods for discovering possible errors is increasing, since the high complexity of the memory chips used means that a failure caused by a hardware fault or a design error cannot be ruled out in the design phase.
The multiplicity of computer programs and different applications generates data combinations and variants of access to the individual memory chips within the memory module that cannot be completely covered by the tests used for development or for production.
FIG. 3 shows a typical memory module such as is used in present-day computer systems, inter alia for servers and desktop systems and also for private personal computers. In this case, eight individual memory chips B1 to B8 are arranged on the memory module 1, said memory chips for their part containing a multiplicity of memory cells. The total number of memory cells in each memory chip yields the capacity of the individual memory chip. This is, by way of example, 8 MB, 16 MB or even 64 MB per memory chip. Each of said memory chips B1 to B8 is assigned a number of the pins P1 to P92 of the module 1. By means of corresponding signals on said pins, individual memory cells of the memory chips B1 to B8 are addressable, and data can be written to or read from the latter. A current and voltage supply for the memory chips B1 to B8 is likewise ensured.
The data required for a correct driving of the memory module by a memory controller are stored in a small auxiliary memory E1, which is referred to as an EEPROM or SPD-ROM. The data are read out by the memory controller prior to operation. The data stored in the ROM relate inter alia to the latencies of the individual memory chips, the burst rate and the read and memory access.
In order to avoid read errors in the memory cells of the individual memory chips, a further chip ECC is additionally provided on the memory module 1. Correction data are stored by the memory controller in this memory chip, which may indeed be the same type as the memory chips B1 to B8. With the aid of said correction data, the memory controller is able not only to detect a possible error in a memory cell of one of the eight memory chips B1 to B8 and communicate this to a processor of the computer system, but possibly also to correct the error. The additional chip ECC is also referred to as an error correction chip. It stores in its memory cells so-called check bits, which are used for detection of an error and correction thereof in one of the other memory chips. Various algorithms can be used for the generation of suitable check bits and the subsequent associated error correction methods. However, said algorithms can greatly influence the speed of the overall system and in particular of read and write processes in the memory and are therefore not usually published by the manufacturers of the memory controllers.
In a typical application example, the memory controller determines a checksum comprising 8 check bits from a total of 64 bits to be written to the memory chips B1 to B8 and then writes the total quantity of 72 bits to the memory module. The 8 check bits or checksum bits are written to the error correction chip ECC. When the memory cells of the chips B1 to B8 are subsequently read, the memory controller generates a checksum from the bits read and compares said checksum with the checksum from the error correction memory. This comparison determines whether one of the memory cells of the memory chips B1 to B8 is defective. If appropriate, the erroneous bit is corrected.
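The 64-plus-8-bit scheme described above can be illustrated by a minimal sketch. Since the algorithms actually used by memory controllers are usually not published, the sketch assumes an extended Hamming code (single error correction, double error detection) purely by way of example; the function names `encode_check_bits` and `check_and_correct` and the bit layout are hypothetical and are not taken from any particular memory controller.

```python
# Illustrative (72,64) extended Hamming code. ASSUMPTION: real memory
# controllers may use a different, unpublished algorithm.

PARITY_POS = [1, 2, 4, 8, 16, 32, 64]            # Hamming check-bit positions
DATA_POS = [p for p in range(1, 72) if p not in PARITY_POS]  # 64 data positions
DATA_INDEX = {pos: i for i, pos in enumerate(DATA_POS)}


def _popcount(value):
    return bin(value).count("1")


def _data_syndrome(data):
    """XOR of the code positions of all data bits that are set."""
    syndrome = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            syndrome ^= pos
    return syndrome


def encode_check_bits(data):
    """Return the 8 check bits for a 64-bit data word."""
    assert 0 <= data < 1 << 64
    hamming = _data_syndrome(data)                 # 7 Hamming check bits
    overall = (_popcount(data) + _popcount(hamming)) & 1
    return hamming | (overall << 7)                # bit 7: overall parity


def check_and_correct(data, check_bits):
    """Return (status, data) after comparing stored and recomputed check bits."""
    hamming = check_bits & 0x7F
    overall = (check_bits >> 7) & 1
    syndrome = _data_syndrome(data) ^ hamming
    odd = (_popcount(data) + _popcount(hamming) + overall) & 1
    if syndrome == 0 and not odd:
        return "ok", data
    if odd:                                        # odd parity: single-bit error
        if syndrome in DATA_INDEX:                 # error in a data bit
            return "corrected", data ^ (1 << DATA_INDEX[syndrome])
        if syndrome == 0 or syndrome in PARITY_POS:  # error in a check bit
            return "corrected", data
    return "uncorrectable", data                   # e.g. a double-bit error


word = 0x0123456789ABCDEF
check = encode_check_bits(word)
print(check_and_correct(word ^ (1 << 17), check)[0])
```

In this sketch, the 64 data bits correspond to the content of the memory chips B1 to B8 and the 8 check bits to the content of the error correction chip ECC.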
If an error occurs, the memory controller reports an error, an “ECC fail event”, to a main processor of the computer system. In the case of such a message, however, it is not possible for the processor or a test program to ascertain whether the error occurred within the memory cell of one of the memory chips B1 to B8 or within the correction memory chip ECC. A defective memory cell within one of the memory chips B1 to B8 can be determined by a comparison of the data read from the chips with reference data. Since the memory controller also returns the precise address of the respective memory cell within the memory chip, it is thus possible to determine the precise location of the defective memory cell within the memory chip.
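The comparison of read data with reference data for the memory chips B1 to B8 can be sketched as follows. The sketch assumes, solely for illustration, that each of the eight memory chips supplies 8 adjacent bits of a 64-bit data word; the function name `locate_faults` and this bit assignment are hypothetical.

```python
def locate_faults(address, read_word, reference_word,
                  bits_per_chip=8, chips=8):
    """Return (chip, address, bit within chip) for every differing bit.

    ASSUMPTION: chip B1 holds bits 0..7 of the word, B2 bits 8..15, etc.
    """
    diff = read_word ^ reference_word
    faults = []
    for bit in range(bits_per_chip * chips):
        if (diff >> bit) & 1:
            chip = bit // bits_per_chip + 1
            faults.append((f"B{chip}", address, bit % bits_per_chip))
    return faults


# A word read back from address 0x1000 differs from the reference in bit 63.
print(locate_faults(0x1000, 0x0000_0000_0000_00FF, 0x8000_0000_0000_00FF))
```

No corresponding comparison is possible for the error correction chip ECC in this way, since neither the check bits written nor their addresses are made available by the memory controller.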
In contrast to this, the memory addresses of the checksum in the error correction chip ECC are not reported further by the memory controller. Therefore, in the case of an error within a memory cell of the error correction memory ECC, the defective memory cell has to be determined by means of other methods.
One method would be, for example, to detach the defective error correction chip ECC from the memory module 1 and test it separately. In practice, however, it proves to be difficult to simulate and precisely identify the errors that occurred in practical application within the test system. This results from ignorance of the algorithm used for the generation of error correction data of a memory controller. During a test within a test system, both static and dynamic errors occur, which can be assigned to the specific error only with difficulty.
FIG. 5 shows another possibility. In this case, a test card TI is connected between the actual memory controller C in a computer system and the memory module 1 with a total of three memory chips B1, ECC and B2. The memory chip ECC forms the error correction chip and is provided, in a normal operating mode, when the memory module 1 is directly connected to the control circuit C, for receiving and outputting error correction data. When an ECC fail event occurs, the test card TI is incorporated between memory module 1 and control circuit C. This is done whenever the ECC fail event has been caused by an error on the error correction chip ECC. The test card is configured in such a way that it directly routes through the corresponding data and signal lines from the control circuit C for the memory chip B1. The test card TI leads the control line S1, the data line D1 and also the supply line V1 directly from the control circuit C to the memory module 1 and the associated memory chip B1.
At the same time, the control lines and the data lines for the error correction memory chip ECC and the second memory chip B2 are interchanged. As a result, the error correction data from the memory controller C are not written to the error correction memory ECC actually provided therefor, but rather to the second memory chip B2. At the same time, data provided for the memory chip B2 are written to the error correction chip ECC. Since the addresses for the data are output on the lines S3 and D3 by the controller C, it is thus possible to precisely determine the defective memory cell of the error correction memory chip ECC.
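The effect of interchanging the lines for the error correction chip ECC and the memory chip B2 can be modeled by a minimal sketch. The class `Chip`, the routing tables and the function `find_defects` are hypothetical illustrations; in particular, the 8-bit cell width, the stuck-at-zero defect model and the test pattern are assumptions and are not taken from the arrangement of FIG. 5.

```python
class Chip:
    """Toy memory chip; selected bits can be stuck at zero (assumed model)."""

    def __init__(self, stuck_at_zero=None):
        self.cells = {}
        self.stuck_at_zero = stuck_at_zero or {}   # address -> bit mask

    def write(self, address, value):
        mask = self.stuck_at_zero.get(address, 0)
        self.cells[address] = value & ~mask & 0xFF

    def read(self, address):
        return self.cells.get(address, 0)


# Controller lane -> physical chip on the module.
DIRECT = {"B1": "B1", "ECC": "ECC", "B2": "B2"}      # normal operating mode
TEST_CARD = {"B1": "B1", "ECC": "B2", "B2": "ECC"}   # ECC and B2 interchanged


def find_defects(module, routing, lane, pattern=0xFF, addresses=range(16)):
    """Write a known pattern via a controller lane and list failing addresses."""
    chip = module[routing[lane]]
    bad = []
    for address in addresses:
        chip.write(address, pattern)
        if chip.read(address) != pattern:
            bad.append(address)
    return bad


# The error correction chip has a defective cell at address 5.
module = {"B1": Chip(), "ECC": Chip({5: 0x10}), "B2": Chip()}
print(find_defects(module, DIRECT, "B2"))      # data lane reaches chip B2
print(find_defects(module, TEST_CARD, "B2"))   # data lane reaches chip ECC
```

With the interchanged routing, known data and known addresses reach the error correction chip via the lane intended for the memory chip B2, so that the defective cell becomes addressable by an ordinary data comparison.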
In this case as well, however, the data which a memory controller writes to the error correction memory are not known in advance. In particular, in the case of this design, the data written to the error correction memory ECC differ from those written in a normal operating mode. As a result, an error that depends on the data content of the memory cells or on electrical parameters might no longer be demonstrable.