The present invention relates in general to a method and apparatus for testing for bit errors in storage media, and in particular in a RAM memory system.
A standard method, referred to as EDC monitoring, for recognizing faults of a storage medium of a RAM memory system during normal operation is based on additionally co-storing a defined plurality of redundant bits under the address of the data word to be stored. These bits are referred to as check bits and are derived from the data word or from parts of the data word using a defined algebraic rule. Upon read-out of the stored word, check bits are again formed according to the defined algebraic rule and are compared, bit location by bit location, to the read-out check bits. When all check bits are equal, it is assumed that the data word that was read out is error-free. In the case of an inequality, the type of error is identified from the pattern of the non-coincidence, referred to as a syndrome pattern. Those bit locations of the check bits that do not coincide are called syndromes. Certain syndrome patterns are decoded, and the falsified bit position or positions in the data word are identified, as applicable, and corrected by inverters.
The second formation of the check bits, the comparison, the decoding of the syndromes, as well as the correction and potential error notification are normally carried out in the prior art using specific LSI modules, that is, EDC monitors.
FIG. 1 shows a specific algebraic rule, defining a modified Hamming code, with which the check bits are formed in a specific monitor module, for example an Am2960. As depicted in FIG. 1, data words are assumed that have 32 data bits. Seven check bits CX, C0, C1, C2, C4, C8 and C16 are allocated to these data bits, each check bit being respectively formed by parity formation from defined bit positions of a data word. The bit positions that are utilized for the parity formation of the respective check bit are marked with an X in FIG. 1. The type of parity (even or odd) on which the parity formation is based can likewise be derived from FIG. 1. Thus, for example, the check bit CX is formed such that its binary value supplements the sum of the binary values of the bits 0, 4, 6, 7, 8, 9, 11, 14, 17, 18, 19, 21, 26, 28, 29 and 31 of a data word to an even parity. The check bit C2, by contrast, supplements the bits 0, 1, 5, 6, 7, 11, 12, 13, 16, 17, 21, 22, 23, 27, 28 and 29 to an odd parity, etc. As a result of this rule, for example, the check bits 0001100 are allocated to a data word FFFFFFFF.
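The parity rule quoted above for CX and C2 can be illustrated with a minimal sketch. It covers only the two check bits whose bit positions are listed in the text; the remaining rows of FIG. 1 are not reproduced here. The function name parity_check_bit and the convention that bit 0 is the least significant bit of the 32-bit word are assumptions made purely for illustration.

```python
# Bit positions taken from the text for the check bits CX and C2.
CX_BITS = (0, 4, 6, 7, 8, 9, 11, 14, 17, 18, 19, 21, 26, 28, 29, 31)
C2_BITS = (0, 1, 5, 6, 7, 11, 12, 13, 16, 17, 21, 22, 23, 27, 28, 29)


def parity_check_bit(word, positions, odd):
    """Return the check bit that supplements the selected data bits
    to even parity (odd=False) or odd parity (odd=True).

    Assumption: bit 0 is the least significant bit of `word`.
    """
    ones = sum((word >> p) & 1 for p in positions)
    even_bit = ones & 1          # makes (ones + bit) even
    return even_bit ^ 1 if odd else even_bit


word = 0xFFFFFFFF
cx = parity_check_bit(word, CX_BITS, odd=False)   # CX uses even parity
c2 = parity_check_bit(word, C2_BITS, odd=True)    # C2 uses odd parity
print(cx, c2)
```

For the data word FFFFFFFF, each of the two position lists selects sixteen ones, so the even-parity bit CX is 0 and the odd-parity bit C2 is 1. This agrees with the first and fourth positions of the check bit pattern 0001100 when the pattern is read in the order CX, C0, C1, C2, C4, C8, C16.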
It is known to also utilize the above-described EDC monitoring for testing the storage medium for multiple bit errors.
The principle of such a test procedure is that the old test pattern previously written into the storage medium is read out using a read-write cycle, a check is carried out with reference to the check bits that are also read out, and the next test pattern is subsequently written in. So that addressing errors also become visible as bit errors, each test cycle is executed over a plurality of addresses of the storage medium before the method is continued with the next test cycle.
The comparison of the read-out check bits to the check bits newly formed from the read-out data is carried out by XOR formation. The result of this operation is the syndrome pattern, which contains a 1 at the unequal bit positions and, as mentioned, corresponds to the type of error. When, for example, the specific algebraic rule depicted in FIG. 1 is used for the allocation of the check bits, then a syndrome pattern with an odd number of ones arises for the occurrence of a single bit error and a syndrome pattern with an even number of ones arises for the occurrence of a double bit error. It is assumed for the following comments regarding the prior art that, due to the negligibly low probability of errors of a higher order, multiple bit errors always appear as double bit errors. This assumption, for example, is reliably guaranteed when the storage medium is constructed with RAM modules having a word width of one bit and, thus, the simultaneous outage of two RAM modules would at most produce a double bit error.
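The XOR comparison and the odd/even interpretation of the syndrome pattern can be sketched as follows. This is only an illustration under the assumptions stated in the text (a code such as the one of FIG. 1, and no errors of higher order than two); the function name classify_syndrome and the representation of the seven check bits as an integer are hypothetical.

```python
def classify_syndrome(read_check_bits, recomputed_check_bits):
    """Compare stored and recomputed check bits by XOR and classify the error.

    Assumptions: both arguments are integers holding the check bits, and
    multiple bit errors only ever appear as double bit errors.
    """
    syndrome = read_check_bits ^ recomputed_check_bits  # 1 at unequal positions
    ones = bin(syndrome).count("1")
    if ones == 0:
        kind = "no error"
    elif ones % 2 == 1:
        kind = "single bit error"   # odd number of ones
    else:
        kind = "double bit error"   # even, non-zero number of ones
    return syndrome, kind


# Stored check bits 0001100 versus three hypothetical recomputed patterns.
for recomputed in (0b0001100, 0b0001000, 0b0000000):
    print(classify_syndrome(0b0001100, recomputed))
```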
The number of possible double bit errors derives from the number of possible selections of two bits from N bits, where N is the number of bits in a memory word including the check bits. For each such pair of bits there are in turn four ways in which a double bit error can occur:
1. First bit falsified to 0, second bit falsified to 0
2. First bit falsified to 0, second bit falsified to 1
3. First bit falsified to 1, second bit falsified to 0
4. First bit falsified to 1, second bit falsified to 1

1. TM 00000000
2. TM 11111111
3. TM 00001111
4. TM 11110000
5. TM 00110011
6. TM 11001100
7. TM 01010101
8. TM 10101010

(a) a first test pattern is written into the smallest addressable unit of the storage medium, the test pattern being composed of a data pattern and of a monitoring pattern, the monitoring pattern being formed from the data pattern according to a defined algebraic rule;
(b) the first test pattern is later read out in turn, and a monitoring pattern is again formed from the read-out data pattern according to the same algebraic rule;
(c) the monitoring pattern formed anew is compared to the read-out monitoring pattern, so that an occurrence of bit errors lying below a defined order and lying above a defined order are recognized; and
(d) method steps (a) through (c) are repeated with a second test pattern that, given prior absences of a bit error, is completely inverted by comparison to the first test pattern or, respectively, given the prior occurrence of a number of bit errors lying below a defined order, the inversion of the monitoring pattern is modified such that the effect of the bit error(s) of this number continues to remain for the repeated implementation of method steps (a) through (c) and a combination of bit errors is thus recognized in method step (c) that is derived by means of an addition of bit errors that separately occurred in the first and second test patterns.
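Since two bits can be selected from N bits in N*(N-1)/2 ways and each selection can be falsified in four ways, the overall number of double bit error possibilities thus amounts to N*(N-1)*2.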
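The count can be checked by brute-force enumeration. The sketch below is only an illustration; the function names are hypothetical, and the word width N = 39 used in the example is an assumption corresponding to 32 data bits plus the seven check bits of FIG. 1.

```python
from itertools import combinations, product


def count_double_bit_errors(n):
    """Enumerate every pair of bit positions and every combination of
    falsified values (0 or 1) for the two bits, and count the results."""
    count = 0
    for _first, _second in combinations(range(n), 2):
        for _value_first, _value_second in product((0, 1), repeat=2):
            count += 1
    return count


def double_bit_error_formula(n):
    """Closed form from the text: N*(N-1)*2."""
    return n * (n - 1) * 2


n = 39  # assumption: 32 data bits plus 7 check bits
print(count_double_bit_errors(n), double_bit_error_formula(n))
```

For N = 39 both the enumeration and the closed form give 2964 possibilities.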
Two known testing procedures that are based on this testing principle shall be set forth below.
In the first testing procedure, a first test pattern is first written to all addresses of the storage medium. In a second method step, this test pattern is later read out in turn and is monitored for single bit errors or double bit errors using EDC monitoring. Given the appearance of a single bit error, the affected address is tested with a second test pattern that has its data bit part completely inverted as compared to the first test pattern. When a single bit error occurs again, the double bit error is considered detected. When no further single bit error occurs, the same procedure is repeated with a third test pattern that has its check bits inverted.
The first testing procedure described above can be implemented in two versions. In the first version, the affected address is stored upon the occurrence of a single bit error so that the presence of a double bit error can be inferred when a further single bit error appears at that address in a following test step. The disadvantage of this version of the first testing procedure is that, when many single bit errors occur, a correspondingly large number of memory locations is required for storing the affected addresses.
In the second version, the test cycle is interrupted upon the appearance of a single bit error and the complete method is implemented at this address. Although the storing of the affected address is thus saved in comparison to the first version, time-consuming interruptions are necessary. This method is thus extremely time-consuming given the appearance of many single bit errors (for example, when the error affects an entire memory module).
In the second testing procedure the test is constructed such that every combination of two single bit errors appears as a double bit error in at least one test pattern TM. An example for 8 bits is given by the eight test patterns TM 00000000 through TM 10101010 listed above.
The test patterns TM are determined on the basis of continued, meander-like subdivision. For N bits their number is (ld N + 1)*2, where ld is the logarithm to base 2; for N = 8 this yields (3 + 1)*2 = 8 test patterns. This minimum number of test patterns increases further when it is taken into consideration that the check bits must also be involved in the test and are dependent on the data bits. The second testing procedure thus requires a relatively large number of test patterns or, respectively, test runs.
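The meander-like subdivision and the coverage property of the second testing procedure can be illustrated with a short sketch. It is only one possible reading of the construction, assuming that N is a power of two and ignoring the dependence of the check bits on the data bits; the function names meander_patterns and covers_all_double_errors are hypothetical.

```python
from itertools import combinations, product


def meander_patterns(n):
    """Build the test patterns for n bits by repeated halving of the block
    size, each pattern being followed by its inverse.

    Assumption: n is a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    patterns = ["0" * n, "1" * n]
    block = n // 2
    while block >= 1:
        pattern = "".join(str((i // block) & 1) for i in range(n))
        inverse = "".join("1" if c == "0" else "0" for c in pattern)
        patterns += [pattern, inverse]
        block //= 2
    return patterns


def covers_all_double_errors(patterns):
    """Check that for every pair of bit positions each of the four value
    combinations occurs in at least one pattern, so that any falsification
    of both bits is exposed simultaneously by some pattern."""
    n = len(patterns[0])
    for i, j in combinations(range(n), 2):
        for x, y in product("01", repeat=2):
            if not any(p[i] == x and p[j] == y for p in patterns):
                return False
    return True


patterns = meander_patterns(8)
for number, pattern in enumerate(patterns, start=1):
    print(number, "TM", pattern)
print(covers_all_double_errors(patterns))
```

For N = 8 the sketch produces eight patterns, in agreement with (ld 8 + 1)*2 = 8, and the coverage check succeeds because any two distinct bit positions differ in at least one bit of their binary index and are therefore separated by one of the subdivision levels.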