FIG. 1 is a block diagram of a typical memory system 10 including memory 8 for storing data, where the term data includes any type of stored information including program instructions and data generated by or associated with such instructions. When the memory system 10 stores a data word 2, the data word is first presented to error correcting code (ECC) logic 4 before being written into the memory 8. The ECC logic 4 generates error checking and correction bits using the data word 2, and these additional error bits are then stored in memory 8 along with the data word 2. In the following description, the error detection and correction bits may be referred to as check bits, and the original data word 2 in combination with the check bits may collectively be referred to as a code word. The data word 2 and check bits are stored in specific locations in the memory 8 as programmed by redundancy logic 6 which redirects data to redundant storage locations in the memory to thereby replace defective storage locations, as will be described in more detail below. In this way, the redundancy logic 6 replaces defective storage locations to which data was initially directed with redundant storage locations, as will be understood by those skilled in the art. When data is subsequently read from the memory 8, the data is again presented to the ECC logic 4 to ensure the data as read is the same as the data word 2 initially stored in the memory.
The memory 8 is designed to maximize the number of bits available (storage capacity) without sacrificing too much memory speed (the time it takes to store or access the data). Thus, memory cells that store individual bits are packed as closely together as possible through a variety of different techniques, such as by reducing the number of transistors per memory cell and by making the transistors smaller. Typically, the smaller a memory cell, the longer it takes to access the cell due to the small voltages and currents that must be properly sensed. Thus, there is a trade-off: using more and larger transistors increases the speed of the memory 8 but at the same time reduces the storage capacity of the memory. As a result, the memory system 10 typically includes a combination of relatively slow but high-capacity memory cells such as dynamic random access memory (DRAM) cells, and also includes lower-capacity but faster memory cells such as static random access memory (SRAM) cells.
An array of memory cells (not shown) includes a plurality of rows and columns of memory cells, with an address being associated with each memory cell in the array. In high-capacity arrays such as those formed from DRAM cells, the address is typically divided into a column address and a row address. The row address is typically sent first, and in response to the row address the data stored in an entire row of memory cells in the array is sensed and stored in circuitry in the memory 8. The column address is provided to the memory 8 after the row address, and selected ones of the memory cells in the addressed row are selected in response to the column address. If data is being fetched from a series of consecutive column addresses within the same addressed row of memory cells, the data stored in these consecutive columns of memory cells can be accessed from the circuitry that previously sensed and stored the data of the addressed row.
The memory 8 is typically manufactured with spare or redundant bits, and the redundancy logic 6 is programmed to substitute any defective memory cells with redundant memory cells. The redundancy logic 6 is typically programmed during initial testing of the memory 8. Referring to FIG. 2, the memory 8 of FIG. 1 includes a memory array 12 of rows and columns of memory cells (not shown). The main approaches to the substitution of defective memory cells in the array 12 with redundant cells utilize laser-blown fuses, electrical fuses, or one-time-programmable MOSFETs. Laser fuse based repair is still a common approach, although this type of repair increases test costs substantially since a 3-step test process of test, laser repair, and retest is required. Electrical fuse based repair can be performed as a single process using a tester which tests, electrically repairs, and retests while the memory 8 is coupled to the tester.
The repair process for substituting redundant memory cells for defective memory cells typically consists of identifying the proper laser programmable fuses, electrically programmable fuses, or one-time-programmable MOSFETs needed to deactivate a defective column 14 of memory cells, deactivating the defective column (or group of columns containing a defective cell or cells), activating a redundant column 16 or group of redundant columns of memory cells, and configuring the redundancy logic 6 to assign the array address corresponding to the defective column 14 to the address of a redundant column 16. After the defective column 14 is disabled and the redundancy logic 6 programmed, whenever the defective column 14 is addressed the redundant column 16 will be accessed instead, allowing data to be read from and written to the memory cells in the redundant column 16. In this way, every time a subsequent read or write operation addresses the defective column 14, the redundant column 16 is accessed instead of the defective column. The circuitry, operation, and processes for redundancy programming to replace defective memory cells with redundant cells are well understood by those skilled in the art, and thus will not be described in more detail.
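The address substitution described above can be illustrated with a minimal software sketch. This is only a model for illustration and not the fuse-based circuitry itself; the class name `RedundancyMap`, its methods, and the column numbers used are hypothetical and do not appear in the figures.

```python
# Illustrative model of column redundancy (hypothetical names; real devices
# implement this with fuses or one-time-programmable elements, not software).
class RedundancyMap:
    def __init__(self, redundant_columns):
        self.free = list(redundant_columns)  # redundant columns not yet assigned
        self.remap = {}                      # defective column -> redundant column

    def repair(self, defective_column):
        """Assign a redundant column to a defective one (done once, at test time)."""
        if defective_column in self.remap:
            return self.remap[defective_column]
        if not self.free:
            raise RuntimeError("no redundant columns left")
        self.remap[defective_column] = self.free.pop(0)
        return self.remap[defective_column]

    def resolve(self, column):
        """Return the physical column actually accessed for a column address."""
        return self.remap.get(column, column)


# Example: an array whose redundant columns are numbered 8 and 9.
redundancy = RedundancyMap(redundant_columns=[8, 9])
redundancy.repair(3)              # column 3 failed during testing
accessed = redundancy.resolve(3)  # later accesses to column 3 go to column 8
```

In this sketch, addresses of columns that were never repaired pass through `resolve` unchanged, which mirrors the behavior described for non-defective columns.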
Modern computer systems typically contain hundreds of megabytes (MB) of memory for storing programming instructions and associated data. With so much memory now being contained in computer systems, the likelihood of defective memory cells has increased. For example, 128 MB of DRAM is a typical amount contained in present personal computer systems. Each byte of memory typically includes 8 bits and thus is stored in 8 individual memory cells. Accordingly, there are over 1×10^9 DRAM memory cells required to store the desired 128 MB of data. Moreover, these DRAM memory cells are typically accessed hundreds of millions of times per second. Given such a large number of memory cells and the frequency with which the cells are accessed, the probability that an error will occur in data being read from or written to the memory cells is fairly high.
As previously mentioned, the ECC logic 4 adds error bits to the stored data word 2, with the error bits being redundant information that allows errors in the data stored in the memory 8 to be detected and in some cases corrected. Referring again to FIG. 1, the ECC logic 4 performs error-correcting operations on data words 2 used by application programs (not shown) accessing the memory 8. In general, referring to FIG. 3 a typical embodiment of the ECC logic 4 is shown in more detail to describe the conventional way errors are detected and corrected. A data input signal DI, which corresponds to the data word 2 in FIG. 1, is a word M bits long and there are an additional K bits added to the word that are used to detect and correct data bit errors. An encode function 72 applies the algorithm used to generate or properly set the additional K bits based upon the original M bits. After encoding of the data word DI by the encode function 72, a code word formed by the M and K bits is stored in the memory 8. At some subsequent time, the code word or the M and K bits are read from the memory 8, such as by an application program, and the read M bits are presented to a buffer 80 in a corrector unit 78 and are also presented to an encode function 74, which is identical to encode function 72 and generates K bits based on the bit values of the read M bits. The compare unit 76 compares the K bits generated by the encode function 74 to the K bits read from memory 8. If the two sets of K bits have identical values the compare unit 76 signals the corrector unit 78 to provide the M-bits from buffer 80 without change as a data out signal DO. If, however, the compare unit 76 signals the corrector unit 78 that the two sets of K bits have different values, the corrector unit corrects the M bits in the buffer 80 based on a correction algorithm and thereafter provides the corrected M bits from the buffer 80 as the data out signal DO. 
The compare unit 76 also generates an error signal ES in this case, which is utilized by other circuitry (not shown) in the memory system 10 (FIG. 1).
The ECC logic 4 may execute a variety of different error detection and correction algorithms to correct errors detected in the stored code word. One common algorithm utilizes a code known as a Hamming code, which is an error detection and correction code used in many different types of digital systems. An example of a Hamming code and the application of this code by the ECC logic 4 will now be described in more detail. Referring to FIG. 1, typical data words 2 are 8 to 64 bits wide and the ECC logic 4 typically applies a single-error-correction-double-error-detection (SECDED) algorithm to the data words, with this algorithm being implemented through a Hamming code. A Hamming code has what is known as a Hamming distance between sets of code words that collectively make up the Hamming code, where a code word is a data value combined with the error check bits generated by the algorithm. For example, in FIG. 3 each code word corresponds to M data bits and K error check bits stored in the memory 8 as a code word. The Hamming distance between code words is the number of bits by which the code words differ and determines the number of erroneous bits that can be detected and corrected in a code word, as will be understood by those skilled in the art. Take for example the Hamming code made up by the two code words 0001 and 1000. These code words differ in two bits and thus have a Hamming distance of two. With a Hamming distance of two, any single bit error in a code word can be detected. If a single bit in either of these two code words 0001, 1000 changes, the resulting code word is different than either of these two words. This allows a single bit error to be detected since changing any single bit in one of the code words produces a code that is not one of the original two code words.
If two bits are changed in either of these code words, however, these errors may go undetected because if the right two bits change state then the one code word becomes the other. For example, if in the code word 0001 the first and last bits change logic state, meaning that the rightmost 1 bit becomes a 0 and the leftmost bit becomes a 1, then the code word 0001 becomes 1000, which is the other code word. Thus, two erroneous bits cannot be detected in all situations using this Hamming code having a distance of two.
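The distance-two example above can be checked with a short sketch. The helper names `hamming_distance` and `flip_bits` are hypothetical and chosen only for illustration; code words are represented as strings of "0" and "1" characters for readability.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def flip_bits(word: str, positions) -> str:
    """Return the word with the bits at the given 0-based positions inverted."""
    bits = list(word)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)


CODE_WORDS = ("0001", "1000")

# The two code words differ in the first and last bit positions.
distance = hamming_distance("0001", "1000")

# Every single-bit error in 0001 yields a word that is not a code word,
# so the error is detectable.
single_errors = [flip_bits("0001", [i]) for i in range(4)]
all_detected = all(w not in CODE_WORDS for w in single_errors)

# Flipping the first and last bits turns 0001 into the other code word,
# so this particular two-bit error goes undetected.
double_error = flip_bits("0001", [0, 3])
undetected = double_error in CODE_WORDS
```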
If the Hamming distance for a code formed by a set of code words is at least 3, we can correct any single bit error in any of the code words. With such a Hamming code, a code word containing a single bit error is just one bit away from a valid code word, and since every other valid code word is at least 2 bits away from the erroneous code word, we correct the error by returning to the only valid code word that is just one bit away. Of course, if multiple bits change state in a code word then this process may erroneously result in a single bit being changed to obtain the nearest valid code word even though this code word is not the original code word. Thus, only single bit errors can be corrected. For a Hamming code with a distance of 3, however, we can detect whether two bits have changed state even though we cannot correct such errors. This kind of Hamming code having a distance of 3 is called a Single Error Correction, Double Error Detection (SECDED) code.
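Correction by choosing the nearest valid code word can be sketched as follows. The two-word code {000, 111} is an assumption made purely for illustration (it is the smallest code with a distance of 3 and is not taken from the figures), and the function names are hypothetical.

```python
# Illustrative distance-3 code (an assumption for this sketch).
CODE = ("000", "111")


def distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def correct(received: str, code=CODE) -> str:
    """Return the valid code word nearest to the received word."""
    return min(code, key=lambda word: distance(word, received))


# A single bit error in 000 is one bit away from 000 and two bits away
# from 111, so it is corrected back to the original code word.
single_error_result = correct("010")

# A two-bit error in 000 is nearer to 111, so nearest-word correction
# returns a valid code word that is not the one originally stored.
double_error_result = correct("110")
```

The second case shows why correction is reliable only for single bit errors: a word with two erroneous bits is pulled toward the wrong code word.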
As an example of a SECDED code, suppose our data in signal DI in FIG. 3 consists of 4 data bits=[1011] and M thus equals 4. This example requires 3 error check or parity bits in positions 2^0=1 (P1), 2^1=2 (P2), and 2^2=4 (P4). The M data bits will be placed into positions 3, 5, 6, and 7 and designated D3, D5, D6, and D7 in a code word of this SECDED code so that each code word has the form [P1 P2 D3 P4 D5 D6 D7]. Each of the parity bits P1, P2, and P4 is set to a 1 or a 0 to ensure that each code word has a distance of at least 3 (i.e., at least 3 bits are different) from the nearest valid code word by ensuring each subset of M data bits used to calculate the parity bit has an even or odd parity. Odd parity means there are an odd number of 1s in the subset of M data bits and the respective parity bit P, and even parity means there is an even number of 1s. Even parity is assumed in the present example.
In the example where data bits D equal 1011, inserting the data bits D3, D5, D6, and D7 into a code word produces [P1 P2 1 P4 0 1 1]. Calculation of the parity bits is as follows: P1=(D3, D5, D7)=(1 0 1). The parity in these three bits is even since there are two “1” bits so the parity bit P1 needs to be set to 0 to make the parity across (P1 D3 D5 D7) be even. Thus P1=0. The code word then becomes [0 P2 1 P4 0 1 1]. The next parity bit P2 is determined from the data bits P2=(D3, D6, D7)=(1 1 1), and since the parity in these bits is currently odd the parity bit P2 must be set to “1” so that the parity across (P2 D3 D6 D7) is even. The code word then becomes [0 1 1 P4 0 1 1]. Finally, the parity bit P4=(D5 D6 D7)=(0 1 1) so the parity bit needs to be set to 0 to give even parity across the bits (P4 D5 D6 D7). Thus, P4=0 and the final code word with the parity bits in place is [0 1 1 0 0 1 1]. A variety of different types of circuitry may be used in the ECC logic 4 to generate the parity bits P1, P2, P4, such as cascaded XOR gates as will be understood by those skilled in the art. The code word [0 1 1 0 0 1 1] is the word that is actually stored in the memory 8 in FIGS. 1 and 3, and the code word that is subsequently retrieved from the memory and processed by the encode function 74, compare unit 76, and corrector unit 78 of FIG. 3. In FIG. 3, the K error check bits correspond to the parity bits P1, P2, and P4.
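The parity calculation in this example can be sketched in a few lines, with XOR standing in for the cascaded XOR gates mentioned above. The function name `encode_hamming74` is hypothetical; bits are represented as a list of integers in the order [P1 P2 D3 P4 D5 D6 D7].

```python
def encode_hamming74(data_bits):
    """Encode 4 data bits [D3, D5, D6, D7] into [P1, P2, D3, P4, D5, D6, D7].

    Even parity is assumed, so each parity bit is the XOR of the data bits
    it covers.
    """
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7   # P1 covers positions 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # P2 covers positions 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # P4 covers positions 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


# Data word 1011 from the example above.
code_word = encode_hamming74([1, 0, 1, 1])
```

For the data word 1011 this yields [0 1 1 0 0 1 1], matching the code word derived step by step in the text.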
Now suppose that upon retrieval of the code word from the memory 8, the code word has a value [0 1 1 0 0 0 1]. In other words, the bit in position D6 is in error. In this situation, the encode function 74 and compare unit 76 generate a check word (C4 C2 C1) from the retrieved code word where the bits in the check word are set to 1 if the parity check indicates a parity error for the corresponding parity bit. In the present example, parity bit P1=0 and a check of the retrieved bits (D3 D5 D7)=(1 0 1) has even parity, so that bit P1 agrees with the retrieved parity and C1=0. A similar process is used to determine the values of bits C2 and C4. In this example P2=1 and the parity check of the retrieved data bits (D3 D6 D7)=(1 0 1) has even parity. The bit P2 thus indicates that the stored data bits (D3 D6 D7) have odd parity, which is not true. Since the parity check on the bits in the retrieved code word is in disagreement with the parity bit P2, an error has been detected and the bit C2 is set to 1. Exactly which bit in the code word is in error has yet to be determined.
Finally, the bit P4=0 and the parity check of the retrieved bits (D5 D6 D7)=(0 0 1) has odd parity indicating that the bit P4 should equal 1. The bit P4=0, however, so once again an error has been detected and the check bit C4 is set to 1 to indicate the disagreement in values. The check word [C4 C2 C1] therefore has a value of [1 1 0] indicating that an error has been detected in position D6 since the check word represents the bit pattern for the decimal number 6. As a result, the bit D6 will be inverted and in this case changed from a 0 to a 1. The key with this type of code is that the binary value of the check bits indicates the erroneous bit in the code word, as will be appreciated by those skilled in the art. The corrector unit 78 then removes the parity bits P1, P2, P4 from the corrected code word and the bits D3, D5, D6, D7 having the values 1011, which is the original data word portion of the original code word stored in memory, are output from the buffer 80. In the following discussion, the process of generating the parity bits P1, P2, P4 may be referred to as encoding data words and the process of generating the check bits C and detecting and correcting erroneous bits may be referred to as decoding data words.
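The decoding steps just described can be sketched as follows. This is a minimal illustration assuming even parity and the bit order [P1 P2 D3 P4 D5 D6 D7]; the function name `decode_hamming74` is hypothetical, and the sketch handles single bit errors only.

```python
def decode_hamming74(code_word):
    """Check and correct a 7-bit code word [P1, P2, D3, P4, D5, D6, D7].

    Returns (data_bits, error_position), where error_position is 0 when no
    parity error is found and otherwise the 1-based position that was
    inverted.
    """
    p1, p2, d3, p4, d5, d6, d7 = code_word
    # Each check bit is 1 when the stored parity bit disagrees with the
    # parity recomputed from the retrieved data bits.
    c1 = p1 ^ d3 ^ d5 ^ d7
    c2 = p2 ^ d3 ^ d6 ^ d7
    c4 = p4 ^ d5 ^ d6 ^ d7
    # The check word (C4 C2 C1), read as a binary number, is the position
    # of the erroneous bit.
    position = 4 * c4 + 2 * c2 + c1
    corrected = list(code_word)
    if position:
        corrected[position - 1] ^= 1
    # Strip the parity bits, keeping D3, D5, D6, D7.
    data_bits = [corrected[2], corrected[4], corrected[5], corrected[6]]
    return data_bits, position


# Retrieved code word from the example, with the bit in position 6 in error.
data, error_position = decode_hamming74([0, 1, 1, 0, 0, 0, 1])
```

Here the check word evaluates to 110, the bit in position 6 is inverted, and the recovered data bits are 1011, in agreement with the worked example.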
Referring now to FIGS. 2 and 3, the ECC logic 4 typically processes code words having the number of bits or width used by a particular application, which is typically much less than the number of data bits read from the memory array 12 of FIG. 2 when a given row of memory cells (not shown) is accessed. The problem with using Hamming codes on relatively short code words is that a larger percentage of the overall storage capacity of the memory array 12 is required to store the parity bits generated for each code word. For example, an 8-bit data word (M=8) requires 5 parity bits to implement an SECDED code, and thus each 13-bit code word includes 5 parity bits, meaning that approximately 38% (5/13×100%) of the storage capacity of the memory array 12 is required merely for storing the parity bits required to implement the code and thus is unavailable for storing data bits. In contrast, a 256-bit data word (M=256) requires only 10 parity bits to implement an SECDED code, meaning that only approximately 4% (10/266×100%) of the capacity of the memory array 12 is utilized for storing parity bits required to implement the code.
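The parity bit counts quoted above can be reproduced with a short calculation. The sketch assumes the common construction in which an SECDED code is a single-error-correcting Hamming code (the smallest k satisfying 2^k ≥ M + k + 1) plus one additional overall parity bit; this construction is an assumption of the sketch rather than something stated in the text, and the function names are hypothetical.

```python
def secded_check_bits(m: int) -> int:
    """Number of check bits for an SECDED code over m data bits.

    Assumes a single-error-correcting Hamming code plus one overall
    parity bit.
    """
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k + 1


def parity_overhead(m: int) -> float:
    """Fraction of each code word that is occupied by check bits."""
    k = secded_check_bits(m)
    return k / (m + k)


bits_8 = secded_check_bits(8)        # check bits for an 8-bit data word
bits_256 = secded_check_bits(256)    # check bits for a 256-bit data word
overhead_8 = parity_overhead(8)      # 5/13
overhead_256 = parity_overhead(256)  # 10/266
```

Under this assumption the overhead falls from roughly 38% for 8-bit data words to under 4% for 256-bit data words, which is the trend the paragraph above relies on.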
Because the percentage of overall storage capacity of the memory array 12 that is required for storing the parity bits decreases as the number of data bits in each code word increases, it is desirable for the ECC logic 4 to operate on data words having a large number of data bits, which will be referred to hereinafter as “wide” data words. When the ECC logic 4 processes wide data words the logic can be embedded “deeper” into the memory system 10, where the term deeper means closer to the memory array 12 of FIG. 2. In a given memory array 12, the widest data word corresponds to the data word containing all the bits in an entire row of the array, as will be appreciated by those skilled in the art. A problem with having the ECC logic 4 process wide data words from the array 12, such as the word corresponding to an entire row of memory cells in the array, is that the ECC logic must process all data from the activated row, which may include data from valid columns of memory cells, data from defective columns, and data from redundant columns. There is no way of knowing prior to testing which columns of memory cells in the array 12 are defective and which redundant columns will be mapped to replace these defective columns. Conventional ECC logic 4 must therefore include circuitry to process only valid data from the array 12, which complicates the circuitry required to implement the ECC logic 4 and prevents such conventional logic from efficiently operating on wide data words.
There is a need for performing error correction and detection on wide data words in memory systems and other types of systems containing memory.