In high density Static Random Access Memories (SRAMs), redundancy is widely used to improve fabrication yield. Redundancy basically involves adding extra word lines and/or extra bit lines to at least one memory sub-array.
As known by those skilled in the art, a memory array includes a plurality of K sub-arrays, each including N pairs of bit lines arranged in columns with memory cells connected therebetween. In a 1-bit wide organization (i.e. 1 output data available), a 1 bit common read bus passes through the whole memory array and the different sense amplifiers are dotted all together thereon. In this case, the bit line or column redundancy is easily implemented by adding a spare sub-array and dotting the redundant sense amplifiers on the common read bus. For instance, if up to 2 defective columns have to be corrected in one memory sub-array, 2 redundant columns have to be added in the spare sub-array. FIG. 1 illustrates this prior art approach.
As shown in FIG. 1, a conventional 1-bit wide memory circuit architecture 10 includes memory sub-array 11-0, which is represented in a typical 1-bit wide organization, wherein the number N of columns is 64. A row of memory cells of sub-array 11-0 is activated for a READ operation by applying an appropriate voltage level on the word line (e.g. WL1), and simultaneously, only one pair of bit lines (e.g. BLT1, BLC1), forming one column out of 64, is selected by applying a selection signal (e.g. BD-1) on the corresponding column sense amplifier 12 (e.g. 12-1).
The binary content of the selected memory cell (e.g. MC11) is then read, i.e. the differential voltage appearing between the bit lines is sensed by the corresponding column sense amplifier. Note that the column sense amplifiers of the memory sub-array 11-0 will be referred to below as the normal sense amplifiers. Note also that the WRITE circuitry has not been shown for the sake of simplicity. Selection signals BD-1 to BD-64 include bit line (1/64) and memory sub-array (1/K) decode information as known by those skilled in the art. All the output terminals of normal sense amplifiers 12-1 to 12-64 are dotted to a 1-bit common read bus 13. Assuming the sense amplifiers are of the differential type, the common read bus 13 comprises two wires, for the true and complement output signals, as illustrated in FIG. 1. If the sense amplifiers were of the single ended type, only one wire would be required. A final common sense amplifier and off-chip driver block 14 is connected to the common read bus 13 to generate the 1-bit final output data.
Redundancy is made apparent from FIG. 1 by the presence of a spare sub-array 11'-0 comprised of N'=2 redundant cell columns that are connected in a standard manner to two redundant sense amplifiers 12'-1 and 12'-2 specifically dedicated to redundancy and controlled by selection signals RE-1 and RE-2. The outputs of these redundant sense amplifiers are dotted to the common read bus 13. As a result, any single (or double) failed cell column in memory sub-array 11-0 can be repaired by substituting one (or two) redundant cell columns for the failed one(s). However, care must be exercised, because normal data and redundant data, respectively generated by the normal and redundant sense amplifiers, are output on the same common bus 13. Whenever redundant data must be selected, the normal sense amplifier of the defective memory cell column must be disabled so that the respective electrical signals are not mixed on the common read bus 13. The switching operation between the normal and redundant sense amplifiers is realized by means of the bit (or column) decoding circuitry and also by means of a failed bit address detector or bit address comparator. The bit decoding circuitry is implemented with known multi-layer decoding techniques. The difference from a memory chip having no redundant cells is that one of these decoding layers can be inhibited by a signal provided by the bit address comparator. The latter recognizes the occurrence of a failed address that was previously identified by testing and stored, as explained below, in a ROM using laser blown fuse techniques. The failed address is programmed (or written) with the help of an on-chip fuse for each address bit weight. Whether or not a given fuse is blown depends on the binary value of the bit address that has been tested and found to be defective.
The compare function (an XOR function) is performed in parallel on each bit. When all column address bits match the programmed failed bits, a selection signal, referred to as the RE (Redundancy Enable) signal, is generated to enable the appropriate redundant sense amplifier, and the corresponding BD signal is disabled.
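The compare-and-switch behavior just described can be sketched in a few lines of Python. This is only a behavioral illustration, not the circuit itself: the function names `address_matches` and `select_signals` are hypothetical, the fuse-programmed addresses are modeled as plain integers, and the 6-bit address width is an assumption that follows from the 64 columns of FIG. 1.

```python
def address_matches(column_address: int, fused_address: int, width: int) -> bool:
    """Return True when every address bit equals the fuse-programmed bit."""
    mask = (1 << width) - 1
    # XOR gives 1 at each bit position that differs; all positions are
    # compared at once, as in the parallel compare function.
    return ((column_address ^ fused_address) & mask) == 0


def select_signals(column_address: int, bd: bool,
                   failed_addresses: list[int], width: int = 6):
    """Return (effective BD, [RE-1, RE-2, ...]) for one column address.

    failed_addresses stands in for the fuse ROM (hypothetical representation).
    """
    re = [address_matches(column_address, f, width) for f in failed_addresses]
    # A match enables a redundant sense amplifier and disables the normal one,
    # so that the two never drive the common read bus together.
    return (bd and not any(re)), re
```

For instance, with failed addresses 5 and 40 stored, `select_signals(5, True, [5, 40])` disables BD and raises the first RE signal, while any other address leaves BD unchanged and both RE signals low.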
The byte-wide organization is directly derived from the 1-bit wide organization. In a byte-wide organization, eight output data bits (i.e. a byte) instead of one are generated. Thus, an 8-bit common read bus passes through the whole memory array. The normal sense amplifiers are dotted on this common read bus according to their respective bit weights. A failure in one column of the memory array means that one bit of a byte has failed. The immediate solution (and the easiest from a circuitry point of view) would be to replace the defective byte with a redundant byte. Consequently, to be able to correct two defective columns in the memory sub-array, sixteen redundant columns are then required in the spare array. Such a byte-wide memory circuit architecture is illustrated in FIG. 2, and referred to generally by reference numeral 15.
Referring to FIG. 2, memory sub-array 16-0 is comprised of N=64 columns organized in eight blocks of eight memory cell columns each. A bit of a given weight corresponds to each block. The number of blocks M is determined by the length of the binary word that is selected to organize sub-array 16-0. In the instant case of a byte-wide organization, the word is comprised of eight bits, i.e. M=8. The number of memory cell columns per block, in this instance eight, is purely illustrative. This number is related to the number of selection signals, and thus to the number of bytes stored in sub-array 16-0. The first eight normal sense amplifiers 17-1 to 17-8, of the differential type, are connected to the first eight memory cell columns of sub-array 16-0 and their outputs are dotted to the first sub-bus 18-0, which is comprised of two wires. These correspond to the first bit (bit 0 or LSB) of the byte to be output. Similar construction applies to the other cell column blocks. To output one byte out of eight of the memory sub-array, one of the normal sense amplifiers 17-1 to 17-8 is selected by the corresponding selection signal BD-1 to BD-8 for each bit weight. As a result, the common read bus 18 comprises eight 2-wire sub-busses referenced 18-0 to 18-7 to generate the complete output byte, whose bits are referred to as bit 0 (LSB) to bit 7 (MSB) in FIG. 2. To each sub-bus, e.g. 18-0, corresponds a final sense amplifier connected in series with a buffer to form an element, e.g. 19-0, of a block generically referenced with reference numeral 19. The full output byte is available at the output terminals of the eight elements 19-0 to 19-7.
In this instance, the spare sub-array 16'-0 is now comprised of 2×8=16 columns to be able to still correct two defective columns in memory sub-array 16-0. Spare array 16'-0 therefore comprises sixteen redundant sense amplifiers 17'-1 to 17'-16 that are dotted, two by two, on each of the eight sub-busses 18-0 to 18-7. The redundant sense amplifiers are controlled by two selection signals RE-1 and RE-2 to select up to two redundant bytes stored in the spare sub-array 16'-0 to be output by block 19. Note that, if binary words of 64 bits instead of 8 bits were used, this would mean that the spare sub-array 16'-0 should include 2×64=128 redundant columns.
The above problem of redundancy is even more acute in cache memories. The large performance gap between the central processing unit (CPU) and the main memory has made the use of cache memories an important factor for any present and future high performance processor. In cache memory architectures, the 4-way and 8-way set associative cache organizations are the most extensively used to date in order to meet the high speed objectives of the mainframe manufacturers. In the 4-way set associative cache organization, which is the standard to date, 4 data sets (bit, byte, . . . ) have to be generated by each memory sub-array and one out of 4 of these data sets will be subsequently selected through the so-called late select signals LS0 and LS1. In the meantime, this provides a means of initiating additional logical functions (e.g. address translation) in parallel and then deciding later in the system cycle which data set is correct and must be used by the CPU. For example, assuming the data set format is the byte, 4 bytes have to be available at the same time and the final selection is done through a 1/4 multiplexer circuit controlled by four control signals derived from the two late select signals via a decoder circuit. Note that, compared with the other address signals, these two late select signals become valid later in the cycle. As a result, a 32 bit common read bus now has to be used. If 2 defective columns still need to be corrected with the conventional byte-wide organization of FIG. 2, the number of spare cell columns will increase up to 32×2=64, as will now be explained in more detail in conjunction with FIG. 3.
Referring now to FIG. 3, a conventional byte-wide circuit architecture of a 4-way set associative cache memory with embedded column redundancy is referred to generally by reference numeral 20. Memory sub-array 21-0 is comprised of N=64 memory cell columns sub-divided into M=8 blocks, which are connected to 64 normal sense amplifiers 22-1 to 22-64. The selection signals are referenced as SEL-1 and SEL-2. With this architecture, four memory data read busses 23-0 to 23-3, each comprised of eight 2-wire sub-busses, are required to convey the four byte format data sets: SET0 to SET3. As a result, 32 bits out of 64 are output from sub-array 21-0, depending upon the selection made by selection signals SEL-1 and SEL-2. Still referring to FIG. 3, the first two normal sense amplifiers 22-1 and 22-2, of the first memory cell block, are dotted on the LSB0 sub-bus of bus 23-0, and so on up through normal sense amplifiers 22-7 and 22-8, whose outputs are connected to the LSB3 sub-bus of bus 23-3. Similar construction applies to the other cell column blocks. All these connections, i.e. four groups 23-0 to 23-3 of eight 2-wire sub-busses, form main read bus 23.
In order to be able to still correct up to 2 defective columns in sub-array 21-0, the spare sub-array 21'-0 requires N'=64 columns. The dedicated redundant sense amplifiers are referenced 22'-1 to 22'-64. The first two redundant sense amplifiers 22'-1 and 22'-2 are connected to the LSB0 sub-bus of bus 23-0, and so on up through redundant sense amplifiers 22'-63 and 22'-64, which are connected to the MSB3 sub-bus of bus 23-3. Likewise, the selection signals are referenced RE-1 and RE-2.
The circuitry which will be described below is an example of a conventional design using standard logic circuits that could be implemented to render operative the cache memory circuit architecture of FIG. 3. All four LSB sub-busses LSB0 to LSB3 are connected via bus 24-0 to a 1/4 multiplexer element 25-0, and so on for each bit weight, until the MSBs are reached, as shown in FIG. 3. The eight 1/4 multiplexer elements 25-0 to 25-7 are grouped in multiplexer block 25. The multiplexer block 25 is controlled by a combination of the late select signals LS0 and LS1 via a 1/4 decoder circuit 26. Each one of the eight multiplexer element outputs is connected to a sense amplifier/buffer element 27-0 to 27-7, respectively, whose assembly forms block 27. The selected full byte is available at the output terminals of the elements 27-0 to 27-7. Selection signals SEL-1 and RE-1 are generated by circuit 28-1 comprised of conventional components: an RS flip-flop latch 29, a bit address comparator 30, a 3-way AND gate 31, and an inverter 32. The output of bit address comparator 30 is applied to the SET (S) input of latch 29. Signal BR-1, which is output by latch 29, is applied to one input of the 3-way AND gate 31, and selection signal BD-1 (as defined above with respect to FIGS. 1 and 2) is applied to another input. The signal which is output by AND gate 31 is referenced SEL-1, while signal RE-1 is derived from signal BR-1 via inverter 32. A symmetric circuit 28-2 generates signal BR-2 and selection signals SEL-2 and RE-2. Signal BR-2 is also applied to the third input of the 3-way AND gate 31 of circuit 28-1. Similarly, signal BR-1 is applied to the corresponding 3-way AND gate of circuit 28-2.
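The gating performed by circuits 28-1 and 28-2 may be easier to follow as a small behavioral model. The Python sketch below is an illustration under stated assumptions, not a description of the actual circuit: the class name `RedundancySelect`, the function `sel`, and the `reset` method are hypothetical, and the BR signal is assumed to be taken from the complementary output of latch 29 (high when no match is held), which is the polarity implied by deriving RE from BR through inverter 32.

```python
class RedundancySelect:
    """Behavioral model of one circuit 28-x (latch 29, comparator 30, inverter 32)."""

    def __init__(self) -> None:
        self.latched = False  # state of RS latch 29

    def compare(self, match: bool) -> None:
        # Comparator 30 drives the SET input; once set, the latch holds
        # the match even after the bit address is no longer valid.
        if match:
            self.latched = True

    def reset(self) -> None:
        # Assumption: the latch is cleared at some point before the next access.
        self.latched = False

    @property
    def br(self) -> bool:
        # Assumption: BR is the complementary latch output.
        return not self.latched

    @property
    def re(self) -> bool:
        # Inverter 32: RE is the complement of BR.
        return not self.br


def sel(bd: bool, own: RedundancySelect, other: RedundancySelect) -> bool:
    """3-way AND gate 31: BD gated by the BR signals of both circuits."""
    return bd and own.br and other.br
```

With two instances standing for circuits 28-1 and 28-2, a match latched in either one forces both SEL signals low and raises only the RE signal of the circuit that detected the match.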
In such 4-way set associative cache SRAMs, wherein the bit addresses are only valid during a short period of time and usually are no longer valid when the late select address bits occur, the role of latch 29 is thus to "memorize" a given logic state when the bit address comparator 30 recognizes a faulty address combination stored therein. As known by those skilled in the art, bit address comparator 30 is fed with the column address bits from the CPU and compares them with a table of defective addresses stored in a ROM. With standard memory architectures such as illustrated in FIGS. 1 and 2, latch 29 would not be necessary.
Although described in a 4-way set associative cache memory organization, the architecture of FIG. 3 could be generalized to a P-way set associative cache memory as well, wherein P=8, 16, . . . Note that in this case, the number of read busses and data sets is still equal to P. Finally, FIG. 3 also illustrates other memory sub-arrays, e.g. 21-1 to 21-K, and corresponding optional spare arrays 21'-1 to 21'-K.
Operating behavior of the cache memory architecture of FIG. 3 will now be briefly described. An electrical signal is developed on the 64 bit line pairs or columns corresponding to a given decoded row of the memory sub-array 21-0. 32 normal data out of 64 forming the four data sets SET0 to SET3 are selected by the normal sense amplifiers addressed by the appropriate selection signal i.e. either SEL-1 or SEL-2. The two following cases are to be considered depending upon whether or not redundant replacement is required.
In the first case, at least one selected cell column is defective. This implies that redundancy replacement is required. The bit address comparator 30 detects a match between the bit (or column) address provided to the memory by the CPU and the bit address stored as corresponding to a defective column of memory sub-array 21-0. The normal sense amplifiers corresponding to the 32 data, at least one of which has been recognized as faulty, are disabled by means of the BR signals (because the corresponding latch 29 in circuit 28-1 or 28-2 is now set), thereby preventing the corresponding selection signal (SEL-1 or SEL-2) from turning on any of them. Conversely, either the RE-1 or RE-2 selection signal is activated in order to turn on one of the groups of 32 redundant sense amplifiers to output 32 valid data. The spare sub-array 21'-0 is read in the same way as the memory sub-array 21-0. Finally, the LS0 and LS1 signals are delivered to the multiplexer elements 25-0 to 25-7, via decoder circuit 26, to select 8 data out of 32 to output the final byte.
In the second case, the selected cell columns in the memory sub-array 21-0 are free from any defect, and thus there is no need for redundancy replacement. Signal BD-1 (or BD-2) is not inhibited by the bit redundancy BR signal, because latch 29 has not been set. Next, 8 data out of 32, i.e. one data set among SET0 to SET3, are selected by means of the eight 1/4 multiplexer elements 25-0 to 25-7 driven by the 4 decoded LS0/LS1 signals to output the desired final 8 data. It should be noted that with the implementation shown in FIG. 3, the redundant sense amplifiers are connected to the read busses 23-0 to 23-3. Therefore, they must be isolated therefrom when there are no defective columns in sub-array 21-0. This isolation is made through signals RE-1 and RE-2, which are forced to zero. Indeed, circuits 28-1 and 28-2 generate adequate RE-1 and RE-2 signals so that the two groups of 32 odd and even redundant sense amplifiers, 22'-1 to 22'-63 on the one hand and 22'-2 to 22'-64 on the other hand, are inhibited.
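The final late select step, common to both cases, can be illustrated with a short Python sketch. It is a behavioral approximation only: the function names are hypothetical, each data set is modeled as a list of eight bits ordered from LSB to MSB, and treating LS1 as the high-order late select bit is an assumption, since the decoding order is not specified above.

```python
def decode_late_select(ls0: int, ls1: int) -> list[int]:
    """1/4 decoder 26: two late select bits to four one-hot control signals."""
    index = (ls1 << 1) | ls0  # assumption: LS1 is the high-order bit
    return [int(i == index) for i in range(4)]


def mux_byte(data_sets: list[list[int]], ls0: int, ls1: int) -> list[int]:
    """Eight 1/4 multiplexer elements 25-0 to 25-7, one per bit weight.

    data_sets[s][b] is bit b (LSB first) of SETs, whether it was driven
    by the normal or by the redundant sense amplifiers.
    """
    controls = decode_late_select(ls0, ls1)
    out = []
    for bit in range(8):
        # Each element receives the same bit weight from SET0..SET3 and
        # passes the one whose decoded control signal is active.
        out.append(sum(c & data_sets[s][bit] for s, c in enumerate(controls)))
    return out
```

Because the multiplexer only sees the 32 data present on the read busses, the sketch is the same whether those data come from sub-array 21-0 or from spare sub-array 21'-0; the choice between the two has to be settled before this step.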
The above implementation of column redundancy in a cache memory as described in conjunction with FIG. 3 exhibits some deficiencies.
A major inconvenience is the size of the spare array which increases from 16 (FIG. 2) to 64 columns (FIG. 3) in a 4-way set associative cache implementation. As a consequence, in that instance, the size of the spare array 21'-0 is the same as that of the memory sub-array 21-0.
Moreover, the number of redundant columns is dependent on the number of late select signals. A 4-way set associative cache memory requires two LS address signals, LS0 and LS1, and thus four decoded LS address signals are generated at the output of decoder 26. Should P=8, i.e. should an 8-way implementation be adopted, this would imply three LS address signals and thus eight decoded LS address signals instead of four, and the number N' of cell columns in the spare array 21'-0 would then increase to 128. In all respects, the solution of FIG. 3 would rapidly become prohibitive in terms of consumed silicon area and circuit complexity.
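The column counts quoted for FIGS. 1 to 3 all follow from the same product of repairable columns, word width, and number of ways. The short Python sketch below simply restates that arithmetic; the function names `spare_columns` and `late_select_bits` are hypothetical, and the second function assumes the number of ways is a power of two.

```python
def spare_columns(repairs: int, word_bits: int, ways: int = 1) -> int:
    """Redundant columns needed when a whole word of every set is replaced."""
    return repairs * word_bits * ways


def late_select_bits(ways: int) -> int:
    """Number of LS address signals for a P-way organization (P a power of two)."""
    return (ways - 1).bit_length()


# Two repairable columns in each organization discussed above.
fig1 = spare_columns(2, 1)           # 1-bit wide organization
fig2 = spare_columns(2, 8)           # byte-wide organization
fig3 = spare_columns(2, 8, ways=4)   # 4-way set associative, byte-wide
p8 = spare_columns(2, 8, ways=8)     # 8-way set associative, byte-wide
```

Under these assumptions the four values are 2, 16, 64 and 128 columns respectively, which matches the figures given in the text.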
In addition, the decision to use the redundant or the normal 32 data depends on the content of an address, two bits of which are the late select bits, which are determined only late in the system cycle. As explained above in conjunction with the first case, if redundancy is required, it is necessary to activate the redundant sense amplifiers and then to drive the multiplexer elements before the selected byte (normal or redundant) is output. Therefore, as the late select access time must be very short, it becomes more and more difficult to implement column redundancy in cache memories without adding extra delay.
For the reasons outlined above, there has heretofore been no known cache memory that efficiently implements column redundancy. Consequently, only word line redundancy has been widely used so far in standard cache memory architectures.