An SRAM semiconductor memory typically comprises a plurality of memory cells, each memory cell having, for example, four to six transistors. Generally, each memory cell is coupled to row and column select lines, which are used to select the individual memory cell, and each memory cell receives its input from, and drives its output onto, a pair of sense lines, typically designated sense and sense complement. For purposes of this description, this pair of sense lines (sense and sense complement) shall be called the bit line pair. To read a memory cell, the voltage differential on the bit line pair must be sensed. Reducing the voltage differential on the bit line pair to the minimum level needed to reliably sense the memory cell's content reduces power consumption in the SRAM.
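The power argument can be made concrete with a common first-order model, which is an assumption not stated in the text: after a read, the bit line that discharged by a swing delta_v must be pre-charged again from the supply, drawing charge C × delta_v at voltage Vdd. The sketch below compares a full-rail swing with a small sensing swing; the function name and the capacitance, supply, and swing values are hypothetical.

```python
def restore_energy_joules(c_bitline_farads: float, delta_v: float, vdd: float) -> float:
    """Energy drawn from the supply to re-pre-charge one bit line after a read.

    First-order assumption: the supply replaces charge Q = C * delta_v at voltage vdd.
    """
    return c_bitline_farads * delta_v * vdd


if __name__ == "__main__":
    C_BL = 200e-15  # hypothetical bit line capacitance (200 fF)
    VDD = 1.0       # hypothetical supply voltage in volts
    full = restore_energy_joules(C_BL, VDD, VDD)  # bit line swings full rail
    low = restore_energy_joules(C_BL, 0.1, VDD)   # hypothetical 100 mV sensing swing
    print(f"full swing: {full * 1e15:.1f} fJ, low swing: {low * 1e15:.1f} fJ")
```

Under this model the restore energy scales linearly with the swing, so sensing at one tenth of the rail costs roughly one tenth of the energy per bit line.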
FIG. 1 is a block diagram of the internal structure of a typical 1024 by 4-bit SRAM. The SRAM array 20 consists of four blocks 22 of 64 words by 16 bits each. During a read operation, the high-order 6 bits of the address (A4 through A9) select one of 64 words. Four groups of 16 bits each emerge from the storage array, one group for each of the possible data bits. The four low-order address bits (A0 through A3) select one of 16 bits from each of the four groups to form the 4-bit data word. Writes are similar, except with data flowing in the opposite direction.
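The address split described for FIG. 1 can be sketched in a few lines. This models only the logical selection, not the decoder circuitry; the names `decode` and `read_word`, and the representation of the array as nested lists indexed as `[block][row][column]`, are hypothetical.

```python
ROW_BITS = 6  # A4-A9: 2**6 = 64 rows (words) per block
COL_BITS = 4  # A0-A3: 2**4 = 16 bit positions per block row


def decode(address: int) -> tuple[int, int]:
    """Split a 10-bit address into (row, column) indices."""
    if not 0 <= address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 1024-word SRAM")
    row = address >> COL_BITS                 # high-order bits A9..A4
    column = address & ((1 << COL_BITS) - 1)  # low-order bits A3..A0
    return row, column


def read_word(blocks: list[list[list[int]]], address: int) -> list[int]:
    """Form the 4-bit data word: one bit from the selected row/column of each block."""
    row, column = decode(address)
    return [block[row][column] for block in blocks]
```

For example, `decode(1023)` selects row 63 and column 15, the last bit position of the last word in every block; the same row is activated in all four blocks, and the column decoder picks one bit from each.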
This form of two-dimensional decode, with row and column decoders 24, 26, is used universally in memory components. Not only does it keep the memory array square, it also limits the longest lines in the decoders. Although the illustrated SRAM provides a 4-bit data word, the width of the data word is now more typically 16-bit or 32-bit, and 64-bit data word SRAM is also commercially available.
It is known to fabricate large SRAMs from a plurality of smaller modules that each individually comprise a fully operational SRAM memory, such as the module 20 shown in FIG. 1. Each of these individual SRAM units may be referred to as a “bank.”
Although the memory cells of an SRAM do not need to be continually refreshed, unlike those of a dynamic random access memory (DRAM), the number of transistors used in each cell to store a single bit means that a large SRAM occupies a large amount of integrated circuit (IC) area. Because SRAM also operates faster than DRAM, SRAM is typically used as cache memory for microprocessors, although such SRAM caches are typically relatively small.
In addition to the issues associated with increases in SRAM size, microprocessor clock speeds have increased, which has in turn increased the clock frequencies of SRAMs. As recognized by the present inventors, the increasing size and speed of SRAMs have made the design of conventional SRAM memories problematic. For instance, a certain amount of time is needed to drive the differential voltage signal onto the bit line pair, because the needed voltage differential must propagate along the length of the bit line pair before it reliably indicates the value in the memory cell. Known complementary metal oxide semiconductor (CMOS) fabrication and operation techniques for SRAM may pre-charge the bit line pair to reduce the amount of time required to generate and propagate this differential signal. This pre-charging occurs in every clock cycle.
At some combination of SRAM size and frequency of operation, the length of the bit line pair becomes a problem, as recognized by the present inventor. In particular, the propagation delay of the voltage differential along the bit line pair becomes large enough to prevent reliable detection of the contents of the addressed memory cell in the available time.
Conventionally, combining many separate SRAM units or “banks” into one large SRAM has been used to provide SRAM memories with greater storage capacity. Referring to FIG. 2, in a large, multiple-bank SRAM design 30, the output from each bank 32-46 is coupled to a MUX 48 having an output 50, which forms the final output of the SRAM 30. In one implementation of the design shown in FIG. 2, full-rail signals with CMOS buffers are driven from the SRAM banks to a static CMOS MUX. A full-rail signal swings across the entire voltage range available to it to generate the requisite logic 0 and logic 1 values. Although a single line (e.g., line 52) is shown coupling each of the banks 32-46 to the MUX 48 in FIG. 2, each of these single lines 52-64 actually comprises N wire tracks, where N is the number of bits in the data word. Thus, if the data word is 16 bits wide, each bank 32-46 would require 16 wire tracks to couple it to the MUX 48. The total number of wire tracks is therefore the number of banks of SRAM memory cells multiplied by the number of data bits. In a large SRAM comprising many banks, this arrangement quickly consumes the IC real estate available for wire tracks.
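The track count for the FIG. 2 arrangement is a simple product, sketched below to show how quickly it grows with the number of banks. The function name and the bank counts in the loop are hypothetical examples; the 16-bit word matches the example in the text.

```python
def full_rail_mux_tracks(num_banks: int, word_bits: int) -> int:
    """Tracks for FIG. 2-style routing: every bank drives its own N-bit bus to the MUX."""
    return num_banks * word_bits


if __name__ == "__main__":
    # Hypothetical bank counts, each with a 16-bit data word.
    for banks in (8, 16, 32, 64):
        print(banks, "banks ->", full_rail_mux_tracks(banks, 16), "tracks")
```

With eight banks and a 16-bit word this gives 128 tracks, and every doubling of the bank count doubles the routing that must converge on the MUX.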
Another known implementation of this MUX function for multiple SRAM banks uses a shared pair of pre-charged, low-swing wires that are driven by NMOS true/complement drivers and received by a sense amplifier circuit. FIG. 3 illustrates this second implementation 70, wherein a plurality of banks 72-90, each having a driver (not shown), are coupled to a sense amp 92 over a bit line pair 93. The output 94 of the sense amp 92 provides the output of the memory structure 70.
The number of wire tracks that this approach uses is two times the number of data bits (i.e., one pair 93 is provided for each data bit). This approach saves power compared with the implementation of FIG. 2 and reduces the total number of wire tracks. This implementation is limited, however, by the amount of differential signal that can be driven over the length of wire necessary to couple the banks of memory cells to the sense amplifier. At some point, depending on the physical size and frequency of operation of the SRAM 70 shown in FIG. 3, the bit line pairs running from the SRAM banks will be too long to allow the proper voltage differential to propagate reliably in the time available. To maximize the number of banks that can be coupled to a single sense amplifier, the sense amplifier 92 is placed at the center of the bit line pair 93, so that the farthest bank is only half the pair's length away. In normal operation, one half of the clock cycle is used to drive the signal onto the bit line pair and the other half is used to pre-charge the pair. One piece of data is transmitted from the driver to the sense amplifier during each clock cycle.
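The interaction between clock frequency, wire length, and sense amplifier placement can be illustrated with a deliberately simplified model. The sketch below assumes, as a labeled simplification not taken from the text, that the differential is reliably sensed once the Elmore delay of a uniform distributed RC line (r·c·x²/2 for a run of length x) fits within the drive half of the clock cycle; it ignores driver resistance, sense amplifier load, and the fact that only a small differential is needed. The function name and the per-micrometre resistance and capacitance values are hypothetical.

```python
import math


def max_bitline_length_um(clock_hz: float, r_ohm_per_um: float, c_f_per_um: float,
                          center_sense_amp: bool = True) -> float:
    """Longest bit line pair whose Elmore delay fits in the drive half of the cycle.

    Simplified model: uniform distributed RC line, ideal driver, no receiver load.
    """
    drive_window_s = 0.5 / clock_hz  # half the cycle drives; the other half pre-charges
    # Elmore delay of a distributed RC run of length x is r * c * x**2 / 2.
    # Solve r * c * x**2 / 2 == drive_window_s for x (distance to the farthest bank).
    reach_um = math.sqrt(2.0 * drive_window_s / (r_ohm_per_um * c_f_per_um))
    # A sense amp at the center can reach that distance on both sides.
    return 2.0 * reach_um if center_sense_amp else reach_um


if __name__ == "__main__":
    R_PER_UM = 0.5      # hypothetical wire resistance, ohms per micrometre
    C_PER_UM = 0.2e-15  # hypothetical wire capacitance, farads per micrometre
    for f_hz in (1e9, 2e9):
        length = max_bitline_length_um(f_hz, R_PER_UM, C_PER_UM)
        print(f"{f_hz / 1e9:.0f} GHz: about {length:.0f} um of bit line pair")
```

In this model, doubling the clock frequency shrinks the usable bit line pair by a factor of the square root of two, while moving the sense amplifier from one end to the center doubles the length (and hence roughly the number of banks) it can serve.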
A drawback of the implementation of FIG. 3 is that once the number of banks has increased beyond a certain point, the bit line pair 93 becomes too long for the differential signal to propagate along its length in the time available, preventing the sense amplifier 92 from reliably reading the differential signal. In this example, the exact number of banks, and the maximum length of the bit line pair 93 that will work reliably with them, depend on the clock frequency of the SRAM 70.
As the number of SRAM banks increases, the bit line pair 93 cannot simply be lengthened to connect to these additional SRAM banks, for the reasons previously discussed. Instead, additional bit line pair sections are needed, and the outputs of these sections must be combined in a further, second-level sense amplifier. This variation is illustrated in FIG. 4, wherein a first plurality of banks is coupled to a first sense amp, a second plurality of banks is coupled to a second sense amp, and the first and second sense amps drive a third sense amp, which provides the output of the memory structure.
The implementation of FIG. 4 functions essentially as a multi-level MUX, implemented with shared low-swing bit line pairs and sense amplifiers rather than the CMOS buffers, wires and CMOS MUX of the implementation of FIG. 2. In the variant 100 shown in FIG. 4, because two low-swing bit line pairs 102, 104 are used in series, two clock cycles are needed to transport data from the banks to the output. Further, the necessary number of wire tracks doubles at each level of the SRAM hierarchy, which presents obvious scaling difficulties.
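The latency and track growth of this multi-level arrangement can be tallied with a small sketch. It assumes, as an illustration rather than a detail from the text, a binary tree like FIG. 4: every sense amplifier combines exactly two low-swing pair sections from the level below, each pair section adds one clock cycle, and each section is counted as its own pair of tracks per data bit. The `hierarchy_cost` function and `HierarchyCost` type are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HierarchyCost:
    cycles: int                  # clock cycles from bank to output
    tracks_per_level: list[int]  # wire tracks at each level, output side first
    total_tracks: int


def hierarchy_cost(levels: int, word_bits: int) -> HierarchyCost:
    """Latency and wire tracks for a binary tree of low-swing bit line pair sections."""
    if levels < 1:
        raise ValueError("need at least one level")
    # 1, 2, 4, ... pair sections per data bit, counting from the output side.
    sections_per_bit = [2 ** k for k in range(levels)]
    tracks = [2 * word_bits * s for s in sections_per_bit]  # two wires per pair
    return HierarchyCost(cycles=levels, tracks_per_level=tracks, total_tracks=sum(tracks))
```

With one level this reduces to the FIG. 3 case (two tracks per data bit, one cycle); with two levels and a 16-bit word it gives two cycles and 32 + 64 = 96 tracks, and each further level adds a cycle and doubles the tracks needed at the new level.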
As recognized by the present inventor, what is needed is an SRAM architecture that can operate reliably at high clock frequencies and that can be expanded in size without allocating excessive IC real estate for wire tracks.
It is against this background that various embodiments of the present invention were developed.