As microprocessor performance increases, whether through frequency scaling or parallelism, the demand for high-speed memory grows dramatically. Increasing cache capacity requirements, which are critical to sustaining system performance trends, present a trade-off between cache performance and capacity. Since bit line sensing during a read operation accounts for a large portion of the cache access time, it is important to develop high-speed sensing strategies.
For high-speed static random access memory (SRAM) and register file array applications, recent trends have shifted away from differential sense amplifier techniques due to the difficulties in minimizing offset voltage under the influence of growing device variability. Instead, hierarchical sensing techniques with full bit line swings are employed using Domino-like circuits, where a pre-charged bit line is tied directly to the input of a static complementary metal oxide semiconductor (CMOS) gate (usually a NAND gate). For example, a register file generally uses 2-stage sensing: a primary domino stage for the Local Bit-Line (LBL) and a secondary stage for the Global Bit-Line (GBL).
As shown in FIG. 1A, a static NAND (A0) 102 is used to sense two LBLs (LBL<0> and LBL<1>), each of which has m sets of memory cell read ports (N0 and N1 or N2 and N3). Both LBLs are pre-charged to VDD (supply voltage) when LBLP is low, which turns on P0 and P1. One of the two LBLs goes low when the corresponding RWL (read word line) is activated and the cell node is high as shown in FIGS. 2a and 2b. The GBL is pre-charged to VDD when GBLP is low, which turns on P20.
A high-to-low transition on LBL changes the output of the static NAND (GBL_PD), which turns on pull-down MOS N20 (one of n sets) to pull down GBL. This can trigger the secondary sense amp, which itself can be another static NAND gate. The value “m×n” is determined by the number of entries in the given memory array, but the specific values of m and n are chosen by balancing performance and array efficiency. The static NAND circuit used in this sensing scheme has excellent immunity to noise on the LBL node (due to the inherent noise rejection capabilities of static circuits), but the performance is poor compared to high-speed dynamic circuits.
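The two-stage read path described above can be sketched as a purely logical (0/1) model. This is a minimal behavioral illustration, not a circuit simulation: the names LocalBitLine, static_nand and read_global_bit_line are hypothetical, timing and analog effects are ignored, and each LBL is assumed to start pre-charged high before evaluation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalBitLine:
    """One LBL with m read ports; cells[i] is the stored cell-node value (1 = high)."""
    cells: List[int]

    def evaluate(self, rwl: List[bool]) -> int:
        # The LBL starts pre-charged (1). It is pulled low only when an
        # activated read word line selects a cell whose node is high.
        for cell, selected in zip(self.cells, rwl):
            if selected and cell == 1:
                return 0
        return 1


def static_nand(a: int, b: int) -> int:
    """Logical NAND of the two LBL levels; the result drives GBL_PD."""
    return 0 if (a == 1 and b == 1) else 1


def read_global_bit_line(groups: List[Tuple[LocalBitLine, LocalBitLine]],
                         group: int, lbl: int, row: int) -> int:
    """Return the GBL level after a read of one entry.

    groups holds n (LBL<0>, LBL<1>) pairs, each pair sensed by one NAND.
    Exactly one read word line is activated, identified by (group, lbl, row).
    """
    gbl = 1  # GBL starts pre-charged high
    for g, (lbl0, lbl1) in enumerate(groups):
        m = len(lbl0.cells)
        rwl0 = [g == group and lbl == 0 and r == row for r in range(m)]
        rwl1 = [g == group and lbl == 1 and r == row for r in range(m)]
        gbl_pd = static_nand(lbl0.evaluate(rwl0), lbl1.evaluate(rwl1))
        if gbl_pd == 1:
            gbl = 0  # the pull-down device for this group discharges the GBL
    return gbl


if __name__ == "__main__":
    # n = 2 groups, m = 2 read ports per LBL (values chosen only for illustration)
    groups = [
        (LocalBitLine([1, 0]), LocalBitLine([0, 0])),
        (LocalBitLine([0, 1]), LocalBitLine([1, 1])),
    ]
    print(read_global_bit_line(groups, group=0, lbl=0, row=0))
    print(read_global_bit_line(groups, group=0, lbl=0, row=1))
```

In this model a stored high value appears as a low level on the GBL, and a stored low value leaves the GBL at its pre-charged level, which mirrors the full-swing behavior described for FIG. 1A.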
To improve the speed of this type of approach, a dynamic NAND (FIG. 1B) can be used instead of a static CMOS NAND. The LBL can thus be sensed faster than in the static NAND case because N12 (102) is turned off (by DYNP) before either P10 or P11 turns on (by LBL evaluation), which eliminates fighting between these transistors. As a result, a drastic improvement in sensing performance can be attained. In addition, the reduced transistor count can significantly reduce the layout area required.
However, with this dynamic NAND configuration, noise immunity is compromised because noise on the floating LBL node could accidentally discharge the GBL_PD node, which is also floating, thereby resulting in a read error. While the addition of an NFET (n-channel field effect transistor) keeper to the GBL_PD node could address this problem, it would eliminate the speed advantage of the dynamic gate.
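The noise-immunity trade-off between the two gate styles can be illustrated with a first-order threshold comparison. This is a rough sketch under stated assumptions, not a result from the figures: it assumes P10 and P11 are PMOS devices, that the static NAND switches near half of VDD, and that the unkept dynamic gate is upset as soon as LBL droops by more than the PMOS threshold magnitude. The values of VDD, VTP and STATIC_TRIP are hypothetical and chosen only for illustration.

```python
VDD = 1.0                 # assumed supply voltage, in volts
VTP = 0.3                 # assumed PMOS threshold magnitude, in volts
STATIC_TRIP = 0.5 * VDD   # assumed switching point of the static NAND


def static_nand_upset(droop: float) -> bool:
    """True if a noise droop on a pre-charged LBL flips the static NAND."""
    return (VDD - droop) < STATIC_TRIP


def dynamic_nand_upset(droop: float) -> bool:
    """True if the droop turns on a PMOS input of the unkept dynamic NAND.

    With no keeper, the floating output cannot recover once the input
    device conducts, so any droop beyond VTP is treated as an upset.
    """
    return droop > VTP


if __name__ == "__main__":
    for droop in (0.2, 0.4, 0.6):
        print(droop, static_nand_upset(droop), dynamic_nand_upset(droop))
```

Under these assumptions, a droop that lies between VTP and VDD minus STATIC_TRIP upsets only the dynamic gate, which is the window of reduced noise margin that a keeper would be intended to close.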