Register files are performance-critical memory components that are typically found in general purpose microprocessors and other types of digital data processors. A register file is typically required to meet the following constraints: 1) exhibit a single clock cycle read/write latency that can support back-to-back read and write operations; and 2) provide multiple read/write port capability to enable simultaneous access by several execution units in a super-scalar architecture. These requirements, coupled with the demand for a large number of word entries per port, have traditionally necessitated the use of wire-OR type dynamic circuits for the local and global bitlines (i.e., for those circuit paths that convey the input and output data bits).
In accordance with CMOS technology scaling, and in order to achieve high performance, the supply voltage Vdd and threshold voltage Vt are both scaled to maintain approximately the same Vdd/Vt ratio. However, aggressive Vt scaling results in an exponential increase in bitline active leakage currents, and also results in a poor bitline noise immunity scaling trend. Therefore, alternate bitline circuit techniques that curtail the poor bitline noise immunity scaling trend are required in order to achieve high noise immunity while sustaining high performance.
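The exponential dependence of leakage on Vt can be illustrated with a minimal sketch. It assumes a simple textbook model in which sub-threshold current at zero gate drive falls by one decade for every S volts of threshold voltage. The function name, the reference current, and the swing value of 100 mV/decade are illustrative assumptions and are not taken from this description.

```python
def leakage_ratio(vt_old, vt_new, swing_v_per_decade=0.1):
    """Factor by which sub-threshold leakage grows when Vt drops.

    Assumed model (illustrative only): I_leak is proportional to
    10 ** (-Vt / S), where S is the sub-threshold swing in volts
    per decade. S = 0.1 V/decade is a hypothetical round number.
    """
    return 10 ** ((vt_old - vt_new) / swing_v_per_decade)


def bitline_leakage(per_cell_leakage, cells_on_bitline):
    """Worst-case leakage of a wire-OR bitline: every unselected
    pull-down path leaks in parallel, so the currents add."""
    return per_cell_leakage * cells_on_bitline


if __name__ == "__main__":
    # Lowering Vt by one swing (100 mV here) costs about a decade of leakage.
    print(leakage_ratio(0.4, 0.3))
    # A wider dynamic OR multiplies the per-cell leakage seen by the bitline.
    print(bitline_leakage(1.0, 16))
```

Under this assumed model, each 100 mV reduction in Vt raises leakage by roughly ten times, and the total seen by a bitline grows with the number of pull-down paths tied to it.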
Previous techniques have involved the use of negative wordline drivers, dynamic threshold voltage adjustment via substrate/well bias control, and pseudo-static bitlines.
FIG. 1 shows the organization of a conventional 4-read, 2-write ported 256-word×40-bit/word register file 1. The register file 1 contains four read address decoder circuit sections 2, two write address decoder circuit sections 3, and a 40-bit register file array 4 arranged as a 40-slice bitline stack. A complete read operation is performed in two clock cycles. In the first cycle, an 8-bit read/write address per port is decoded in section 2 to deliver the read/write select signals into the register file array 4. The decoder 2 is non-critical, and therefore can be implemented in conventional static CMOS circuitry. In the next cycle, which is critical in terms of performance, the actual bitline read operation is conducted. FIG. 2 shows one bit slice for one read-port path, while FIG. 3 shows the four full-swing local bitlines. The four local bitlines are totally independent of each other, sharing only the bitcells. Each local bitline (LBL) supports 16 bitcells and a two-way merge via a static NMOS gate that drives a global bitline (GBL). A bitcell has two write ports and four read ports. Both reading and writing are single-ended.
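The figures quoted for the register file of FIG. 1 can be cross-checked with a short sketch. The constants come from the description above; the derived counts (number of LBLs per read port per bit slice, and number of merge gates feeding a GBL) are inferences from those constants rather than values stated in the text, and the function and key names are hypothetical.

```python
WORDS = 256            # word entries
BITS_PER_WORD = 40     # bit slices in the array
READ_PORTS = 4
WRITE_PORTS = 2
BITCELLS_PER_LBL = 16  # bitcells supported by one local bitline
LBLS_PER_MERGE = 2     # two-way merge ahead of the global bitline


def register_file_geometry():
    """Derive counts implied by the stated organization (inferred values)."""
    # Address width needed to select one of WORDS entries.
    address_bits = (WORDS - 1).bit_length()
    # Local bitlines needed to cover all words, per read port per bit slice.
    lbls_per_port_per_bit = WORDS // BITCELLS_PER_LBL
    # Merge gates driving one global bitline (assumes all LBLs are merged pairwise).
    merges_per_gbl = lbls_per_port_per_bit // LBLS_PER_MERGE
    return {
        "address_bits": address_bits,
        "decoder_sections": READ_PORTS + WRITE_PORTS,
        "total_bitcells": WORDS * BITS_PER_WORD,
        "lbls_per_port_per_bit": lbls_per_port_per_bit,
        "merges_per_gbl": merges_per_gbl,
    }


if __name__ == "__main__":
    for name, value in register_file_geometry().items():
        print(name, value)
```

The 8-bit address width agrees with the 256-word depth, and 16 bitcells per LBL implies 16 LBLs per read port in each bit slice under the stated assumptions.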
Turning now to the use of dual-Vt dynamic bitlines, the LBL and GBL dynamic ORs are susceptible to noise caused by high active leakage during evaluation, when the precharged domino node should stay high. The LBL is more sensitive than the GBL because its domino node stores less charge and its dynamic OR structure is wider.
FIG. 4 shows a worst-case bitline noise scenario in which all low-Vt transistors (LVTs) are used to maximize the performance of the read operation.
A dual-Vt LBL uses a high-Vt (HVT) device for the read-selection transistor and a low-Vt (LVT) device for the bitcell data transistors, as shown in FIG. 5. The use of the high-Vt transistors limits the bitline leakage. However, this benefit is achieved at the cost of degraded performance due to the reduced drive currents of the high-Vt transistors.
FIG. 6 shows a prior art pseudo-static leakage-tolerant LBL technique. This technique employs modifications to the conventional dynamic bitline topology. A first modification is that the read-select input and bitcell data locations on the bitline stack are swapped, and the read-select signals feed the lower (M2) transistors of the LBL. A second modification is the introduction of static-precharge transistors (Px) that are driven actively by the read-select signals. These Px transistors anchor the bitline static nodes (VS) at Vdd when the read-selects are at ground potential. A third modification is the introduction of static 2-input NOR gates, whose inputs are the bitline stack node and bitcell data. The NOR gate outputs drive the upper (M1) transistors of the LBL.
When the read-select inputs are at GND, the NOR gate outputs force the input of the leakage-limiting M1 transistor to GND. This effectively cuts off the sub-threshold active leakage current path of the bitline, since both the drain and the source of the M1 transistor are at Vdd, and the body effect on M1 is maximized by the full-Vdd source-body bias, which further elevates its Vt. As a result, the bitline noise immunity can be increased.
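The gating behavior described for FIG. 6 can be summarized with a minimal logic-level sketch. It treats every device as an ideal switch and takes the description literally: Px holds the stack node VS at Vdd when read-select is low, M2 pulls VS low when read-select is high, and the NOR gate of VS and the bitcell data input drives M1. The function name and the returned keys are hypothetical, and the polarity of the bitcell data input (true or complement) is an assumption, since it is not stated here.

```python
def pseudo_static_lbl_stack(read_select, data_in):
    """Idealized logic model of one pull-down stack of the pseudo-static LBL.

    read_select, data_in: 0 or 1. Leakage, timing, and analog voltage
    levels are ignored; this only captures the steady-state logic.
    """
    px_on = read_select == 0   # static-precharge PMOS conducts when select is low
    m2_on = read_select == 1   # lower NMOS is driven by read-select
    vs = 1 if px_on else 0     # stack node: held at Vdd by Px, else pulled low by M2
    nor_out = int(not (vs or data_in))  # 2-input NOR of stack node and data input
    m1_on = nor_out == 1       # upper NMOS is driven by the NOR output
    return {
        "vs": vs,
        "m1_gate": nor_out,
        "bitline_discharges": m1_on and m2_on,
    }


if __name__ == "__main__":
    for rs in (0, 1):
        for d in (0, 1):
            print(rs, d, pseudo_static_lbl_stack(rs, d))
```

In this model an unselected stack always has the M1 gate at GND with VS at Vdd, which is the leakage cut-off condition described above; a selected stack discharges the bitline only when the data input to the NOR gate is low.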
However, the benefit of the pseudo-static technique shown in FIG. 6 is obtained at the cost of degraded performance due to the presence of the additional NOR gate and the sub-threshold leakage through Px and M2.