1. Field of the Invention
The present invention relates generally to semiconductor circuits implemented in computer memory, and more specifically to circuit design for bitline circuitry of large cache memory blocks.
2. Description of the Related Art
In large cache memory blocks, generally defined as memory of 64 kilobytes or larger, a plurality of memory cells are arrayed and connected by bitlines and wordlines. FIG. 1A shows a representative layout of a cache memory block 10. A plurality of memory cells 12 are defined in an array or grid, and individual memory cells 12 are connected along columns by a pair of bitlines known as a bitline (BL) 14a and inverse bitline (/BL) 14b. Individual memory cells 12 are connected along rows by wordlines (WL) 15.
As larger and larger cache memory blocks are implemented, the number of memory cells 12 that can be supported by a WL 15 and by BL 14a and /BL 14b is limited by such factors as power consumption, performance, and the like. By way of example, when memory cells 12 are switching, circuits are charged and discharged along a common BL 14a and /BL 14b, so that increasing the number of memory cells 12 increases the required power and decreases the switching speed. In order to support the increased number of memory cells 12 of a large cache memory, a common design is to sub-divide the memory cells 12 into sub-arrays and to utilize local circuitry for each sub-array that ties in to global circuitry supporting the entire large cache memory block.
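The loading argument above can be illustrated with a first-order sketch. It assumes, for illustration only, that each memory cell adds a fixed lumped capacitance to its bitline; the function name, the per-cell capacitance values, and the cell counts are hypothetical and are not taken from any particular design.

```python
# First-order sketch: bitline loading grows with the number of attached cells.
# All numeric values below are hypothetical, chosen only for illustration.

def bitline_capacitance(num_cells, cell_cap_f=1.0e-15, wire_cap_per_cell_f=0.5e-15):
    """Lumped capacitance (farads) of a bitline loaded by num_cells cells.

    Assumes every cell contributes the same device capacitance plus the
    same length of wire, which ignores layout-dependent effects.
    """
    return num_cells * (cell_cap_f + wire_cap_per_cell_f)

# One undivided bitline serving 512 cells (hypothetical column height).
undivided_cap = bitline_capacitance(512)

# The same column sub-divided into sub-blocks of 32 cells each (hypothetical).
local_cap = bitline_capacitance(32)

# Under this linear model the local bitline load shrinks by the partition factor.
load_ratio = undivided_cap / local_cap

print(f"undivided bitline: {undivided_cap * 1e15:.1f} fF")
print(f"local bitline:     {local_cap * 1e15:.1f} fF")
print(f"load ratio:        {load_ratio:.1f}x")
```

Under this simplified model, the capacitance that a switching cell must charge or discharge scales with the number of cells sharing the bitline, which is the motivation for local bitlines.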
FIG. 1B shows a partial view of a sub-array or partitioning of a memory cell 12. Sub-blocks 16 include a plurality of sub-cells 18 joined by local bitline pairs shown as local bitline (LBL) 20a and local inverse bitline (/LBL) 20b. A local sense amp, also known as a first stage sense amp, is located in block 22, which receives the signals from the local bitline pairs, e.g., LBL 20a and /LBL 20b, and transmits them to global bitline pairs shown as global bitline (GBL) 24a and global inverse bitline (/GBL) 24b. GBL 24a and /GBL 24b transmit the received signals from the plurality of sub-blocks 16 through a second stage sense amp 26 to an input/output (I/O) shown in block 30.
In the conventional design illustrated in FIG. 1B, block 22 contains a local sense amp that captures the signals from LBL 20a and /LBL 20b and transmits them through GBL 24a and /GBL 24b to I/O 30, and may also include buffers and drivers to move the full swing signal to I/O 30. Drivers typically perform a full swing from 0V to 1.1V (assuming a supply voltage of 1.1V), and the reverse, consuming a great deal of power, and the longer the line, GBL 24a and /GBL 24b, the greater the capacitance that must be charged and discharged. Further, large drivers require precious circuit space. Finally, switching an increasing plurality of lines generates considerable noise. By way of example, 2500 lines may be switching in a large cache memory block with associated voltage swings in local sense amps and drivers, transmitted through a plurality of global bitline pairs, consuming a great deal of current from the power supply and impacting performance of other parts of the CPU.
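The cost of full swing switching can be estimated with the standard first-order relation that charging a capacitance C through a full swing draws an energy of C times the supply voltage squared from the supply. The sketch below applies it to the 1.1V supply and the 2500 lines mentioned above; the per-line capacitance is a hypothetical placeholder, since no capacitance value is given here.

```python
# First-order estimate of the energy drawn from the supply when global
# bitlines are driven through a full swing. The capacitance value is a
# hypothetical assumption; the supply voltage and line count come from
# the example in the text.

SUPPLY_V = 1.1            # supply voltage from the text
NUM_LINES = 2500          # number of switching lines from the text
LINE_CAP_F = 200e-15      # hypothetical capacitance of one global bitline

def full_swing_energy(cap_f, supply_v):
    """Energy (joules) drawn from the supply to charge cap_f from 0V to supply_v."""
    return cap_f * supply_v ** 2

energy_per_line = full_swing_energy(LINE_CAP_F, SUPPLY_V)
total_energy = NUM_LINES * energy_per_line

# A line twice as long is modeled here as having twice the capacitance.
energy_long_line = full_swing_energy(2 * LINE_CAP_F, SUPPLY_V)

print(f"per line:        {energy_per_line:.3e} J")
print(f"all {NUM_LINES} lines:  {total_energy:.3e} J")
print(f"doubled length:  {energy_long_line:.3e} J")
```

In this model the energy grows linearly with line capacitance and with the number of lines switching, consistent with the observation that longer global bitlines and more simultaneous transitions draw more current from the supply.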
One attempt in the prior art to reduce power consumption, area requirements, and noise has been to use a local sense amp to drive PMOS devices, hereinafter referred to as p-type devices, to pull down global bitlines that are precharged high, generating a limited voltage swing, and to use a second stage sense amp 26 to generate full swing signals at I/O 30. FIG. 1C shows a detail view of a partial sub-block 16 with associated local sense amp 32 and p-type devices 34a, 34b in a typical implementation. As illustrated, a plurality of sub-cells 18 are joined along local bitline pairs LBL 20a and /LBL 20b, and a local sense amp 32 feeds through a pair of p-type devices 34a and 34b to GBL 24a and /GBL 24b, respectively. In such a configuration, power consumption is reduced due to limited swing signals, required area is minimized, and noise is decreased with small drivers. However, while a p-type device is good at pull-up, it is not a good pull-down device. As is known, a property of the p-type device is that it will not pull the voltage down to zero, but is instead limited to the p-threshold of approximately 0.35V-0.4V, depending on the technology used. Further, and perhaps more importantly, the speed of the pull-down is inadequate for the desired performance characteristics of large cache memory.
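The pull-down limit described above can be put into numbers with a short sketch. It assumes an idealized p-type device that stops conducting once the line reaches the p-threshold, and it uses the 1.1V supply and the 0.35V-0.4V threshold range from the text; the function names are hypothetical, and the speed limitation noted above is not modeled.

```python
# Idealized sketch of the swing available when a p-type device pulls down
# a global bitline that is precharged high. Function names are hypothetical;
# voltages are taken from the text.

SUPPLY_V = 1.1

def pulldown_floor(precharge_v, p_threshold_v):
    """Lowest voltage the idealized p-type device can pull the line down to."""
    return min(precharge_v, p_threshold_v)

def max_swing(precharge_v, p_threshold_v):
    """Largest voltage swing available on the precharged line."""
    return precharge_v - pulldown_floor(precharge_v, p_threshold_v)

swing_low_vt = max_swing(SUPPLY_V, 0.35)   # threshold at the low end of the range
swing_high_vt = max_swing(SUPPLY_V, 0.40)  # threshold at the high end of the range

print(f"max swing with 0.35V threshold: {swing_low_vt:.2f} V")
print(f"max swing with 0.40V threshold: {swing_high_vt:.2f} V")
print(f"fraction of full swing: {swing_high_vt / SUPPLY_V:.2f} to {swing_low_vt / SUPPLY_V:.2f}")
```

The sketch only bounds the reachable swing; in the configuration described, the swing actually developed before the second stage sense amp 26 fires would be smaller than this bound.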
In light of the foregoing, it is desired to implement a circuit design that will limit the voltage swing at the local sense amp and increase switching speed while maintaining a minimum of noise and area requirements.