Embodiments of the present invention relate to memory circuits, and more particularly, to caches.
A cache is high-speed memory. To achieve high-speed performance, caches often employ dynamic (domino) logic. A high-level abstraction of a dynamic cache is provided in FIG. 1. The cache in FIG. 1 is addressable memory, where an address is provided on ports 102 to access one or more bits of information associated with the address. The cache shown in FIG. 1 may be part of a larger memory system, such as, for example, a content addressable memory system, where the address on ports 102 is obtained after tag matching.
The address on ports 102 is decoded by decoder 104. In the particular example of FIG. 1, the address on ports 102 is 8 bits wide, so that decoder 104 is an 8-to-256 decoder with 256 output ports, labeled ports 106. One of ports 106 is asserted HIGH, and the remaining ports 106 are LOW, corresponding to the decoded 8-bit address at ports 102. The signals on ports 106 are static in the sense that any port belonging to the set of ports 106 is held at a constant logical value (either HIGH or LOW) while an address is provided on ports 102.
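The decoding described above can be illustrated with a minimal behavioral sketch. The function name `decode_one_hot` and the use of 1 and 0 for HIGH and LOW are assumptions made for illustration only; the sketch models logical values, not the circuit of decoder 104.

```python
def decode_one_hot(address: int, width: int = 8) -> list[int]:
    """Return 2**width static select levels; exactly one is HIGH (1).

    Hypothetical behavioral model of an 8-to-256 decoder: the list index
    plays the role of one of ports 106.
    """
    if not 0 <= address < (1 << width):
        raise ValueError("address out of range for the given width")
    return [1 if i == address else 0 for i in range(1 << width)]


lines = decode_one_hot(0b00000101)
# 256 outputs, exactly one HIGH, at the position given by the address
print(len(lines), sum(lines), lines.index(1))  # prints "256 1 5"
```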
Domino gate 108 provides dynamic (domino) compatible signals at its output ports 110, indicative of the static signals on ports 106. The signals on ports 110 are read-select signals, where a HIGH logical value indicates a read operation. Domino gate 108 is clocked by a clock signal, denoted by φ, where φ is HIGH during an evaluation phase and is LOW during a pre-charge phase. Domino gate 108 may comprise simple dynamic buffers, such as two dynamic inverters in series for each input/output port pair, so that the output signals on output ports 110 are LOW during a pre-charge phase, and take on the same logical values as the corresponding input signals on input ports 106 during an evaluation phase.
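The phase-dependent behavior of the dynamic buffers may be summarized by the following sketch, which is a purely logical model and not a description of the transistor-level circuit. The function name `domino_buffer` and the parameter `clock_high` are hypothetical.

```python
def domino_buffer(static_inputs: list[int], clock_high: bool) -> list[int]:
    """Logical model of one dynamic buffer per input/output port pair.

    clock_high=False models the pre-charge phase: every output is LOW (0).
    clock_high=True models the evaluation phase: each output takes the
    logical value of its corresponding static input.
    """
    if not clock_high:
        return [0] * len(static_inputs)
    return list(static_inputs)


static_levels = [0, 0, 1, 0]  # illustrative static decoder outputs
print(domino_buffer(static_levels, clock_high=False))  # [0, 0, 0, 0]
print(domino_buffer(static_levels, clock_high=True))   # [0, 0, 1, 0]
```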
Set of memory cells 112 represents a set of memory cells, each memory cell sharing local bit line 114. In the particular example of FIG. 1, set of memory cells 112 comprises 16 memory cells. Local bit line 114 is pulled HIGH by pullup pMOSFET 116 during a pre-charge phase, and a half-keeper comprising pMOSFET 118 and inverter 120 keeps bit line 114 HIGH during an evaluation phase unless it is otherwise pulled LOW by one of the memory cells in set of memory cells 112. For simplicity, only one set of memory cells with the corresponding local bit line is shown. For example, for the dynamic cache shown in FIG. 1, there will be 16 such sets of memory cells and local bit lines, each set of memory cells comprising 16 memory cells, for a total of 256 memory cells.
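The organization of 16 sets of 16 memory cells may be illustrated as follows. The description does not state which read-select ports connect to which set, so the contiguous grouping used here, along with the names `locate`, `NUM_SETS`, and `CELLS_PER_SET`, is an assumption made only for illustration.

```python
NUM_SETS = 16        # sets of memory cells, one local bit line each
CELLS_PER_SET = 16   # memory cells sharing one local bit line


def locate(select_index: int) -> tuple[int, int]:
    """Map a read-select index (0..255) to (set index, cell index in set).

    Assumes, hypothetically, that consecutive select indices are grouped
    into the same set.
    """
    if not 0 <= select_index < NUM_SETS * CELLS_PER_SET:
        raise ValueError("select index out of range")
    return divmod(select_index, CELLS_PER_SET)


print(NUM_SETS * CELLS_PER_SET)  # 256
print(locate(37))                # (2, 5)
```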
Note that the roles of HIGH and LOW may be interchanged in the previous description regarding decoder 104. That is, one of ports 106 may be asserted LOW, where the other remaining ports are HIGH. In that case, domino gate 108 need only comprise one inverter for each input/output port pair, so that a read-select signal on one of ports 110 is HIGH for a read operation.
An example of a set of memory cells sharing the same local bit line is provided in FIG. 2. For simplicity, FIG. 2 shows only that portion of a set of memory cells relevant to the present description. The gates of read-access transistors 202 are connected to the appropriate read-select ports 110 so as to receive the appropriate read-select signals. A typical memory cell 204 comprises cross-coupled inverters and a read-pass nMOSFET 210. Not shown are the ports required for writing data into a memory cell. Note that in FIG. 1, clock signal φ is buffered by inverters 122 before being applied to the gate of pullup pMOSFET 116 to account for the delay due to domino gate 108. For simplicity, such inverters are omitted in FIG. 2, it being understood that a delay functional unit of some type may be needed in an actual circuit realization. Static unit 206 represents generic static logic, which may be inserted between domino circuit blocks of a larger circuit system, so that port 208 may be connected to other domino circuit blocks.
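An idealized, leakage-free evaluation of one local bit line may be sketched as below. The sketch assumes that, for each cell, the read-access transistor and the read-pass transistor are connected in series between the bit line and ground, so that the bit line is discharged only when both are ON. The function name `evaluate_bit_line` and the list representation are hypothetical.

```python
def evaluate_bit_line(read_selects: list[int], pass_gate_levels: list[int]) -> int:
    """Return the bit line level (1 = HIGH, 0 = LOW) after evaluation.

    read_selects[i]     : level on the gate of the read-access transistor
                          of cell i (1 when that cell is being read).
    pass_gate_levels[i] : level the cell's storage node applies to the gate
                          of its read-pass transistor.
    The bit line starts pre-charged HIGH and is pulled LOW only if some cell
    has both series transistors ON. Leakage is ignored in this model.
    """
    if len(read_selects) != len(pass_gate_levels):
        raise ValueError("one read-select level is needed per cell")
    pulled_low = any(s and g for s, g in zip(read_selects, pass_gate_levels))
    return 0 if pulled_low else 1


selects = [0, 1, 0, 0]                           # read the second cell
print(evaluate_bit_line(selects, [1, 1, 0, 0]))  # 0: bit line discharged
print(evaluate_bit_line(selects, [1, 0, 1, 1]))  # 1: bit line stays HIGH
```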
As device technology scales to smaller dimensions, sub-threshold leakage current may contribute to significant unwanted power dissipation in cache circuits, and may contribute to inaccurate readings of memory cells. For example, consider the case in which all memory cells in FIG. 2 are in a logical state such that the gates of read-pass transistors 210 are HIGH. During a pre-charge phase, bit line 114 will be charged HIGH and all read-access transistors 202 will be OFF. Nevertheless, the additive effect of sub-threshold leakage current in all read-access transistors 202 may cause significant current flow from bit line 114 at the HIGH potential to ground at the LOW potential, thereby wasting power.
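The additive effect noted above may be expressed as a first-order estimate. The symbols below are assumptions introduced for illustration: N denotes the number of memory cells sharing the bit line (16 in the example of FIG. 1), I_sub,i denotes the sub-threshold leakage current through the OFF read-access transistor of the i-th cell, and V_DD denotes the potential to which the bit line is pre-charged.

```latex
I_{\text{leak}} \approx \sum_{i=1}^{N} I_{\text{sub},i},
\qquad
P_{\text{leak}} \approx V_{DD}\, I_{\text{leak}}
```

If the N leakage currents are roughly equal, the estimate reduces to a wasted power of approximately N times V_DD times the leakage current of a single read-access transistor.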
Furthermore, consider another case in which all memory cells in FIG. 2 are in a logical state such that the gates of all read-pass transistors 210 are LOW. If during an evaluation phase a read operation is performed on one of the memory cells, the sub-threshold leakage current in the read-pass transistor in the memory cell being read may cause bit line 114 to discharge to a sufficiently low potential such that an incorrect read operation occurs.