1. Field of Invention
The present invention relates in general to the digital data processing field. More particularly, the present invention relates to semiconductor memories within digital data processing systems.
2. Background Art
In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises at least one central processing unit (CPU) and supporting hardware, such as communications buses and memory, necessary to store, retrieve and transfer information. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
The overall speed of a computer system is typically improved by increasing parallelism, and specifically, by employing multiple CPUs (also referred to as processors). The modest cost of individual processors packaged on integrated circuit chips has made multiprocessor systems practical, although such multiple processors add more layers of complexity to a system.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, using software having enhanced function, along with faster hardware.
Among such faster hardware is static random access memory (SRAM), which is typically faster than dynamic random access memory (DRAM). Accordingly, SRAM is frequently used where speed is a primary consideration, such as in CPU caches and external caches. One type of SRAM known in the art is high performance domino SRAM. For example, U.S. Pat. No. 5,668,761, entitled “FAST READ DOMINO SRAM”, issued on Sep. 16, 1997 to Muhich et al., and assigned to IBM Corporation, discloses a high performance domino SRAM and is hereby incorporated herein by reference in its entirety.
A domino SRAM combines an SRAM with a dynamic circuit known as a “domino circuit”. To make clear that dynamic circuits are different from dynamic-type memories, such as DRAMs, dynamic circuits are referred to herein as domino circuits or domino logic. In general, domino logic is a circuit design technique that makes use of dynamic circuits, and has the advantages of low propagation delay (i.e., these are fast circuits) and smaller area (i.e., due to fewer transistors). In domino logic, dynamic nodes are precharged during one portion of a clock cycle and conditionally discharged during another portion of the clock cycle, where the discharging performs the logic function.
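By way of illustration only, the precharge and conditional discharge behavior described above may be sketched as a small behavioral model. The two-input AND function, the class name DominoAnd2, and the output inverter are assumptions chosen for the example and are not taken from any referenced design.

```python
class DominoAnd2:
    """Behavioral sketch of a two-input domino AND gate (hypothetical model)."""

    def __init__(self):
        self.node = 1  # dynamic node, modeled as precharged high

    def precharge(self):
        # Precharge portion of the clock cycle: the dynamic node is pulled high.
        self.node = 1

    def evaluate(self, a, b):
        # Evaluate portion: the node is discharged only if the pull-down
        # network conducts. It cannot be pulled back high until the next
        # precharge, so the node only ever moves from 1 to 0 here.
        if a and b:
            self.node = 0
        return self.output()

    def output(self):
        # Assumed static inverter after the dynamic node.
        return 1 - self.node


gate = DominoAnd2()
gate.precharge()
print(gate.evaluate(1, 0))  # node stays precharged
print(gate.evaluate(1, 1))  # node discharges, performing the logic function
```

In this sketch the discharge itself carries the logic result, and a further call to evaluate cannot restore the node; only precharge does.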
FIG. 1 illustrates a conventional memory system. The memory system comprises a wordline decoder, a plurality of semiconductor memory cells, a bitline decoder, and an input/output circuit. In general, a memory system typically includes a memory cell array that has a grid of bitlines and wordlines, with semiconductor memory cells disposed at intersections of the bitlines and wordlines. During operation, the bitlines and wordlines are selectively asserted or negated to enable at least one of the memory cells to be read or written. The wordline decoder is coupled to the memory cells to provide a plurality of decoded data. Additionally, the bitline decoder is coupled to the memory cells to communicate data which has been decoded or will be decoded. The input/output circuit is coupled to the bitline decoder to communicate data with the bitline decoder and to determine a value which corresponds to that data.
FIGS. 2A, 2B and 2C illustrate a conventional high performance, low power domino SRAM design including multiple local cell groups. As shown in FIG. 2A, each cell group includes multiple SRAM cells 1-N and local true and complement bitlines LBLT and LBLC. Each SRAM cell includes a pair of inverters that operate together in a loop to store true and complement (T and C) data. The local true bitline LBLT and the local complement bitline LBLC are connected to each SRAM cell by a pair of wordline N-channel field effect transistors (NFETs) to respective true and complement sides of the inverters. A WORDLINE provides the gate input to the wordline NFETs. A particular WORDLINE is activated, turning on respective wordline NFETs to perform a read or write operation.
As shown in FIG. 2B, the prior art domino SRAM includes multiple local cell groups 1-M. Associated with each local cell group are precharge true and complement circuits coupled to the respective local true and complement bitlines LBLT and LBLC, write true and write complement circuits, and a local evaluate circuit. Each of the local evaluate circuits is coupled to a global bitline labeled 2ND STAGE EVAL and a second stage inverter that provides output data or is coupled to more stages. A write predriver circuit receiving input data and a write enable signal provides write true WRITE T and write complement WRITE C signals to the write true and write complement circuits of each local cell group.
A read occurs when a wordline is activated. Since true and complement (T and C) data is stored in the SRAM memory cell, either the precharged high true local bitline LBLT will be discharged if a zero was stored on the true side, or the precharged high complement local bitline LBLC will be discharged if a zero was stored on the complement side. The other local bitline, LBLT or LBLC, connected to the side storing a one, will remain in its high precharged state. If the true local bitline LBLT was discharged, then the zero will propagate through a series of one or more domino stages, eventually reaching the output of the SRAM array. If the true local bitline LBLT was not discharged, then no switching through the domino stages will occur and the precharged value will remain at the SRAM output.
To perform a write operation, the wordline is activated as in a read. Then either the write true WRITE T or the write complement WRITE C signal is activated, which pulls either the true or the complement local bitline low via the respective write true circuit or write complement circuit while the other local bitline remains at its precharged level, thus updating the SRAM cell.
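The read and write operations described above may be summarized, purely as a behavioral sketch, by the following model of one local cell group. The class name LocalCellGroup, the storing of only the true-side value, and the reading of "respective" as WRITE T pulling LBLT low (and WRITE C pulling LBLC low) are assumptions made for the example.

```python
class LocalCellGroup:
    """Behavioral sketch of one local cell group (hypothetical model)."""

    def __init__(self, n_cells):
        # Only the true-side value is stored; the complement side is implied.
        self.cells = [0] * n_cells
        self.lblt = 1  # local true bitline, precharged high
        self.lblc = 1  # local complement bitline, precharged high

    def precharge(self):
        self.lblt = 1
        self.lblc = 1

    def read(self, wordline):
        self.precharge()
        if self.cells[wordline] == 0:
            self.lblt = 0  # zero on the true side discharges LBLT
        else:
            self.lblc = 0  # zero on the complement side discharges LBLC
        return self.lblt, self.lblc

    def write(self, wordline, write_t, write_c):
        # Exactly one of WRITE T / WRITE C is assumed to be active.
        if write_t == write_c:
            raise ValueError("exactly one of write_t, write_c must be active")
        self.precharge()
        if write_t:
            self.lblt = 0  # assumed: WRITE T pulls LBLT low
        else:
            self.lblc = 0  # assumed: WRITE C pulls LBLC low
        # The activated wordline couples the bitlines to the cell,
        # so the cell takes on the bitline values.
        self.cells[wordline] = self.lblt


group = LocalCellGroup(4)
group.write(2, write_t=False, write_c=True)
print(group.read(2))  # LBLT stays high, LBLC is discharged
```

The model ignores all analog behavior; it captures only which bitline is discharged and which remains at its precharged level.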
As shown in FIG. 2C, a wordline decoder includes circuitry that outputs an intermediate output signal OUT to other decode circuitry (not shown) that activates the appropriate precharge and wordline signals. As mentioned earlier, the wordline signal allows access to the memory cells for reads and writes. A read wordline signal READ_WL and a write wordline signal WRITE_WL are generated as outputs of a flip-flop with a data input signal READ_WRITEBAR. The data input signal READ_WRITEBAR indicates whether a read operation or a write operation will be performed in the next cycle of a clock input signal CLOCK. The read wordline signal READ_WL and at least two address bit signals A0 and A1 are AND'd together in a decode block. In addition, the write wordline signal WRITE_WL and the at least two address bit signals A0 and A1 are AND'd together in the decode block. These two AND outputs are OR'd in the decode block to produce the intermediate output signal OUT, which proceeds through the other decode circuitry that ultimately triggers the rising edge of the precharge and the wordline signals.
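The decode logic described above may be expressed, as a sketch only, in the following form. It is assumed here that READ_WL and WRITE_WL are the true and complement outputs of the flip-flop, and that the address bits are used in true polarity; the names ReadWriteFlop and decode_out are hypothetical.

```python
class ReadWriteFlop:
    """Sketch of the flip-flop that captures READ_WRITEBAR (hypothetical model)."""

    def __init__(self):
        self.q = False

    def clock_edge(self, read_writebar):
        # On a CLOCK edge, capture whether the next cycle is a read or a write.
        self.q = bool(read_writebar)

    @property
    def read_wl(self):
        return self.q  # assumed: true output of the flip-flop

    @property
    def write_wl(self):
        return not self.q  # assumed: complement output of the flip-flop


def decode_out(read_wl, write_wl, a0, a1):
    """Intermediate output OUT: two AND terms OR'd together."""
    read_term = read_wl and a0 and a1
    write_term = write_wl and a0 and a1
    return bool(read_term or write_term)


flop = ReadWriteFlop()
flop.clock_edge(read_writebar=True)  # next cycle is a read
print(decode_out(flop.read_wl, flop.write_wl, a0=True, a1=True))
```

Under these assumptions OUT is asserted for a matching address in both read and write cycles, which reflects that the same downstream precharge and wordline circuitry serves both operations.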
FIG. 3 is a timing diagram showing the operation of the prior art domino SRAM shown in FIGS. 2A, 2B and 2C. Domino SRAM arrays, like domino logic, are governed by the behavior of the precharge cycle. Reads and writes to the SRAM cells occur during the evaluation phase, when the precharge signal is high. Consequently, the wordline signal WL, which is the output of the wordline decoder and which allows access to the memory cells for reads and writes, follows the precharge signal closely. An efficient design will share as much of the decode/precharge/wordline circuitry as possible between read and write operations, but a problem arises when the timing demands of a read operation and a write operation conflict. For example, a fast read path requires an early-rising precharge signal and wordline signal WL, which can cause difficulties during a write operation. That is, if the wordline signal WL is high for a significant amount of time before arrival of the write data, it is as if a read operation had commenced, and a bitline signal BL (denoted with reference numeral “305” in FIG. 3) may start to fall contrary to what is required by the write data. Once this fall occurs, the bitline signal BL is slow to rise. For write performance to be efficient, the bitline signal BL must exhibit a profile that does not fall prematurely. Hence, the “early read” problem degrades the write performance of the domino SRAM.
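The “early read” problem may be illustrated with a deliberately simplified numerical sketch. The linear discharge rate, the normalized precharge level of 1.0, the arbitrary time units, and the function name bitline_level_at_write are all assumptions made for the example and do not represent measured behavior of any device.

```python
def bitline_level_at_write(t_wordline, t_write_data, discharge_rate, v_precharge=1.0):
    """Level of the bitline that should stay high, at the moment write data arrives.

    Hypothetical model: while the wordline is high and no write data has
    arrived, the cell discharges the bitline linearly, as in a read.
    """
    # Time during which the wordline is high before the write data arrives.
    exposure = max(0.0, t_write_data - t_wordline)
    # The bitline cannot fall below zero in this normalized model.
    droop = min(v_precharge, discharge_rate * exposure)
    return v_precharge - droop


# Wordline and write data arrive together: no premature fall.
print(bitline_level_at_write(0.0, 0.0, discharge_rate=2.0))
# Wordline rises early: the bitline has partly discharged before the write.
print(bitline_level_at_write(0.0, 0.25, discharge_rate=2.0))
```

In this sketch the longer the wordline leads the write data, the further the bitline has fallen when the write begins, and the more it must recover before the write can complete.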
Therefore, a need exists for an enhanced mechanism for handling unbalanced read/write paths in domino SRAM arrays.