Computer system performance depends upon processor performance and memory performance. Various ways (e.g., pipelining) are known to improve processor performance. Usually processors are faster than the Dynamic Random Access Memory (DRAM) they are using. So, high performance DRAM is always at a premium. Consequently, a primary concern of memory chip designers is performance. High performance memory designers are always seeking new approaches to reduce memory access time.
One approach, known as caching, is to place fast Static Random Access Memory (SRAM) between the processor and the DRAM. Blocks of data are transferred from the DRAM to the faster SRAM cache. This SRAM cache can match or nearly match processor speed, at a price of complicating the system and increasing system cost.
Still other approaches include the Burst Enhanced Data Out (Burst EDO) RAM and Synchronous DRAM (SDRAM). These approaches essentially merge a small cache onto the DRAM.
While these approaches nearly match RAM performance to processor performance for sequential data transfers, they do not do so for out-of-order data transfers. Out-of-order data transfers are slower because they may initiate an access to data in a block of memory other than the current block in the cache. In such circumstances, there is a long delay between the processor requesting data and the DRAM providing the requested data to the processor. This delay is known as latency. Provided the processor restricts its memory accesses to sequential addresses, system performance is not impaired. However, this is not practical. So, as the percentage of out-of-order memory operations (such as branches) increases, system performance decreases. Therefore, to some extent system performance is gated by the memory's latency time. Thus, reducing DRAM latency, the time between initiating an access in a new block and receiving the first data bit from the block, is important for improving system performance and is, therefore, an important objective in DRAM design.
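As a rough illustration of how latency gates system performance, the average access time can be modeled as a weighted mix of fast sequential accesses and slower accesses to a new block. This is only a sketch: the function name and the timing values are hypothetical and are not taken from any particular device.

```python
def average_access_time_ns(out_of_order_fraction, sequential_ns, latency_ns):
    """Weighted average access time for a mix of sequential and out-of-order accesses.

    All parameters are illustrative assumptions:
      out_of_order_fraction -- share of accesses that open a new block (0 to 1)
      sequential_ns         -- time for an access within the current block
      latency_ns            -- time for the first access in a new block
    """
    if not 0.0 <= out_of_order_fraction <= 1.0:
        raise ValueError("out_of_order_fraction must be between 0 and 1")
    sequential_share = 1.0 - out_of_order_fraction
    return sequential_share * sequential_ns + out_of_order_fraction * latency_ns


# Hypothetical timings: 10 ns per sequential access, 60 ns for a new-block access.
for fraction in (0.0, 0.1, 0.25, 0.5):
    print(fraction, average_access_time_ns(fraction, 10.0, 60.0))
```

Under this simple model the average access time grows linearly with the out-of-order fraction, so any reduction in the new-block latency lowers the average for every access mix other than a purely sequential one.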
FIG. 1 is a schematic representation of a prior art wide Input/Output (I/O) 16Mb DRAM chip. The chip 100 is organized with two Redundant Bit Lines (RBL) 102 and 104 providing two spare columns in each subarray 106. Each subarray 106 includes 2.sup.n Bit Line (BL) pairs 108 (where n is typically between 5 and 8) and one or more redundant bit line pairs (2 in this example). As used hereinafter, reference to a bit line refers to a complementary pair of lines. Each of the subarrays 106 is part of a subarray block 110. All of the subarray blocks 110, collectively, form the entire RAM array. So, for example, a 16Mb RAM has 16 blocks 110 of 1Mb each. Block size, subarray size and the number of subarrays 106 per block 110 are interdependent and are selected based on performance and design objectives.
Multiple bits of a subarray block 110 are accessed (read from or written to) when one word line 112 is selected and driven high. Data from accessed cells are provided simultaneously to the bit lines 108 and redundant bit lines 102 and 104. After a predetermined minimum delay, a single bit line 108 is selected in each subarray 106. The selected bit line 108 is coupled to a Local Data Line (LDL) 114. LDLs 114 are coupled to Master Data Lines (MDLs) 116. The MDLs 116 couple corresponding subarrays 106 in each subarray block 110. Data are transferred between the subarrays 106 and the chip I/Os on the MDLs 116.
FIG. 2A is a transistor level cross-sectional schematic of a bit line 108 in a subarray 106. Cells 120, 122 connected to adjacent word lines 112, 118 also are connected to opposite lines 124, 126 of each bit line pair. Thus, half of the word lines 112 (e.g., word lines with even addresses) select cells 120 on one line 124 of the bit line pair, while the remaining half of the word lines 118 (odd addressed word lines) select the cells 122 on the other line 126 of the bit line pair. Each cell's storage capacitor (C.sub.s) 128 is, typically, a trench capacitor or a stacked structure for array density. Each bit line 124, 126 has essentially the same capacitance (C.sub.BL). The voltage stored on C.sub.s is referred to herein as V.sub.s and the voltage on C.sub.BL is referred to as V.sub.BL.
The circuit of FIG. 2A operates according to the timing diagram of FIG. 2B. A "one" is stored in any cell 120, 122 by charging the cell's storage capacitor 128, 138 to V.sub.dd. Prior to selecting a cell 120 or 122, the array is pre-charged to its steady-state standby condition. The voltage on the bit line pair 124, 126 is pulled to V.sub.dd /2 and equalized by equalization transistor 134 because the equalization signal (EQ) on its gate 132 is high. The Word Lines (WL) 112, 118 and Column SeLect (CSL) lines 136 are held low during standby. Additionally, each word line may be clamped low (unless driven high) by a simple resettable latch (not shown).
When the chip's Row Address Strobe signal (RAS) is asserted, indicating the array is to be accessed, EQ is pulled low, isolating the lines of the bit line pair from each other and from the V.sub.dd /2 pre-charge supply and floating each line of the bit line pair at V.sub.dd /2. A selected word line 112 (or 118) is driven high. The cell's access gate 130 is turned on in each cell 120 on the selected word line 112, coupling the accessed cell's storage capacitor 128 to line 124 of the bit line pair. Thus, a data signal V.sub.SIG develops when charge is transferred between the storage capacitor 128 and line 124. It can be shown that V.sub.SIG =.+-.(V.sub.dd /2)*C.sub.s /(C.sub.s +C.sub.BL). The other line 126 of the bit line pair 124, 126 remains at its pre-charge voltage level V.sub.dd /2 and serves as a reference voltage for the sense amplifier 140.
Typically, bit line capacitance C.sub.BL is at least one order of magnitude larger than the capacitance C.sub.s of the storage capacitor 128. So, even though V.sub.s is V.sub.dd or 0V, V.sub.SIG is normally at least an order of magnitude smaller than V.sub.dd.
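The charge-sharing relationship above can be checked numerically. The following minimal sketch computes V.sub.SIG from charge conservation between the storage capacitor and the floating bit line. The function name and the example values (3.3 V supply, 30 fF cell, 300 fF bit line) are assumptions chosen only to illustrate a ten-to-one capacitance ratio.

```python
def bit_line_signal(v_dd, c_s, c_bl, stored_one):
    """Return V_SIG, the shift of the bit line away from V_dd/2 after charge sharing.

    c_s and c_bl may be in any unit as long as both use the same one.
    """
    if c_s <= 0 or c_bl <= 0:
        raise ValueError("capacitances must be positive")
    v_s = v_dd if stored_one else 0.0   # cell holds V_dd for a "one", 0 V for a "zero"
    v_precharge = v_dd / 2.0            # bit line floats at V_dd/2 before the access
    # Total charge is conserved when the access gate connects C_s to C_BL.
    v_final = (c_s * v_s + c_bl * v_precharge) / (c_s + c_bl)
    return v_final - v_precharge


# Hypothetical values: V_dd = 3.3 V, C_s = 30 fF, C_BL = 300 fF.
v_sig_one = bit_line_signal(3.3, 30.0, 300.0, stored_one=True)
v_sig_zero = bit_line_signal(3.3, 30.0, 300.0, stored_one=False)
print(v_sig_one, v_sig_zero)
```

With these assumed values the signal is about 0.15 V in either direction, more than an order of magnitude below V.sub.dd, which is why a sense amplifier is needed to restore full logic levels.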
After a built-in timing delay sufficient to allow V.sub.SIG to develop, i.e., to transfer V.sub.s to the bit line, the Sense Amp Enable (SAE) line 142 goes high and, subsequently, pulls its complement (inverse SAE) 144 low to set the sense amp 140. The sense amp 140 amplifies V.sub.SIG, and the amplified signal is re-driven on the bit line pair 124, 126, forcing the lines High/Low or Low/High depending on the data stored in the cell 120. Simultaneously with re-driving the bit line pair, the sense amp writes the sensed data back into the selected cell 120. Once sensing is complete, a Column SeLect signal (CSL) rises to activate the column decoder for column i. So, driving CSL 146 high selects column i in each accessed subarray 106 by connecting the selected column i bit line pair 124, 126 to the LDLs 148, 150 through pass gates 152, 154.
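The order of control signal transitions in the read access described above can be summarized in a small sketch. It records only the ordering given in the text, not actual timing. The names `STANDBY_LEVELS`, `READ_ACCESS_STEPS` and `signal_levels_after` are hypothetical, and the standby levels of SAE and its complement are inferred from the statement that SAE goes high and its complement is pulled low.

```python
# Standby levels: EQ high, WL and CSL low (as described); SAE levels are inferred.
STANDBY_LEVELS = {
    "EQ": "high",
    "WL": "low",
    "SAE": "low",
    "SAE_inverse": "high",
    "CSL": "low",
}

# Ordered transitions after RAS is asserted: (signal, new level, effect).
READ_ACCESS_STEPS = [
    ("EQ", "low", "bit line pair isolated and floating at V_dd/2"),
    ("WL", "high", "storage capacitor coupled to bit line; V_SIG develops"),
    ("SAE", "high", "sense amp enabled after built-in delay"),
    ("SAE_inverse", "low", "sense amp set; bit lines driven to full levels"),
    ("CSL", "high", "selected bit line pair connected to the LDLs"),
]


def signal_levels_after(step_count):
    """Return each control signal's level after the first step_count transitions."""
    if not 0 <= step_count <= len(READ_ACCESS_STEPS):
        raise ValueError("step_count out of range")
    levels = dict(STANDBY_LEVELS)
    for signal, level, _effect in READ_ACCESS_STEPS[:step_count]:
        levels[signal] = level
    return levels


for index, (signal, level, effect) in enumerate(READ_ACCESS_STEPS, start=1):
    print(index, signal, "->", level, ":", effect)
```

Every step in this list sits on the path between starting the access and data reaching the local data lines, so each one contributes to the latency of the access.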
Any time removed from this data path reduces RAM latency, which shortens block access time.