CMOS technology has evolved such that the computer market has rapidly opened to a wide range of consumers. Today, multimedia applications require at least an 8 MB and preferably a 16 MB memory, which increases the relative cost of the memory system within a computer. In the near future, 32 MB and 64 MB computers are likely to become commonplace, which suggests a potential demand for 256 Mb DRAMs (Dynamic Random Access Memories) and beyond. Development of DRAMs in the Gigabit range is already under way.
DRAM architectures have evolved over the years, driven by system requirements for ever larger memory capacity. The speed of a DRAM, characterized by its random access time (tRAC) and its random access cycle time (tRC), has not, however, improved in a like manner. This has created a large speed gap between DRAMs and the CPU, particularly since CPU clock speeds continue to improve over time. To overcome this problem, a cache is now commonly used, not only in high-end workstations but also in multimedia computers. A cache, however, requires fast and expensive SRAMs, increasing system cost. Furthermore, even with this expensive and complex cache, system performance, particularly with high-density memories, cannot be significantly enhanced in view of the high probability of a cache miss. It is therefore crucial that DRAM speed (i.e., tRAC and tRC) approach that of an SRAM, so that the cache overhead can be reduced or, preferably, eliminated altogether.
The tRAC and tRC of a DRAM are fundamentally slower than those of an SRAM. This is because the signal charge stored in a DRAM cell is small compared with the signal available from an SRAM cell. The small signals characteristic of DRAMs must therefore be amplified, which slows down tRAC. Moreover, reading a DRAM cell destroys its data, which must therefore be restored before the next read or write operation can begin, thereby slowing tRC.
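Why the DRAM signal is small can be seen from charge sharing between a cell and its bitline. The sketch below is an illustrative first-order model, not taken from this document: the cell capacitance (30 fF), bitline capacitance (300 fF), and supply voltage (2.5 V) are assumed values, and the bitline is taken to be precharged to Vdd/2 as in the MDQ architecture described later.

```python
def bitline_signal(v_cell, v_precharge, c_cell, c_bitline):
    """Bitline voltage swing after one cell shares its charge with its bitline.

    Charge conservation: c_cell*v_cell + c_bitline*v_precharge
    = (c_cell + c_bitline) * v_final, so the swing v_final - v_precharge is
    (v_cell - v_precharge) * c_cell / (c_cell + c_bitline).
    """
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)


VDD = 2.5          # supply voltage in volts (assumed)
C_CELL = 30e-15    # storage cell capacitance, 30 fF (assumed)
C_BL = 300e-15     # bitline capacitance, 300 fF (assumed)

# Reading a stored "1" (cell at VDD) onto a bitline precharged to VDD/2.
swing = bitline_signal(VDD, VDD / 2, C_CELL, C_BL)
print(f"bitline signal: {swing * 1e3:.0f} mV from a {VDD} V supply")
```

With a bitline roughly ten times the cell capacitance, the swing is only about a tenth of Vdd/2, which is why a sense amplifier must amplify it before the data can be used.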
A multi-bank DRAM, which contains several independently controllable arrays within one chip, allows the next operation to start before the current one has finished, using a pipeline approach. This method is ideal for improving tRC: for example, having two banks in a chip allows the effective tRC to be halved. Likewise, the tRAC of consecutive random access operations is hidden behind the preceding operations, since a next operation may start before a previous one has been completed. It is because of these considerations that the concept of introducing multiple banks in a chip is of such importance for current and future systems. Such an architecture has already been used in several multi-bank DRAM products, such as SDRAMs, RDRAMs, and MDRAMs. Implementing multiple banks in a single chip with a hierarchical Column Select Line (CSL) architecture, however, requires special handling that is not needed in a single-bank DRAM.
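The halving of the effective tRC by two banks can be checked with a small scheduling sketch. This is a simplified model under stated assumptions: each bank is busy for one full tRC after an access starts, accesses are issued in order, and `t_cmd` is a hypothetical minimum spacing between consecutive accesses (standing in for a command-bus limit that the text does not specify). The timing values are illustrative only.

```python
def issue_times(bank_sequence, t_rc, t_cmd):
    """Start time of each random access in an in-order stream of accesses.

    bank_sequence: bank index targeted by each access, in issue order.
    t_rc: random access cycle time of a single bank.
    t_cmd: assumed minimum spacing between two consecutive accesses
           (hypothetical parameter, not from the text).
    """
    bank_free_at = {}   # earliest time each bank can accept a new access
    starts = []
    for bank in bank_sequence:
        start = bank_free_at.get(bank, 0)
        if starts:
            start = max(start, starts[-1] + t_cmd)
        starts.append(start)
        bank_free_at[bank] = start + t_rc  # bank stays busy for one full tRC
    return starts


def effective_cycle(starts):
    """Average spacing between consecutive accesses."""
    return (starts[-1] - starts[0]) / (len(starts) - 1)


T_RC = 60  # ns, assumed
single = issue_times([0] * 8, T_RC, t_cmd=30)            # every access hits bank 0
interleaved = issue_times([0, 1] * 4, T_RC, t_cmd=30)    # alternate banks 0 and 1
print(effective_cycle(single), effective_cycle(interleaved))
```

Alternating between two banks lets each access start while the other bank is still completing its cycle, so the effective cycle drops to tRC/2 as long as `t_cmd` does not exceed tRC/2.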
Referring to FIG. 1a, there is depicted, by way of example, a Master-DQ (MDQ) architecture of a 256 Mb DRAM, which allows a wide I/O organization with a small silicon area overhead. This architecture is more fully described in the article "A 286 mm² 256 Mb DRAM with x32 both-ends DQ" by Y. Watanabe et al., published in the IEEE Journal of Solid-State Circuits, Vol. 31, No. 4, pp. 567-574. This DRAM is configured as a "single bank" architecture (a bank being defined as an array that can be independently controlled; more specifically, the next "random access mode", in which a wordline in a different bank needs to be activated, can be initiated before all previous "random access modes" have been completed).
Chip 10 shown in FIG. 1a includes sixteen 16 Mb units 100, each consisting of sixteen 1 Mb blocks 101. Each 1 Mb block 101 contains 512 wordlines (WLs) 103 running in the horizontal direction and 2048 bitline pairs (BLs) 104 running in the vertical direction. For simplicity, Row Decoders (RDECs) 108 are located at the left of each 16 Mb unit 100. The Column Decoders (CDECs) 109 and the second sense amplifiers (SSAs) 110 are placed at the bottom of each 16 Mb unit 100. In the column direction, the 16 Mb unit 100 consists of sixteen 1 Mb segments 102. Thirty-two column select lines (CSLs) 107 and the hierarchical data lines, consisting of 4 local-DQ (LDQ) pairs 105 and 4 master-DQ (MDQ) pairs 106, are arranged over each 1 Mb segment 102. The intersection of a 1 Mb block 101 and a 1 Mb segment 102 is a 64 Kb array 108. In summary, each 1 Mb block 101 contains sixteen 64 Kb arrays 108, and each 1 Mb segment 102 likewise consists of sixteen 64 Kb arrays 108.
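The capacity figures above are mutually consistent, which the following arithmetic sketch checks. All counts come from the description of FIG. 1a; the only modeling assumption is that each wordline selects one cell per bitline pair.

```python
KBIT = 1024
MBIT = 1024 * KBIT

# Organization of the 256 Mb MDQ chip of FIG. 1a (counts from the text).
UNITS_PER_CHIP = 16
BLOCKS_PER_UNIT = 16        # 1 Mb blocks 101, stacked in the row direction
SEGMENTS_PER_UNIT = 16      # 1 Mb segments 102, side by side in the column direction
WLS_PER_BLOCK = 512
BL_PAIRS_PER_BLOCK = 2048
CSLS_PER_SEGMENT = 32
BL_PAIRS_PER_CSL = 4        # one BL pair per LDQ/MDQ pair

# Each WL selects one cell per BL pair, so bits = WLs x BL pairs.
block_bits = WLS_PER_BLOCK * BL_PAIRS_PER_BLOCK
bl_pairs_per_segment = BL_PAIRS_PER_BLOCK // SEGMENTS_PER_UNIT
array_bits = WLS_PER_BLOCK * bl_pairs_per_segment   # block x segment intersection
chip_bits = UNITS_PER_CHIP * BLOCKS_PER_UNIT * block_bits

print(block_bits // MBIT, "Mb block;", array_bits // KBIT, "Kb array;",
      chip_bits // MBIT, "Mb chip;", bl_pairs_per_segment, "BL pairs per segment")
```

The last check worth noting is that 32 CSLs, each gating 4 BL pairs, exactly cover the 128 BL pairs of one segment.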
For clarity and simplicity, the following discussion assumes that only one of the sixteen 1 Mb blocks 101 is active at any given time, its data being transferred over the LDQs and MDQs of the corresponding 1 Mb segment 102.
FIG. 1b shows a more detailed schematic of the 1 Mb segment 102 depicted in FIG. 1a, in which two of the sixteen 64 Kb arrays, 200A and 200B, are illustrated. Arrays 200A and 200B are, respectively, the intersection of block 101A with segment 102 and the intersection of block 101B with segment 102 (FIG. 1a). Each 64 Kb array consists of 512 WLs 202 and 128 BL pairs 203. As discussed previously, 32 CSLs 213, 4 LDQ pairs 211, and 4 MDQ pairs 212 are arranged over this 1 Mb segment. (For simplicity, FIG. 1b shows only one of each group of four BL pairs, LDQ pairs, and MDQ pairs in this arrangement.) When one of the 32 CSLs 213 is activated, 4 of the 128 BL pairs 203 are coupled to the corresponding 4 LDQ pairs 211 and 4 MDQ pairs 212. The detailed operation of a single-bank DRAM, and the problems that arise in a multi-bank DRAM, are described next.
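The CSL selection described above can be sketched as a mapping from a CSL index to the BL pairs it couples to the data lines. The text gives only the counts (32 CSLs, 128 BL pairs, 4 LDQ/MDQ pairs); the specific assignment used here, in which CSL number c gates the 4 adjacent BL pairs 4c to 4c+3 and BL pair 4c+k drives LDQ/MDQ pair k, is a hypothetical choice made for illustration.

```python
CSLS_PER_SEGMENT = 32
BL_PAIRS_PER_ARRAY = 128
DQ_PAIRS = BL_PAIRS_PER_ARRAY // CSLS_PER_SEGMENT   # 4 LDQ/MDQ pairs per segment


def bl_pairs_for_csl(csl):
    """BL pairs 203 coupled to the 4 LDQ/MDQ pairs when CSL number `csl` fires.

    Hypothetical layout: each CSL gates 4 adjacent BL pairs, and
    BL pair DQ_PAIRS*csl + k is connected to LDQ/MDQ pair k.
    """
    if not 0 <= csl < CSLS_PER_SEGMENT:
        raise ValueError(f"CSL index {csl} out of range")
    return [DQ_PAIRS * csl + k for k in range(DQ_PAIRS)]


# Every BL pair in the array is reached by exactly one CSL.
coverage = sorted(p for csl in range(CSLS_PER_SEGMENT) for p in bl_pairs_for_csl(csl))
assert coverage == list(range(BL_PAIRS_PER_ARRAY))
print(bl_pairs_for_csl(0), bl_pairs_for_csl(31))
```

Whatever the physical assignment, the key property is the one asserted: the 32 CSLs partition the 128 BL pairs into disjoint groups of 4, one group per column access.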
In standby mode (i.e., when no WL 202 and no CSL 213 is active, so that no data is being read from or written to the memory), all BLs 203 and LDQs 211 are precharged to Vdd/2, half the power supply voltage Vdd, while MDQs 212 are precharged to Vdd. When 1 Mb block A is selected, BL equalizers 207 and MDQ equalizers 208 are disabled first. MDQ lines 212 are then coupled to LDQ 211 through MDQ transistor 206, which precharges LDQ 211 to Vdd. WL 202 then rises to read data from cell 201. Sense amplifiers (SAs) 204 are activated only after the signal on BLs 203 has sufficiently developed (typically to 90%). CSL 213 then rises to transfer data from the selected BL pair 203 to the respective LDQ pair 211 and MDQ pair 212 in a read mode (or in the opposite direction in a write mode). BLs and LDQs in the unselected 1 Mb block B are kept at their Vdd/2 precharge level, since the BL and LDQ equalizers there remain on while MDQ transistor 206 remains off. Consequently, although the shared CSL also turns on the column switches of block B, this is harmless, since no WL is active in block B and its BLs and LDQs are held at the same Vdd/2 level. This organization allows CSLs 213 to be shared between 1 Mb blocks A and B, so that only one column decoder is required for each 16 Mb unit, located at the bottom of the unit.
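The "typically 90%" criterion for firing the sense amplifiers can be turned into a delay after the WL rises. The sketch below assumes a single-pole (RC) model of bitline signal development, v(t) = v_final * (1 - exp(-t / tau)); that model and the 2 ns time constant are illustrative assumptions, not values from this document.

```python
import math


def sense_enable_delay(tau, fraction=0.9):
    """Time after WL rise until the BL signal reaches `fraction` of its final value.

    Assumes single-pole signal development, v(t) = v_final * (1 - exp(-t / tau)),
    which solves to t = -tau * ln(1 - fraction).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -tau * math.log(1 - fraction)


TAU_NS = 2.0  # hypothetical bitline signal-development time constant, in ns
print(f"fire SAs about {sense_enable_delay(TAU_NS):.2f} ns after WL rise")
```

Under this model, waiting for 90% of the signal costs tau * ln 10, or about 2.3 time constants, which is one reason signal development is a significant part of tRAC.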
The MDQ architecture suffers from a fundamental deficiency when 1 Mb blocks A and B are operated as two separate banks. For example, while 1 Mb block A is in a "signal development mode" (i.e., while data is being read out of a memory cell onto its bitline), 1 Mb block B may be in a "column access mode" (i.e., the period during which data is read from or written to a cell through the column path). Because the CSLs are shared between banks A and B, the column switch transistor 205 in array 200A, which is still in the signal development phase, is also turned on, connecting its BL pair, whose small signal has not yet been amplified, to the data lines and thereby destroying the data in cell 201 of array 200A. Column switch transistor 205 must remain off during the signal development mode in order not to destroy the data. The exact times at which the signal development phase starts and at which the column switches open cannot be predicted internally, because they are controlled externally by the system designer and/or by customer constraints. Three solutions to this problem, embodied in more advanced architectures, have been proposed and are described hereinafter.
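The hazard described above is a timing overlap between banks that share CSLs, which the following sketch detects. It is a simplified model under stated assumptions: each bank's signal development and column access are represented as half-open time intervals, and any overlap between one bank's column access and another bank's signal development, when the two banks share CSLs, is flagged as a data-destroying conflict. The class and function names and the nanosecond values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BankPhase:
    name: str
    signal_development: tuple  # (start_ns, end_ns), hypothetical timing
    column_access: tuple       # (start_ns, end_ns), hypothetical timing


def overlaps(a, b):
    """True if the half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def shared_csl_conflicts(banks):
    """Return (aggressor, victim) pairs for banks wired to the same CSLs.

    A conflict occurs when one bank's column access, which raises the shared
    CSLs, overlaps another bank's signal development.
    """
    conflicts = []
    for victim in banks:
        for aggressor in banks:
            if aggressor is not victim and overlaps(
                victim.signal_development, aggressor.column_access
            ):
                conflicts.append((aggressor.name, victim.name))
    return conflicts


bank_a = BankPhase("A", signal_development=(30, 50), column_access=(55, 70))
bank_b = BankPhase("B", signal_development=(0, 20), column_access=(25, 40))
print(shared_csl_conflicts([bank_a, bank_b]))
```

Because the interval endpoints are set by external commands, a chip with shared CSLs cannot guarantee that this list stays empty, which is why each solution below removes or gates the sharing instead.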
In a first solution (not shown in the drawings), described in the article "A 2.5 ns Clock Access, 250 MHz, 256 Mb SDRAM with Synchronous Mirror Delay" by T. Saeki et al., published in the IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1656-1668, four banks are formed from four units, each of which is controlled by its own independent column decoders. Since the CSLs are not shared among the banks, the problem described above is avoided. However, the number of banks in the chip is then limited by the number of units, which is inadequate for the 16 or more banks required in a 1 Gb DRAM design.
In a second solution, illustrated in FIG. 1c, two column decoders 300A and 300B control banks A and B, respectively. More specifically, the CSLs in each bank are controlled independently by full column decoders 300A and 300B. However, duplicating the full column decoders imposes a substantial chip-area penalty; for instance, the height of the 16 Mb unit increases by approximately 150 µm for the two banks A and B (and by a further 150 µm for each additional bank).
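The area cost of this approach grows with the number of banks. The sketch below applies a linear model built on the approximately 150 µm quoted above: a single-bank unit needs one full column decoder, and every additional bank adds one more decoder of the same height. The linearity, and treating all banks as formed within one 16 Mb unit, are simplifying assumptions.

```python
def extra_unit_height_um(n_banks, per_decoder_um=150):
    """Extra height of a 16 Mb unit when each bank gets its own full column decoder.

    Linear model (an assumption): a single-bank unit needs one decoder,
    and each further bank adds about `per_decoder_um` of height.
    """
    if n_banks < 1:
        raise ValueError("need at least one bank")
    return per_decoder_um * (n_banks - 1)


for n in (2, 4, 8):
    print(n, "banks:", extra_unit_height_um(n), "um")
```

Even under this rough model, eight banks in one unit would add about a millimetre of height, which motivates sharing decoder logic as in the third solution.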
A third solution to the aforementioned problem of the MDQ architecture, commonly referred to as a "merged bank architecture" (MBA), is illustrated in FIG. 1d. Rather than providing a full column decoder for each of banks A and B, partial local column decoders 400A and 400B are added, driven by global column decoders 410. Since partial column decoders 400A and 400B are smaller than full column decoders, the area penalty can be substantially reduced. However, this approach requires twice as many interconnecting wires, for global CSLs 401 and local CSLs 402, and these are difficult to accommodate within the limited space available. Details of this architecture may be found in the article "A 32-Bank 1 Gb Self-Strobing Synchronous DRAM with 1 GByte/s Bandwidth" by Jei-Hwan Yoo et al., published in the IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1635-1644.
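The gating idea behind the merged bank architecture can be sketched as follows. Modeling each partial local column decoder as an AND of a global CSL with a per-bank column enable is an assumption made for illustration; the text states only that the local decoders are driven by the global decoders. The sketch shows that a bank in signal development keeps its local CSLs low even while a global CSL fires for another bank, and counts the doubled CSL wiring per segment.

```python
def local_csls(global_csls, bank_column_enable):
    """Local CSLs 402 of one bank, derived from the shared global CSLs 401.

    Assumed model: a local CSL rises only if its global CSL is high AND
    the bank's own column enable (hypothetical signal) is on.
    """
    return [g and bank_column_enable for g in global_csls]


N_CSL = 32
gcsl = [False] * N_CSL
gcsl[5] = True   # global CSL 5 fires for a column access in bank B

bank_a = local_csls(gcsl, bank_column_enable=False)  # bank A: signal development
bank_b = local_csls(gcsl, bank_column_enable=True)   # bank B: column access

# Global plus local CSL tracks over one segment, versus N_CSL in the MDQ design.
wires_per_segment = 2 * N_CSL
print(any(bank_a), bank_b.index(True), wires_per_segment)
```

The gating removes the conflict, but the price is the second set of CSL tracks, which is the wiring burden noted above.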