Presently, most semiconductor memory subsystems employed for data storage in computers are constructed from static random access memory devices (SRAMs) and dynamic random access memory devices (DRAMs). Each type of memory device has advantages and disadvantages, and as a result, DRAMs and SRAMs are typically employed in different applications. Specifically, SRAMs are faster and hence are normally used when fast access time and high bandwidth are critical, such as in cache memories. SRAMs, however, consume more power, are more expensive to fabricate, and provide fewer cells (bits) per given chip space. On the other hand, while slower than SRAMs, DRAMs are less expensive, consume substantially less power, and provide more bits in the same chip space (i.e., have a higher cell density). DRAMs are normally used to construct those memory subsystems, such as system memories and display frame buffers, where power conservation and high cell density are more critical than speed. In most computing systems, it is these subsystems which dominate the system architecture, and thus, DRAMs are the prevalent type of memory device on the market.
The speed restrictions on DRAMs are a direct consequence of the established manner in which conventional DRAMs are constructed and operated. In particular, the vast majority of DRAMs require two periods per row access (precharge and active), as timed by a row address strobe (/RAS) and a column address strobe (/CAS). These two periods together constitute one cycle. When /RAS is in a logic high state, the DRAM device is in a precharge cycle, during which the nodes of various dynamic circuits, such as those used in the column and row decoders, are pulled to a predetermined voltage. Most importantly, during the precharge cycle the bitlines of the cell array are voltage equalized, as will be discussed further below. Then, when /RAS transitions to a logic low, the device enters the active cycle. Typically, the row address bits are presented to the address pins and latched into the DRAM device with the falling edge of /RAS. After a very small delay for set up, the column address bits are presented at the address pins and latched in with /CAS. A short time thereafter the addressed cells (location) can be accessed. During page mode, additional column addresses are input with additional falling edges of /CAS (/CAS cycling) to access a series of "pages" along the selected row. At the end of the active cycle, /RAS returns to a logic high state and the device re-enters precharge. In any event, when a change in row is required, a complete new /RAS cycle, including a new precharge cycle and a new active cycle, is required.
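A rough timing model makes the cost of a row change concrete. The sketch below is illustrative only: it assumes 55 nsec for each of the precharge and active periods (the midpoint of the 50 to 60 nsec range cited in this background) and a hypothetical 25 nsec for each additional /CAS cycle in page mode. The function name `access_time_ns` and all three constants are assumptions introduced here, not parameters of any particular device.

```python
PRECHARGE_NS = 55   # assumed: midpoint of a 50-60 nsec precharge period
ACTIVE_NS = 55      # assumed: midpoint of a 50-60 nsec active period
CAS_CYCLE_NS = 25   # hypothetical cost of each extra /CAS cycle in page mode


def access_time_ns(addresses):
    """Total time for a sequence of (row, column) accesses.

    A change of row costs a full /RAS cycle (precharge + active);
    staying on the already selected row costs one more /CAS cycle.
    """
    total = 0
    open_row = None
    for row, _column in addresses:
        if row != open_row:
            total += PRECHARGE_NS + ACTIVE_NS
            open_row = row
        else:
            total += CAS_CYCLE_NS
    return total


same_row = [(7, c) for c in range(4)]      # page-mode accesses along one row
random_rows = [(r, 0) for r in range(4)]   # every access changes the row

print(access_time_ns(same_row))     # one /RAS cycle plus three /CAS cycles
print(access_time_ns(random_rows))  # four full /RAS cycles
```

Under these assumed numbers, four accesses along one row are considerably cheaper than four accesses to different rows, because only the latter pay the precharge period each time.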
To improve device speed, it would be desirable to shorten the length of each precharge cycle. Currently, the typical precharge cycle is between 50 and 60 nsec in length (the typical active cycle is also approximately 50 to 60 nsec long). While the nodes of most of the dynamic circuitry, such as that used in the row and column decoders, can be charged or discharged within 10 nsec, the full 50 to 60 nsec is required to precharge and equalize the bitlines of the cell array. Improving the speed of bitline precharge, however, is constrained by the physics of the CMOS circuitry used in the fabrication of the vast majority of DRAM devices and by cell density (4 Mbit to 1 Gbit).
The typical DRAM cell array is arranged in rows and columns of cells, with each row controlled by a conductive wordline and each column of cells associated with a conductive bitline formed by a "true" half-bitline and a "complementary" half-bitline. A sense amplifier is coupled between each half-bitline/complementary half-bitline pair. During a voltage-high precharge, all of the half-bitlines in the array are precharged to a predetermined voltage, for example 3.3 volts for a 3.3 V Vcc device, and then allowed to float (in some devices, precharge is to substantially zero volts, but for purposes of the present discussion, precharge towards Vcc is assumed). During the active cycle, the wordline corresponding to the received row address is selected and all the cells along the corresponding row are turned on. During a read or refresh, the sense amplifiers detect the voltage swing between each precharged half-bitline pair and latch one half-bitline of the pair to a full logic high and the other to a full logic low, depending on the direction of the swing. During a write, the sense amplifiers pull down one half-bitline in each pair, depending on which of the true and complementary half-bitlines is to carry the logic zero, and latch the other half-bitline high. A write of a logic 1 is similar.
The voltage swings on the bitlines are extremely small, and therefore, to avoid mismatching while sensing from a selected bit, the voltage on each bitline pair must be equalized during precharge as closely as possible. Notwithstanding, some voltage imbalance will always exist, often on the order of 2 to 3 millivolts. Among other things, constraints on the chip fabrication processes result in variations in the lengths and widths of transistors, as well as in the capacitance and resistance of the bitlines in the array. The voltage imbalance problem is compounded by inherent variations in the threshold voltages of the transistors used in the sense amplifiers. Therefore, in order to minimize the effect of the imbalances on sense amplifier operation, the bitlines are fully charged during equalization, normally as close to 3.3 volts as possible in a 3.3 volt device.
The full charging of the bitlines dictates the 50 to 60 nsec length of the precharge cycle. Assume for discussion that the capacitance of each bitline (half-bitline pair) is approximately 1 pF. For an array of 4,096 columns, the array capacitance is approximately 4,096 pF. Since $i = C\,(dV/dt)$, to charge the array to approximately 3.3 V in 50 nsec requires a current of approximately 270 mA. This current can reasonably be provided on chip. However, to decrease the precharge time for the array to 25 nsec or less, the current must at least double to approximately 540 mA, a current which cannot reasonably be provided on chip. Further, it should be noted that the speed of charging the bitlines is also constrained by the fact that $V = V_0 e^{-t/RC}$, where $V_0$ is the initial voltage, $t$ is the time to charge, and $R$ and $C$ are the bitline resistance and capacitance, respectively. In sum, the larger the array, the slower the charging for a given current.
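The current estimate above can be checked with a short calculation. This is a minimal sketch that assumes, as the discussion does, about 1 pF per bitline, 4,096 columns, and a linear charge through 3.3 V; the function name `precharge_current` is hypothetical and introduced only for illustration.

```python
def precharge_current(columns, cap_per_bitline_f, delta_v, precharge_time_s):
    """Average current in amperes to charge the array, from i = C * dV/dt."""
    array_capacitance = columns * cap_per_bitline_f  # total capacitance in farads
    return array_capacitance * delta_v / precharge_time_s


# Values taken from the discussion: 4,096 bitlines of about 1 pF each, 3.3 V swing.
i_50ns = precharge_current(4096, 1e-12, 3.3, 50e-9)
i_25ns = precharge_current(4096, 1e-12, 3.3, 25e-9)

print(f"50 nsec precharge: {i_50ns * 1e3:.0f} mA")
print(f"25 nsec precharge: {i_25ns * 1e3:.0f} mA")
```

Because the current scales inversely with the precharge time, halving the time doubles the required current for the same array capacitance and voltage swing.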
As processors become faster, the need for faster memories becomes more critical. Short access time becomes particularly critical during operations, such as numerical calculations, where the CPU requires numerous random accesses from memory. Further, during operations such as display refresh/update, where substantial amounts of data are streamed from memory, short access times become important to ensure that the cumulative time required for the operations is minimized. In each case, minimizing memory access time improves performance, since system resources, such as the buses and core logic, are freed for use on additional tasks.
Currently, in order to continue to take advantage of the power consumption and bit density advantages of DRAMs, various techniques have been developed at the system level to overcome DRAM speed deficiencies. These techniques are not ideal, nor do they directly address the problem of memory access speed at its root cause at the device level.
Cache memory is often used to improve access to data by the CPU. In this case, when data is required by the CPU from the system memory, entire blocks of data of a given spatial and/or temporal locality are retrieved from system memory and stored in a fast SRAM cache memory. Accesses by the CPU from cache can then be made with a shorter access time. Even so, between 5 and 10 percent of the time, depending on the cache hit rate, the CPU must directly access the DRAMs of the system memory. In other systems, multiple memory banks and interleaved accesses are used to improve data access times. These systems normally require implementation of a more complicated timing scheme and often require the use of significantly more memory.
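The effect of the remaining DRAM accesses can be illustrated with the usual weighted-average estimate of access time. In the sketch below, only the 5 to 10 percent miss range comes from the text; the 10 nsec cache access time and the 110 nsec DRAM access time (one precharge plus one active period of about 55 nsec each) are assumptions chosen for illustration, and the function name `average_access_ns` is hypothetical.

```python
def average_access_ns(miss_rate, cache_ns, dram_ns):
    """Weighted average of cache hits and direct DRAM accesses on a miss."""
    return (1.0 - miss_rate) * cache_ns + miss_rate * dram_ns


# Assumed timings: 10 nsec for the SRAM cache, 110 nsec for a full DRAM row access.
for miss_rate in (0.05, 0.10):
    avg = average_access_ns(miss_rate, cache_ns=10.0, dram_ns=110.0)
    print(f"miss rate {miss_rate:.0%}: average access about {avg:.1f} nsec")
```

With these assumed timings, the DRAM term contributes 5.5 nsec and 11 nsec of the average at miss rates of 5 and 10 percent respectively, so a small fraction of accesses accounts for a large share of the average access time.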
In sum, because of the difficulties of improving access time at the device level, most of the significant present efforts to overcome the speed disadvantages of DRAMs have been directed to improvements in memory operation at the system level. These solutions have their own disadvantages and the problem of long random access times in DRAMs has never been directly resolved. Thus, the need has arisen for a DRAM with a fast cycle time and low latency. Such a device would provide the power consumption and bit density advantages of conventional DRAMs yet provide for faster accesses, notwithstanding the use of caching and/or interleaving in the system.