The present invention relates generally to high-speed DRAM architectures, and specifically to timing of read, write and refresh operations.
Traditionally, the design of commodity Dynamic Random Access Memory (DRAM) devices has focused more on achieving low cost-per-bit through high aggregate bit density than on achieving higher memory performance. The reason for this is that the cell capacity of a two dimensional memory array increases quadratically with scaling, while the overhead area of bit line sense amplifiers, word line drivers, and row address (or x-address) and column address (or y-address) decoders increases linearly with scaling. Therefore, the design emphasis on memory density has resulted in commodity DRAMs being designed with sub-arrays as large as practically possible, despite the strongly deleterious effect of large sub-arrays on the time needed to perform cell readout, bit line sensing, cell restoration and bit line equalization and precharge. As a result, the relatively low performance of traditional DRAM architectures as compared to Static Random Access Memory (SRAM) has generally limited their use to large capacity, high density, cost sensitive applications where performance is secondary.
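The density argument above can be illustrated with a minimal sketch. It assumes a square sub-array of n by n cells whose peripheral overhead is proportional to n; the function name `array_efficiency` and the constant `overhead_per_edge_cell` are hypothetical and chosen only for illustration, not taken from any real design.

```python
def array_efficiency(n: int, overhead_per_edge_cell: float = 20.0) -> float:
    """Fraction of sub-array area occupied by memory cells.

    Areas are expressed in units of one cell area. Cell area grows as n**2,
    while the overhead of sense amplifiers, word line drivers and decoders
    is modeled as growing linearly with n. The constant
    overhead_per_edge_cell is an illustrative assumption.
    """
    cell_area = n * n
    overhead_area = overhead_per_edge_cell * n
    return cell_area / (cell_area + overhead_area)


if __name__ == "__main__":
    # Larger sub-arrays spend a larger fraction of their area on cells.
    for n in (64, 256, 1024):
        print(n, round(array_efficiency(n), 3))
```

Under this model the efficiency simplifies to n / (n + k), where k is the overhead constant, so it approaches 1 as the sub-array grows. This is the cost pressure toward large sub-arrays described above; the model does not capture the accompanying increase in access time.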
Furthermore, traditional DRAM architectures minimize the number of signal pins on memory devices by multiplexing address lines between the row and column components of the address. As a result, the two dimensional nature of DRAM array organization has always been an inherent part of the interface between memory control or logic and DRAM memory devices.
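As a rough sketch of the address multiplexing described above, the following splits a flat address into row and column components that would be presented in turn on a shared set of address pins. The bit widths `ROW_BITS` and `COL_BITS`, the function names, and the phase labels "ROW" and "COLUMN" are illustrative assumptions, not values from any particular device.

```python
ROW_BITS = 12  # hypothetical number of row address bits
COL_BITS = 10  # hypothetical number of column address bits


def split_address(addr: int) -> tuple[int, int]:
    """Split a flat address into (row, column) components."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # upper bits select the row
    col = addr & ((1 << COL_BITS) - 1)   # lower bits select the column
    return row, col


def multiplexed_phases(addr: int) -> list[tuple[str, int]]:
    """Return the two values presented in sequence on the shared pins."""
    row, col = split_address(addr)
    return [("ROW", row), ("COLUMN", col)]


def address_pins_needed() -> int:
    """Shared pins must be wide enough for the larger of the two components."""
    return max(ROW_BITS, COL_BITS)


if __name__ == "__main__":
    print(multiplexed_phases(0x12345))
    print(address_pins_needed(), "pins instead of", ROW_BITS + COL_BITS)
```

With these assumed widths, 12 shared pins carry a 22-bit address in two phases, which is why the row and column structure of the array is visible at the device interface.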
The advent of synchronous interface DRAM technologies such as SDRAM, direct RAMBUS, and double data rate (DDR) SDRAM has replaced the separate row and column control signals of asynchronous interface DRAM technologies, such as fast page mode (FPM) and extended data output (EDO), with encoded commands. However, the traditional two-dimensional logical addressing organization of previous architectures has been retained.
An early attempt at increasing DRAM performance by minimizing the latency and cycle time impact of slow row access operations due to the use of large cell arrays led to the creation of two different classes of memory operations, both of which are well-known in the industry. A first class comprises bank accesses. A bank access consists of a row open command followed by a column access. Referring to FIG. 1a, a timing diagram for a bank access is illustrated. A second class comprises page accesses. A page access consists of a column access to a row left open by a previous row open or bank access command. As a result, page accesses are typically faster than bank accesses. Referring to FIG. 1b, a timing diagram for a page access is illustrated. The efficacy of page accesses in reducing average latency is due to the statistical spatial locality in the memory access patterns of many computing and communication applications. That is, there is a strong probability that consecutive memory accesses will target the same row.
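The two access classes can be sketched with a simple open-row model. The latencies `t_bank` and `t_page` are hypothetical cycle counts chosen for illustration, and the model lumps any row close and precharge time into the bank access latency; it is not derived from the timing diagrams of FIG. 1a or FIG. 1b.

```python
def access_latencies(rows, t_bank=6, t_page=2):
    """Classify each access as a bank access or a page access.

    rows   : sequence of row addresses, one per memory access
    t_bank : assumed latency of a bank access (row open plus column access)
    t_page : assumed latency of a page access (column access to an open row)
    """
    open_row = None
    latencies = []
    for row in rows:
        if row == open_row:
            latencies.append(t_page)   # page access: row is already open
        else:
            latencies.append(t_bank)   # bank access: a new row must be opened
            open_row = row
    return latencies


if __name__ == "__main__":
    pattern = [3, 3, 3, 7, 7]
    lat = access_latencies(pattern)
    print("per-access latency:", lat)
    print("average:", sum(lat) / len(lat), "worst case:", max(lat))
```

In this sketch, an access pattern with strong row locality lowers the average latency toward the page access time, but the worst-case latency remains that of a bank access.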
A further refinement of such a dual memory access class scheme is the creation of DRAM architectures that explicitly divide each memory device into two or more equal-size regions referred to as banks. The intention of this architectural enhancement is to partially reduce the overhead of row accesses by allowing memory accesses to one bank to overlap with a row open or close operation in the other bank. A system implementing a multi-bank architecture is well-known in the industry and is illustrated generally in FIG. 2a by the numeral 200. A timing diagram for such a system is illustrated in FIG. 2b.
A fundamental problem with all of these schemes is the retention of the system of two classes of memory accesses to partially compensate for the slow row access associated with large DRAM arrays. Many real time applications, such as digital signal processors, are limited by worst-case memory performance. These systems cannot tolerate differences in memory access timing as a function of the particular address patterns of consecutive accesses. Even performance optimized embedded DRAM macro block designs strongly tend to retain the dual access class paradigm of commodity DRAM architectures.
Referring to FIG. 3a, an additional attempt at increasing the performance of DRAM with the use of a dual-port architecture is illustrated generally by numeral 300. The dual ported architecture is a more recent advancement in DRAM architecture for achieving higher performance. Each memory cell MC is connected to two bit lines, BL1 and BL2, through access transistors N1 and N2 respectively. This cell architecture allows simultaneous access of memory cell MC through one access transistor and its associated bit line, for example N1 and BL1, while BL2, associated with the other access transistor N2, undergoes precharge and equalization. As a result, a second access can occur via N2 without any delay to precharge bit line BL2.
By alternating back and forth between the two access transistors and their respective bit lines, this architecture can completely hide the overhead associated with closing rows and precharging and equalizing the bit lines. However, the main drawback of this scheme is the greatly reduced bit density within the DRAM array due to the doubling of the number of access transistors and bit lines per memory cell as compared to conventional DRAM designs. Furthermore, such a system also uses an open bit line architecture which is undesirable due to its susceptibility to unmatched noise coupling to bit line pairs.
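The effect of alternating between the two ports can be sketched as follows. The model assumes that accesses are issued one at a time, that each port must finish precharging before it is reused, and that `t_access` and `t_precharge` are hypothetical cycle counts; it is an illustration only and is not taken from FIG. 3a.

```python
def total_time_single_port(n_accesses, t_access=4, t_precharge=3):
    """Time until the array is ready again after n accesses on one port.

    Every access is followed by a precharge on the same bit line.
    """
    return n_accesses * (t_access + t_precharge)


def total_time_dual_port(n_accesses, t_access=4, t_precharge=3):
    """Time at which the last of n accesses completes with alternating ports.

    While one port is being accessed, the other port precharges.
    """
    port_ready = [0, 0]   # earliest time each port may start a new access
    now = 0               # time at which the previous access finished
    for i in range(n_accesses):
        port = i % 2                        # alternate between the two ports
        start = max(now, port_ready[port])  # wait for precharge if needed
        end = start + t_access
        port_ready[port] = end + t_precharge
        now = end
    return now


if __name__ == "__main__":
    print("single port:", total_time_single_port(10))
    print("dual port:  ", total_time_dual_port(10))
```

In this model the precharge time is fully hidden only when it does not exceed the access time; a longer precharge reintroduces stalls even with two ports.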
It is an object of the present invention to obviate or mitigate the above-mentioned disadvantages.
In accordance with an aspect of the present invention, there is provided a Dynamic Random Access Memory (DRAM) for performing read, write, and refresh operations. The DRAM includes a plurality of sub-arrays, each having a plurality of memory cells, each of which is coupled with a complementary bit line pair and a word line.
The DRAM further includes a word line enable device for asserting a selected one of the word lines and a column select device for asserting a selected one of the bit line pairs. A timing circuit is provided for controlling the word line enable device, the column select device, and the read, write, and refresh operations in response to a word line timing pulse. The read, write, and refresh operations are performed in the same amount of time.