Synchronous dynamic random access memories (SDRAMs) are well known. In an SDRAM, and referring to FIG. 1A, data is written into and read out from the SDRAM 10 in synchronization with an external system clock signal (CLK). More specifically, when writing data to the array of cells 12 in the SDRAM 10, data is presented at the data lines (DQx), and at some discrete point during the clock's cycle (for example, on the clock's rising edge) this data enters the SDRAM 10 for eventual storage in the array 12 at an address specified by the address lines (Ax). Conversely, when reading data from the array 12, data is presented to the data lines (DQx) in accordance with the queried address on the address lines Ax, again at some discrete portion of the clock's cycle. Whether reading or writing is taking place is determined by the status of the row address strobe (RAS), column address strobe (CAS), and write enable (W/E) signals, as is well known in DRAM technology. Ultimately these external signals are received from a system, such as a microprocessor system.
However, in an SDRAM, data can neither be written to nor read from the array 12 instantaneously. For example, it takes time for data read from the array 12 to reach the data lines DQx: in a typical device, approximately 15 nanoseconds (ns) elapse between initiation of a read and arrival of the data at the data lines DQx. Thus, and referring to FIG. 1B, if the system clock signal CLK has a period (p) of 5 ns and a read request is specified at time T0 at a first address (addr 1), the data from that address will appear at the data lines DQx three clock cycles later, at T3. Accordingly, the device is said to have a “read latency” (or column address strobe (CAS) latency) of 3 clock cycles. Thereafter, the next entered address (addr n) can be processed and its data presented at the data lines at T4.
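The latency arithmetic above can be sketched as follows. This is only an illustration: the function name is hypothetical, times are held as integer picoseconds to avoid floating-point rounding, and rounding up to a whole number of cycles is an assumption (the example in the text divides evenly).

```python
def read_latency_cycles(array_delay_ps: int, clock_period_ps: int) -> int:
    """Smallest whole number of clock cycles that covers the array access delay.

    Rounding up is an assumption; the text's example (15 ns / 5 ns) divides evenly.
    """
    return -(-array_delay_ps // clock_period_ps)  # integer ceiling division


ARRAY_DELAY_PS = 15_000  # approximately 15 ns from array to data lines, per the text

# 5 ns clock period, as in the FIG. 1B example
print(read_latency_cycles(ARRAY_DELAY_PS, 5_000))
```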
A synchronous device which effectively provides for faster data transfer is a double data rate (DDR) SDRAM. In a DDR SDRAM (hereinafter DDR), two data pipelines are present in the device, one of which is active on the rising edge of the clock signal, and one of which is active on the falling edge of the clock signal. This is illustrated in FIG. 2. In a DDR, the address entered at T0, addr 1, is used to “prefetch” the data at that address and at the next sequential address (addr 2). The prefetched data at the two addresses are then output at the data lines DQx on the rising edge of T3 (T3r) and the falling edge of T3 (T3f). Thereafter, the next address (addr n) can be entered at T1, or the DDR can be configured to output the next sequential address (addr 3; not shown). The benefit of this approach is that twice the amount of data can be output using the same clock signal, as data is output on both the rising and falling edges. In other words, for a 5 ns clock signal, two bits can be output on each data line DQx every 5 ns.
DDR2 improves upon the technology of DDR by prefetching four bits instead of two, as illustrated in FIG. 3. DDR2 provides the benefit that the prefetched data can be presented at the data lines more quickly, allowing the clock cycle period (p) to be decreased by half (e.g., to 2.5 ns) when compared with a DDR device. Accordingly, due to the natural delay in reading the array, the data for the first address appears at the rising edge of T6 (T6r), thus establishing a read latency of 6. Moreover, the next address (addr n) cannot be presented until time T2, else the device will become “backed up” with data. In any event, using DDR2 technology, two bits can be output on each data line every 2.5 ns.
DDR3 represents the next generation of DDR technology, and essentially amounts to a further extension of DDR2. As illustrated in FIG. 4, DDR3 allows for an eight-bit prefetch and theoretically can run at an even further decreased clock cycle period (p) of 1.25 ns, although in reality the clock may run slightly slower in an actual DDR3, as will be explained below. Accordingly, due to the natural delay in reading the array, the data for the first address appears at the rising edge of T12 (T12r), thus establishing a read latency of 12. Moreover, the next address (addr n) cannot be presented until a minimum time of T4.
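The figures quoted for DDR, DDR2, and DDR3 follow one pattern, which the sketch below tabulates. The class and method names are hypothetical, the 15 ns array delay is taken from the earlier discussion, and the "prefetch divided by two" spacing rule is inferred from the T1, T2, and T4 values in the text rather than stated there as a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    name: str
    prefetch_bits: int
    clock_period_ps: int

    def read_latency(self, array_delay_ps: int = 15_000) -> int:
        # Whole clock cycles needed to cover the array delay (ceiling division).
        return -(-array_delay_ps // self.clock_period_ps)

    def min_address_spacing(self) -> int:
        # Two bits leave each data line per cycle, so a prefetch of P bits
        # occupies P/2 cycles before the next address can be accepted.
        return self.prefetch_bits // 2

    def mbit_per_s_per_line(self) -> int:
        # Two bits per clock period on each data line DQx.
        return 2_000_000 // self.clock_period_ps


GENERATIONS = [
    Generation("DDR", 2, 5_000),
    Generation("DDR2", 4, 2_500),
    Generation("DDR3", 8, 1_250),  # theoretical 1.25 ns period
]

for g in GENERATIONS:
    print(g.name, g.read_latency(), g.min_address_spacing(), g.mbit_per_s_per_line())
```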
DDR SDRAMs, such as DDR3, have a write latency as well as a read latency. Write latency defines the number of clock cycles between presentation of an address at the address lines Ax and the actual presentation of the data at the data lines DQx to be stored at that address. Write latency is required because of the device's natural read latency. However, per user specifications, the write latency and read latency are generally not equal, but instead usually differ by one in an attempt to maximize usage of the data bus to which the data lines DQx are attached. More specifically, the write latency is preferably the read latency minus one.
In current DDR3s under development, it turns out that clock periods of 1.25 ns are difficult to achieve, although future reductions in sizes and capacitances will certainly make such clock speeds achievable in the near future. A more appropriate and slightly longer clock period of 1.5 ns is thus currently targeted for such devices. Due to the natural 15 ns delay in reading the array as explained earlier, the read latency in a device with such a clock period is 10 (15/1.5). Accordingly, the write latency is preferably 9 for the reasons noted above.
Thus, and as illustrated in FIG. 5, when an address (addr 1) is presented to the DDR3 at T0 for writing data to the device, it will be 9 clock cycles until the data corresponding to that address (or those addresses in sequence) is presented by the data bus to the data lines DQx. As with reading, eight bits of data are written into the device at one time, and are captured from T9r to T12f. Once captured, these eight bits are written in parallel into the array at the next fraction of the clock cycle (T13r) in response to a write array pulse 20. Again, so as not to back up the device, a new address (addr n) cannot be fed to the address lines Ax until at least T4r, similarly to what was described when reading the device and as illustrated in FIG. 4. This new address's data is captured on the data lines DQx at T13r through T16f, etc.
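The write timing just described can be reproduced with a short sketch. The function names are hypothetical, and the defaults (write latency 9, prefetch 8) are those of the FIG. 5 example; this models only the cycle counting, not any electrical behavior.

```python
def capture_edges(address_cycle: int, write_latency: int = 9, prefetch_bits: int = 8):
    """Clock edges on which the prefetch_bits data bits are captured at DQx.

    Edge labels follow the text: 'r' for rising, 'f' for falling.
    """
    start = address_cycle + write_latency
    return [
        f"T{start + i // 2}{'r' if i % 2 == 0 else 'f'}"
        for i in range(prefetch_bits)
    ]


def array_write_cycle(address_cycle: int, write_latency: int = 9, prefetch_bits: int = 8) -> int:
    """Cycle whose rising edge carries the write array pulse for this address."""
    return address_cycle + write_latency + prefetch_bits // 2


print(capture_edges(0))      # data for addr 1, presented at T0
print(array_write_cycle(0))  # cycle of the write array pulse for addr 1
print(capture_edges(4))      # data for addr n, presented at T4
```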
This DDR3 writing scheme, however, requires tracking the write addresses through the device. This is cumbersome, as the actual address at which the data will be stored is not needed until thirteen clock cycles later at T13r. In the prior art, the addresses were organized and flowed through the device in a series of registers, as shown in FIG. 6. Essentially, these registers 22 are preferably D flip-flops, but can constitute other structures for holding data as well, such as latches, and in this sense “register” is used generically throughout this disclosure. The registers shift the pertinent write addresses between the address lines Ax (i.e., address bus) and the array decoders on each clock cycle. Thus, addr 1 would be present in register 22₀ at T0, then would propagate to register 22₁ at T1, to register 22₂ at T2, etc. At T4, the next address, if any or if changed, would be presented to the first register 22₀, etc. In this way, the device would have the correct address presented from register 22₁₂ to the array decoders at T13 in time for writing to the array in accordance with the write array signal 20 (see FIG. 5).
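A simplified behavioral model of the register chain of FIG. 6 is sketched below. It is not the prior-art circuit itself: the names are hypothetical, each register is modeled as one list slot, and only whole-cycle shifting is modeled. In this model the address entered at T0 reaches the last register on the T12 shift, so it is available to the decoders when the write array pulse arrives at T13.

```python
from collections import deque

NUM_REGISTERS = 13  # registers 22[0] .. 22[12] in the example
DONT_CARE = None    # stands in for whatever is on Ax when no address is valid


def simulate(address_inputs: dict, num_registers: int = NUM_REGISTERS):
    """Shift addresses through the chain, one register per clock cycle.

    address_inputs maps a clock cycle to the address presented on Ax;
    all other cycles capture a don't-care value.
    Returns a list whose element k is the chain contents after cycle Tk.
    """
    chain = deque([DONT_CARE] * num_registers, maxlen=num_registers)
    history = []
    last_cycle = max(address_inputs) + num_registers - 1
    for cycle in range(last_cycle + 1):
        # New value enters register 0; the oldest value falls off the far end.
        chain.appendleft(address_inputs.get(cycle, DONT_CARE))
        history.append(list(chain))
    return history


history = simulate({0: "addr 1", 4: "addr n"})
print(history[12])  # chain contents after the T12 shift
```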
This solution of propagating the addresses through the DDR3 is beneficial in that it is relatively simple to implement. However, it also suffers from drawbacks. First, the register structure takes up quite a bit of space on the DDR3 integrated circuit. Specifically, 13 registers are needed in the above-illustrated example, each of which is “k” bits long in accordance with the size of the write addresses that are being propagated. In a device having 17 address bits, a 17-by-13 bit space would thus be used for the register structures. Stated more generically, and assuming P equals the number of bits prefetched by the device, and WLmax equals the maximum write latency, the number of registers needed would be the value WLmax+P/2 rounded up to the next integer (or 9+8/2=13 registers in this example).
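The register count and the resulting storage can be checked with a few lines. The function names are hypothetical; the formula and the numbers (WLmax = 9, P = 8, 17 address bits) are those given above.

```python
import math


def registers_needed(wl_max: int, prefetch_bits: int) -> int:
    """WLmax + P/2, rounded up to the next integer, per the text."""
    return math.ceil(wl_max + prefetch_bits / 2)


def storage_bits(wl_max: int, prefetch_bits: int, address_bits: int) -> int:
    """Total register bits: one address-wide register per pipeline stage."""
    return registers_needed(wl_max, prefetch_bits) * address_bits


print(registers_needed(9, 8))   # number of registers in the example
print(storage_bits(9, 8, 17))   # the 17-by-13 bit space, expressed in bits
```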
Moreover, the relatively large amount of space needed for the register structures seems particularly wasteful when it is realized that a unique address is not input at every clock cycle. As noted above, an address can be entered into the device at a minimum of every four clock cycles, and as shown in the example of FIG. 5, valid addresses are captured by the device at times T0 and T4. For the intervening time periods (T1 through T3), the device merely captures “don't care” address data, which is then propagated through the shift register structure of FIG. 6. Of course, propagating these don't care address values through the register structure is not ideal.
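To put a rough number on this, the sketch below counts how many of the registers hold a valid address at a given cycle. It assumes the most favorable case for the shift register, namely back-to-back writes with a valid address at T0 and every four cycles thereafter; the function name and that traffic pattern are illustrative assumptions, not taken from the figures.

```python
def valid_entries(cycle: int, num_registers: int = 13, address_spacing: int = 4) -> int:
    """Number of registers holding a valid address after the shift at `cycle`.

    Register i holds whatever was on the address lines at cycle - i, and that
    value is valid only if it was entered on a multiple of address_spacing.
    """
    return sum(
        1
        for i in range(num_registers)
        if cycle - i >= 0 and (cycle - i) % address_spacing == 0
    )


# Once the chain is full, sample four consecutive cycles.
for cycle in range(12, 16):
    used = valid_entries(cycle)
    print(f"T{cycle}: {used} valid, {13 - used} don't care")
```

Even under this assumed best case, no more than four of the thirteen registers carry a valid address at any time.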
Furthermore, as noted above, the read latency (and thus the write latency) might ideally change depending on the clock speed to be used with the DDR3. For example, while the above example contemplates a clock period of 1.5 ns and a read latency of 10 (write latency of 9), a 3 ns clock would require only a read latency of 5 (write latency of 4). When this is recognized, it is seen that not all thirteen registers 22 are required, and that fewer registers would suffice at lower clock speeds. In short, use of lower clock speeds and lower latency values may render some of the registers 22 unnecessary, which is again wasteful. Moreover, the design of the DDR3 device, and specifically the register structure, becomes dictated by the clock speed and latency values to be used. This hampers user flexibility, as a new design would be needed for each new clock speed or latency value to be used, which again is not ideal.
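The dependence of register usage on clock period can be illustrated as follows. The function names are hypothetical, and the register count for the 3 ns case is an extrapolation obtained by applying the WLmax+P/2 expression given earlier to a write latency of 4; the text itself states only that fewer than thirteen registers are required.

```python
def latencies(clock_period_ps: int, array_delay_ps: int = 15_000):
    """Return (read latency, write latency) in clock cycles.

    Read latency covers the array delay in whole cycles (rounding up is an
    assumption); write latency is read latency minus one, per the text.
    """
    read_latency = -(-array_delay_ps // clock_period_ps)
    return read_latency, read_latency - 1


def registers_used(write_latency: int, prefetch_bits: int = 8) -> int:
    """Pipeline stages actually needed for a given write latency."""
    return write_latency + prefetch_bits // 2


DESIGNED_REGISTERS = 13  # sized for the 1.5 ns case

for period_ps in (1_500, 3_000):
    rl, wl = latencies(period_ps)
    used = registers_used(wl)
    print(period_ps, rl, wl, used, DESIGNED_REGISTERS - used)
```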
Moreover, as write addresses (including “don't care” addresses) are propagated through the register structure at each clock cycle, current will be necessarily drawn by each register at each clock cycle. This excess current draw is significant, and is preferably minimized.
Accordingly, the art would be benefited by a solution to this problem, and specifically by a solution that simplifies and renders more flexible the write register structure on a DDR3 device. This disclosure provides such a solution.