Conventional computer systems include a processor (not shown) coupled to a variety of memory devices, including read-only memories ("ROMs") which traditionally store instructions for the processor, and a system memory to which the processor may write data and from which the processor may read data. The processor may also communicate with an external cache memory, which is generally a static random access memory ("SRAM"). The processor also communicates with input devices, output devices, and data storage devices.
Processors generally operate at a relatively high speed. Processors such as the Pentium® and Pentium II® microprocessors are currently available that operate at clock speeds of at least 400 MHz. However, the remaining components of existing computer systems, with the exception of SRAM cache memory, are not capable of operating at the speed of the processor. For this reason, the system memory devices, as well as the input devices, output devices, and data storage devices, are not coupled directly to the processor bus. Instead, the system memory devices are generally coupled to the processor bus through a memory controller, bus bridge or similar device, and the input devices, output devices, and data storage devices are coupled to the processor bus through a bus bridge. The memory controller allows the system memory devices to operate at a clock frequency that is substantially lower than the clock frequency of the processor. Similarly, the bus bridge allows the input devices, output devices, and data storage devices to operate at a frequency that is substantially lower than the clock frequency of the processor. Currently, for example, a processor having a 300 MHz clock frequency may be mounted on a motherboard having a 66 MHz clock frequency for controlling the system memory devices and other components.
Access to system memory is a frequent operation for the processor. The time required for the processor, operating, for example, at 300 MHz, to read data from or write data to a system memory device operating at, for example, 66 MHz, greatly slows the rate at which the processor is able to accomplish its operations. Thus, much effort has been devoted to increasing the operating speed of system memory devices.
System memory devices are generally dynamic random access memories ("DRAMs"). Initially, DRAMs were asynchronous and thus did not operate at even the clock speed of the motherboard. In fact, access to asynchronous DRAMs often required that wait states be generated to halt the processor until the DRAM had completed a memory transfer. However, the operating speed of asynchronous DRAMs was successfully increased through such innovations as burst and page mode DRAMs, which did not require that an address be provided to the DRAM for each memory access. More recently, synchronous dynamic random access memories ("SDRAMs") have been developed to allow the pipelined transfer of data at the clock speed of the motherboard. However, even SDRAMs are typically incapable of operating at the clock speed of currently available processors. Thus, SDRAMs cannot be connected directly to the processor bus, but instead must interface with the processor bus through a memory controller, bus bridge, or similar device. The disparity between the operating speed of the processor and the operating speed of SDRAMs continues to limit the speed at which processors may complete operations requiring access to system memory.
A solution to this operating speed disparity has been proposed in the form of a packetized memory device known as a SLDRAM memory device. In the SLDRAM architecture, the system memory may be coupled to the processor, either directly through the processor bus or through a memory controller. Rather than requiring that separate address and control signals be provided to the system memory, SLDRAM memory devices receive command packets that include both control and address information. The SLDRAM memory device then outputs or receives data on a data bus that may be coupled directly to the data bus portion of the processor bus.
An example of such a SLDRAM memory device is shown in FIG. 1. The memory device 30 includes a clock generator circuit 40 that receives a command clock signal CMDCLK and generates an internal clock signal ICLK and a large number of other clock and timing signals to control the timing of various operations in the memory device 30. The memory device 30 also includes a command buffer 46 and an address capture circuit 48, which receive the internal clock signal ICLK, a command packet CA0-CA9 on a 10-bit command bus 50, and a FLAG signal on line 52. A memory controller (not shown) or other device normally transmits the command packet CA0-CA9 to the memory device 30 in synchronism with the command clock signal CMDCLK. As explained above, the command packet, which generally includes four 10-bit packet words, contains control and address information for each memory transfer. The FLAG signal identifies the start of a command packet, and it also signals the start of an initialization sequence. The command buffer 46 receives the command packet from the bus 50, and compares at least a portion of the command packet to identifying data from an ID register 56 to determine if the command packet is directed to the memory device 30 or some other memory device (not shown). If the command buffer 46 determines that the command packet is directed to the memory device 30, it then provides the command words to a command decoder and sequencer 60. The command decoder and sequencer 60 generates a large number of internal control signals to control the operation of the memory device 30 during a memory transfer.
The address capture circuit 48 also receives the command words from the command bus 50 and outputs a 20-bit address corresponding to the address information in the command packet. The address is provided to an address sequencer 64, which generates a corresponding 3-bit bank address on bus 66, a 10-bit row address on bus 68, and a 7-bit column address on bus 70. The column address and row address are processed by column and row address paths 73, 75 as will be described below.
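The division of the 20-bit address into bank, row, and column fields can be illustrated by the following minimal sketch. The field widths are taken from the description above; the ordering of the fields within the 20-bit address (bank in the most significant bits, column in the least significant bits) and the function name split_address are assumptions made only for illustration and are not specified in the description.

```python
BANK_BITS, ROW_BITS, COL_BITS = 3, 10, 7  # widths from the text; they sum to 20


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 20-bit address into (bank, row, column).

    Assumed layout, most significant field first: bank | row | column.
    The actual bit assignment is not given in the description.
    """
    total_bits = BANK_BITS + ROW_BITS + COL_BITS
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address must fit in 20 bits")
    col = addr & ((1 << COL_BITS) - 1)                # lowest 7 bits
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)  # next 10 bits
    bank = addr >> (COL_BITS + ROW_BITS)              # top 3 bits
    return bank, row, col
```

For example, under the assumed layout, an address formed as (5 << 17) | (3 << 7) | 9 would yield bank 5, row 3, and column 9.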
One of the problems of conventional DRAMs is their relatively low speed resulting from the time required to precharge and equilibrate circuitry in the DRAM array. The packetized DRAM 30 shown in FIG. 1 largely avoids this problem by using a plurality of memory banks 80, in this case eight memory banks 80a-h. After a read from one bank 80a, the bank 80a can be precharged while the remaining banks 80b-h are being accessed. Each of the memory banks 80a-h receives a row address from a respective row latch/decoder/driver 82a-h. All of the row latch/decoder/drivers 82a-h receive the same row address from a predecoder 84 which, in turn, receives a row address from either a row address register 86, redundant row circuit 87, or a refresh counter 88, as determined by a multiplexer 90. However, only one of the row latch/decoder/drivers 82a-h is active at any one time, as determined by bank control logic 94 as a function of a bank address from a bank address register 96.
The column address on bus 70 is applied to a column latch/decoder 100, which supplies I/O gating signals to an I/O gating circuit 102. The I/O gating circuit 102 interfaces with columns of the memory banks 80a-h through sense amplifiers 104. Data is coupled to or from the memory banks 80a-h through the sense amplifiers 104, the I/O gating circuit 102, and a data path subsystem 108, which includes a read data path 110 and a write data path 112. The read data path 110 includes a read latch 120 that stores data from the I/O gating circuit 102. In the memory device 30 shown in FIG. 1, 64 bits of data are stored in the read latch 120. The read latch 120 then provides four 16-bit data words to an output multiplexer 122 that sequentially supplies each of the 16-bit data words to a read FIFO buffer 124. Successive 16-bit data words are clocked into the read FIFO buffer 124 by a clock signal DCLK generated by the clock generator 40. The 16-bit data words are then clocked out of the read FIFO buffer 124 by a clock signal RCLK obtained by coupling the DCLK signal through a programmable delay circuit 126. The read FIFO buffer 124 sequentially applies the 16-bit data words to a driver circuit 128 in synchronism with the RCLK signal. The driver circuit 128, in turn, applies the 16-bit data words to a data bus 130. The driver circuit 128 also applies the data clock signal DCLK to a clock line 132. The programmable delay circuit 126 is programmed during initialization of the memory device 30 so that the DCLK signal has the optimum phase relative to the read data for the DCLK signal to clock the read data into the memory controller (not shown), processor, or other device.
The write data path 112 includes a receiver buffer 140 coupled to the data bus 130. The receiver buffer 140 sequentially applies 16-bit words from the data bus 130 to four input registers 142, each of which is selectively enabled by a signal from a clock generator circuit 144. The clock generator circuit generates these enable signals responsive to the data clock DCLK, which, for write operations, is applied to the memory device 30 on line 132 from the memory controller, processor, or other device. As with the command clock signal CMDCLK and command packet CA0-CA9, the memory controller or other device (not shown) normally transmits the data to the memory device 30 in synchronism with the data clock signal DCLK. The clock generator 144 is programmed during initialization to adjust the timing of the clock signal applied to the input registers 142 relative to the DCLK signal so that the input registers can capture the write data at the proper times. Thus, the input registers 142 sequentially store four 16-bit data words and combine them into one 64-bit data word applied to a write FIFO buffer 148. The data are clocked into the write FIFO buffer 148 by a clock signal from the clock generator 144, and the data are clocked out of the write FIFO buffer 148 by an internal write clock WCLK signal. The WCLK signal is generated by the clock generator 40. The 64-bit write data are applied to a write latch and driver 150. The write latch and driver 150 applies the 64-bit write data to one of the memory banks 80a-h through the I/O gating circuit 102 and the sense amplifiers 104.
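The width conversions performed in the read and write data paths, namely one 64-bit word to four 16-bit words and the reverse, can be sketched as follows. This is a behavioral illustration only. The word ordering (least significant 16-bit word transferred first) and the function names unpack_read_latch and pack_input_registers are assumptions; the description does not state the order in which the words appear on the data bus 130.

```python
WORD_BITS = 16            # width of the data bus, from the text
WORDS_PER_TRANSFER = 4    # four 16-bit words per 64-bit access, from the text
WORD_MASK = (1 << WORD_BITS) - 1


def unpack_read_latch(latch_value: int) -> list[int]:
    """Split a 64-bit read-latch value into four 16-bit words.

    Assumed order: least significant word first.
    """
    if not 0 <= latch_value < (1 << (WORD_BITS * WORDS_PER_TRANSFER)):
        raise ValueError("latch value must fit in 64 bits")
    return [(latch_value >> (WORD_BITS * i)) & WORD_MASK
            for i in range(WORDS_PER_TRANSFER)]


def pack_input_registers(words: list[int]) -> int:
    """Combine four sequentially captured 16-bit words into one 64-bit word.

    Uses the same assumed order as unpack_read_latch, so the two
    functions are inverses of each other.
    """
    if len(words) != WORDS_PER_TRANSFER:
        raise ValueError("expected exactly four words")
    value = 0
    for i, word in enumerate(words):
        if not 0 <= word <= WORD_MASK:
            raise ValueError("each word must fit in 16 bits")
        value |= word << (WORD_BITS * i)
    return value
```

Under the assumed ordering, packing the words produced by unpack_read_latch returns the original 64-bit value.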
The command buffer 46 is shown in greater detail in the block diagram of FIG. 2. With reference to FIG. 2, a command packet consisting of a plurality of packet words is applied to a shift register 172 via the command bus 50. The shift register 172 sequentially receives packet words responsive to a clock signal CLK. The shift register 172 has N stages, each of which has a width of M bits. Therefore, each command word can be M*N bits. After an M*N bit command word has been shifted into the shift register 172, a control circuit 174 generates a LOAD signal that is applied to a storage register 178. The storage register 178 then loads all of the data stored in the shift register 172.
After the storage register 178 has been loaded, it continuously outputs the M*N bit command word to a decoder 180, an ID register 182, and a compare circuit 184. The storage register 178 also outputs the command word on a bus 190, and the compare circuit generates a CHPSEL signal. As explained below, the CHPSEL signal, when active high, causes the memory device 30 containing the command buffer 46 to perform a function corresponding to the command word on the bus 190.
The function of the decoder 180, ID register 182, and compare circuit 184 is to examine the command word and determine whether the command word is intended for the memory device 30 containing the command buffer 46. If the command word is directed to the memory device 30, the compare circuit 184 generates an active CHPSEL signal which causes the memory device 30 to carry out the operation corresponding to the command word on the bus 190. Significantly, while the memory device 30 is carrying out that command, the next packet words are being shifted into the shift register 172. Thus, the memory device 30 containing the command buffer 46 is capable of continuously receiving and processing command words.
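The shift, load, and compare sequence described for the command buffer 46 can be modeled behaviorally as in the following sketch. It is not a circuit-level model. The class name CommandBuffer, the position and width of the ID field (assumed here to occupy the most significant id_bits of the command word), and the default values are hypothetical; the description states only that at least a portion of the command packet is compared with identifying data.

```python
class CommandBuffer:
    """Behavioral sketch of the FIG. 2 command buffer (not a circuit model)."""

    def __init__(self, n_stages=4, m_bits=10, device_id=0, id_bits=7):
        self.n_stages = n_stages        # N stages in the shift register
        self.m_bits = m_bits            # M bits per stage
        self.device_id = device_id      # contents of the ID register
        self.id_bits = id_bits          # hypothetical width of the ID field
        self.shift_register = []        # packet words received so far
        self.storage_register = None    # latest M*N-bit command word
        self.chpsel = False             # True when the command targets this device

    def clock(self, packet_word):
        """Shift in one M-bit packet word; return True when LOAD fires."""
        if not 0 <= packet_word < (1 << self.m_bits):
            raise ValueError("packet word must fit in M bits")
        self.shift_register.append(packet_word)
        if len(self.shift_register) < self.n_stages:
            return False
        # LOAD: copy all N stages into the storage register,
        # with the first word received in the most significant bits.
        command = 0
        for word in self.shift_register:
            command = (command << self.m_bits) | word
        self.storage_register = command
        # Cleared here so the next packet can be received; in hardware the
        # old words would simply be displaced as new words shift in.
        self.shift_register = []
        # Assumed: the ID field is the top id_bits of the command word.
        received_id = command >> (self.n_stages * self.m_bits - self.id_bits)
        self.chpsel = received_id == self.device_id
        return True
```

Because the storage register holds the completed command word, the shift register in this sketch is free to accept the packet words of the next command while the current one is being acted upon, which mirrors the continuous operation described above.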
It will be understood that necessary portions of the command buffer 46 have been omitted from FIG. 2 in the interests of brevity since they are somewhat peripheral to the claimed invention. For example, the command buffer 46 will contain circuitry for pipelining command words output from the storage register 178, circuitry for generating lower level command signals from the command word, etc.
One consideration limiting the maximum rate at which the command buffer 46 can receive and provide command packets is the speed at which the plurality of shift registers included in the shift register 172 can shift data. A conventional shift register generally consists of flip-flops and gates that control the shift operation. The conventional shift register shifts data in response to a clock pulse, and has a throughput limited to the speed of the clock signal. Increasing the clock speed will increase the throughput of the shift register. However, this approach does not increase the throughput of the conventional shift register with respect to other memory circuits also operating according to the clock signal.
One approach to increasing throughput has been to use a shift register that shifts data on both the rising and falling edges of a clock signal. The result is a dual-edge shift register that can essentially shift data at twice the throughput of conventional shift registers that shift data in response to only one clock edge or one clock pulse.
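The doubling of throughput can be illustrated by counting the shift operations triggered by a sampled clock waveform. In the following sketch, the clock is represented as a list of logic levels sampled once per half-period, and the function name count_shifts is hypothetical; both are assumptions made for illustration.

```python
def count_shifts(clock_levels, dual_edge):
    """Count shift operations triggered by a sampled clock waveform.

    A single-edge register shifts on rising edges only; a dual-edge
    register shifts on both rising and falling edges.
    """
    shifts = 0
    for previous, current in zip(clock_levels, clock_levels[1:]):
        rising = previous == 0 and current == 1
        falling = previous == 1 and current == 0
        if rising or (dual_edge and falling):
            shifts += 1
    return shifts


# Four full clock cycles, ending low so every cycle has both edges.
clock = [0, 1] * 4 + [0]
single_edge_shifts = count_shifts(clock, dual_edge=False)  # 4
dual_edge_shifts = count_shifts(clock, dual_edge=True)     # 8
```

Over the same four clock cycles, the dual-edge count is twice the single-edge count.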
The dual-edge shift register generally requires a series of clock signals to perform the faster rate of shifting and latching operations. For example, it may be necessary to provide both a clock signal and a complementary clock signal to the dual-edge shift register in order to alternately shift and latch data in the shift register. However, the maximum rate at which the dual-edge shift register can accurately perform the shift and latch operations may be limited by the quality (i.e., symmetry) of the complementary clock signals generated for use by the shift register.
The conventional manner of generating a series of complementary clock signals involves inverting a clock signal through an inverter circuit. The output of the inverter circuit is the complementary clock signal provided to the dual-edge shift register. However, when generating the complementary clock signal in such a manner, the resulting complementary clock signal will be skewed from the original clock signal due to the propagation delay of the inverter circuit. In some instances, the complementary clock signal may be skewed by as much as 50 picoseconds.
Applying the clock signal and the skewed complementary clock signal to the dual-edge shift register causes the duty cycles of the shifting and latching operations to be imbalanced. Consequently, as the clock rate increases, the likelihood of the shift register misshifting or latching erroneous data also increases. Although the time delay between the clock signal and the complementary clock signal may be acceptable at current clock speeds, it may pose a problem for the next generation of faster memory systems. These problems associated with an imbalanced shift register will manifest themselves as system memory errors. Therefore, there is a need for a bit shifting circuit that has a high throughput and balanced duty cycles.
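The growing significance of a fixed skew at higher clock rates can be shown with a simple calculation. The following sketch assumes a simplified timing model that is not taken from the description: one operation is triggered by the rising edge of the clock signal and the other by the rising edge of the complementary clock signal, so that a skew lengthens one phase and shortens the other by the same amount. The clock frequencies used are illustrative values only, and the function names are hypothetical.

```python
def phase_windows_ps(clock_mhz, skew_ps):
    """Return (long_phase, short_phase) in picoseconds.

    Assumed model: the complementary clock lags the clock by skew_ps,
    so one half-period grows by the skew and the other shrinks by it.
    """
    period_ps = 1e6 / clock_mhz      # 1 MHz corresponds to 1,000,000 ps
    half_period_ps = period_ps / 2
    return half_period_ps + skew_ps, half_period_ps - skew_ps


def imbalance_fraction(clock_mhz, skew_ps):
    """Skew expressed as a fraction of the ideal half-period."""
    half_period_ps = 1e6 / clock_mhz / 2
    return skew_ps / half_period_ps


# Illustrative frequencies with the 50 ps skew mentioned in the text.
slow = phase_windows_ps(100, 50)   # (5050.0, 4950.0): 1% of a half-period
fast = phase_windows_ps(400, 50)   # (1300.0, 1200.0): 4% of a half-period
```

Under this assumed model, the same 50 picosecond skew consumes four times as large a fraction of each phase at 400 MHz as it does at 100 MHz, which is consistent with the observation that the imbalance becomes more troublesome as clock rates increase.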