Conventional computer systems include a processor (not shown) coupled to a variety of memory devices, including read-only memories ("ROMs") which traditionally store instructions for the processor, and a system memory to which the processor may write data and from which the processor may read data. The processor may also communicate with an external cache memory, which is generally a static random access memory ("SRAM"). The processor also communicates with input devices, output devices, and data storage devices.
Processors generally operate at a relatively high speed. Processors such as the Pentium® and Pentium Pro® microprocessors are currently available that operate at clock speeds of at least 200 MHz. However, the remaining components of the computer system, with the exception of SRAM cache, are not capable of operating at the speed of the processor. For this reason, the system memory devices, as well as the input devices, output devices, and data storage devices, are not coupled directly to the processor bus. Instead, the system memory devices are generally coupled to the processor bus through a memory controller, and the input devices, output devices, and data storage devices are coupled to the processor bus through a bus bridge. The memory controller allows the system memory devices to operate at a clock frequency that is substantially lower than the clock frequency of the processor. Similarly, the bus bridge allows the input devices, output devices, and data storage devices to operate at a substantially lower frequency. Currently, for example, a processor having a 200 MHz clock frequency may be mounted on a motherboard having a 66 MHz clock frequency for controlling the system memory devices and other components.
Access to system memory is a frequent operation for the processor. The time required for the processor, operating, for example, at 200 MHz, to read data from or write data to a system memory device operating at, for example, 66 MHz, greatly slows the rate at which the processor is able to accomplish its operations. Thus, much effort has been devoted to increasing the operating speed of system memory devices.
System memory devices are generally dynamic random access memories ("DRAMs"). Initially, DRAMs were asynchronous and thus did not operate at even the clock speed of the motherboard. In fact, access to asynchronous DRAMs often required that wait states be generated to halt the processor until the DRAM had completed a memory transfer. However, the operating speed of asynchronous DRAMs was successfully increased through such innovations as burst and page mode DRAMs which did not require that an address be provided to the DRAM for each memory access. More recently, synchronous dynamic random access memories ("SDRAMs") have been developed to allow the pipelined transfer of data at the clock speed of the motherboard. However, even SDRAMs are incapable of operating at the clock speed of currently available processors. Thus, SDRAMs cannot be connected directly to the processor bus, but instead must interface with the processor bus through a memory controller, bus bridge, or similar device. The disparity between the operating speed of the processor and the operating speed of SDRAMs continues to limit the speed at which processors may complete operations requiring access to system memory.
A solution to this operating speed disparity has been proposed in the form of a computer architecture known as "SyncLink." In the SyncLink architecture, the system memory is coupled to the processor either directly through the processor bus or through a memory controller (not shown). Rather than requiring that separate address and control signals be provided to the system memory, SyncLink memory devices receive command packets that include both control and address information. The SyncLink memory device then outputs or receives data on a data bus that may be coupled directly to the data bus portion of the processor bus.
An example of a SyncLink memory device 16 is shown in block diagram form in FIG. 1. The memory device 16 includes a clock divider and delay circuit 40 that receives a master clock signal 42 and generates a large number of other clock and timing signals to control the timing of various operations in the memory device 16. The memory device 16 also includes a command buffer 46 and an address capture circuit 48 which receive an internal clock CLK signal, a command packet CA0-CA9 on a command bus 50, and a FLAG signal on line 52. The command packet CA0-CA9 contains control and address information for each memory transfer, and the FLAG signal identifies the start of a command packet which may include more than one 10-bit packet word. In fact, a command packet is generally in the form of a sequence of 10-bit packet words on the 10-bit command bus 50. The command buffer 46 receives the command packet from the bus 50, and compares at least a portion of the command packet to identifying data from an ID register 56 to determine if the command packet is directed to the memory device 16 or some other memory device 16 in the event multiple memory devices 16 are used together in a system. If the command buffer determines that the command is directed to the memory device 16, it then provides a command word to a command decoder and sequencer 60. The command decoder and sequencer 60 generates a large number of internal control signals to control the operation of the memory device 16 during a memory transfer.
The address capture circuit 48 also receives the command words from the command bus 50 and outputs a 20-bit address corresponding to the address information in the command. The address is provided to an address sequencer 64 which generates a corresponding 3-bit bank address on bus 66, a 10-bit row address on bus 68, and a 7-bit column address on bus 70.
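The division of the 20-bit address into bank, row, and column fields can be illustrated with a minimal sketch. The field widths (3, 10, and 7 bits) are taken from the description above; the bit ordering (bank in the most significant bits, column in the least significant bits) and the name `split_address` are assumptions made only for illustration and are not specified by the description.

```python
from typing import Tuple

# Field widths from the description: 3-bit bank, 10-bit row, 7-bit column.
BANK_BITS, ROW_BITS, COL_BITS = 3, 10, 7
ADDR_BITS = BANK_BITS + ROW_BITS + COL_BITS  # 20 bits in total


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a 20-bit address into (bank, row, column).

    Assumed layout (hypothetical): [bank | row | column], MSB to LSB.
    """
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address must fit in 20 bits")
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = addr >> (COL_BITS + ROW_BITS)
    return bank, row, col
```

Under this assumed layout, three bank bits select one of eight banks, which matches the eight memory banks 80a-h of the memory device 16.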
One of the problems of conventional DRAMs is their relatively low speed resulting from the time required to precharge and equilibrate circuitry in the DRAM array. The SyncLink memory device 16 shown in FIG. 1 largely avoids this problem by using a plurality of memory banks 80, in this case eight memory banks 80a-h. After a memory read from one bank 80a, the bank 80a can be precharged while the remaining banks 80b-h are being accessed. Each of the memory banks 80a-h receives a row address from a respective row latch/decoder/driver 82a-h. All of the row latch/decoder/drivers 82a-h receive the same row address from a predecoder 84 which, in turn, receives a row address from either a row address register 86 or a refresh counter 88 as determined by a multiplexer 90. However, only one of the row latch/decoder/drivers 82a-h is active at any one time as determined by bank control logic 94 as a function of bank data from a bank address register 96.
The column address on bus 70 is applied to a column latch/decoder 100 which, in turn, supplies I/O gating signals to an I/O gating circuit 102. The I/O gating circuit 102 interfaces with columns of the memory banks 80a-h through sense amplifiers 104. Data is coupled to or from the memory banks 80a-h through the sense amplifiers 104 and I/O gating circuit 102 to a data path subsystem 108 which includes a read data path 110 and a write data path 112. The read data path 110 includes a read latch 120 receiving and storing data from the I/O gating circuit 102. In the memory device 16 shown in FIG. 1, 64 bits of data are applied to and stored in the read latch 120. The read latch then provides four 16-bit data words to a multiplexer 122. The multiplexer 122 sequentially applies each of the 16-bit data words to a read FIFO buffer 124. Successive 16-bit data words are clocked through the FIFO buffer 124 by a clock signal generated from an internal clock by a programmable delay circuit 126. The FIFO buffer 124 sequentially applies the 16-bit words and two clock signals (a clock signal and a quadrature clock signal) to a driver circuit 128 which, in turn, applies the 16-bit data words to a data bus 130 forming part of the processor bus 14. The driver circuit 128 also applies the clock signals to a clock bus 132 so that a device such as the processor 12 reading the data on the data bus 130 can be synchronized with the data.
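The serialization performed by the read latch 120 and multiplexer 122 can be sketched as follows. The 64-bit and 16-bit widths come from the description above; the order in which the four words are output (least significant word first) and the name `read_latch_to_words` are assumptions for illustration only.

```python
def read_latch_to_words(latched: int) -> list:
    """Split a 64-bit latched value into four 16-bit data words.

    Assumption (not stated in the description): the least significant
    16-bit word is applied to the read FIFO buffer first.
    """
    if not 0 <= latched < (1 << 64):
        raise ValueError("latched value must fit in 64 bits")
    return [(latched >> (16 * i)) & 0xFFFF for i in range(4)]
```

For example, under this assumed ordering the value `0x0123456789ABCDEF` would be output as the words `0xCDEF`, `0x89AB`, `0x4567`, and `0x0123`.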
The write data path 112 includes a receiver buffer 140 coupled to the data bus 130. The receiver buffer 140 sequentially applies 16-bit words from the data bus 130 to four input registers 142, each of which is selectively enabled by a signal from a clock generator circuit 144. Thus, the input registers 142 sequentially store four 16-bit data words and combine them into one 64-bit data word applied to a write FIFO buffer 148. The write FIFO buffer 148 is clocked by a signal from the clock generator 144 and an internal write clock WCLK to sequentially apply 64-bit write data to a write latch and driver 150. The write latch and driver 150 applies the 64-bit write data to one of the memory banks 80a-h through the I/O gating circuit 102 and the sense amplifier 104.
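The assembly of four 16-bit data words into one 64-bit write data word by the input registers 142 can be sketched in a similar manner. The word ordering (first received word in the least significant position) and the name `combine_words` are assumptions for illustration; the description specifies only the widths.

```python
def combine_words(words) -> int:
    """Combine four 16-bit words into one 64-bit write data word.

    Assumption (not stated in the description): the first word received
    occupies the least significant 16 bits of the result.
    """
    if len(words) != 4:
        raise ValueError("exactly four 16-bit words are required")
    value = 0
    for i, word in enumerate(words):
        if not 0 <= word <= 0xFFFF:
            raise ValueError("each word must fit in 16 bits")
        value |= word << (16 * i)
    return value
```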
As mentioned above, an important goal of the SyncLink memory device architecture is to allow data transfer between a processor and a memory device to occur at a significantly faster rate. Faster data transfer can be achieved by "pipelining" the transfer of data in synchronism with a clock signal. The rate of data transfer is then controlled by the frequency of the clock signal. Typically, a bit of data is clocked into or out of the memory device on each rising edge of the clock signal. However, faster data transfer can be achieved by clocking data into or out of the memory device on each transition of the clock signal, i.e. on both the rising and falling edges of the clock signal. As explained in detail below, clocking data on both edges of the clock, known as "double data rate" clocking, is generally achieved by driving various circuits with both a first clock signal, CLK, and a quadrature clock signal, CLK90, that is delayed by 90 degrees from the first clock signal. However, at higher clock speeds needed to achieve higher data transfer rates, it can be difficult to obtain these quadrature clock signals CLK and CLK90. Moreover, generating quadrature clock signals typically requires a substantial amount of circuitry, thus consuming a significant area of a semiconductor chip.
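The difference between clocking data on rising edges only and clocking data on both edges can be illustrated by counting transfer opportunities in a sampled clock waveform. This is a minimal sketch: the representation of the clock as a list of 0/1 levels and the name `count_transfers` are assumptions for illustration.

```python
def count_transfers(clock, both_edges: bool) -> int:
    """Count the clock edges on which data would be transferred.

    `clock` is a sequence of 0/1 levels. With both_edges=False only
    rising edges are counted; with both_edges=True every transition
    is counted, as in double data rate clocking.
    """
    count = 0
    prev = clock[0]
    for level in clock[1:]:
        rising = prev == 0 and level == 1
        falling = prev == 1 and level == 0
        if rising or (both_edges and falling):
            count += 1
        prev = level
    return count
```

For a waveform containing four full clock periods, this sketch counts four transfers with rising-edge clocking and eight with clocking on both edges, so the transfer rate doubles without any increase in clock frequency.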
The traditional approach to obtaining quadrature clock signals is to use a clock generator 154 of the type shown in FIG. 2. The clock generator 154 includes a first J-K flip-flop 156 that generates the CLK signal and a second J-K flip-flop 158 that generates the CLK90 signal. As shown in the timing diagram of FIG. 3, a clock signal CLKA having twice the frequency of the quadrature clock signals CLK and CLK90 is applied to the clock input of the first flip-flop 156. The J and K inputs of the first flip-flop 156 are coupled to a logic "1" voltage level so that the first flip-flop 156 toggles, as also shown in FIG. 3. Thus, the CLK signal generated at the Q output of the flip-flop 156 transitions on each rising edge of the CLKA signal.
The second J-K flip-flop 158 is clocked by the inverse of the CLKA signal, which is generated at the output of an inverter 160 that receives the CLKA signal. The second flip-flop 158 is also configured to toggle since its J and K inputs are coupled to a logic "1" voltage level. Thus, as shown in FIG. 3, the CLK90 signal generated at the Q output of the flip-flop 158 transitions on each falling edge of the CLKA signal.
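The behavior of the clock generator 154 can be modeled with a short behavioral sketch. It treats each J-K flip-flop with J and K tied to logic "1" as a toggle element, with the first toggling on rising edges of CLKA and the second on falling edges. The sampling of CLKA as a list of 0/1 levels (one sample per half period), the initial output states of zero, and the name `quadrature_from_clka` are assumptions for illustration; propagation delays are ignored.

```python
def quadrature_from_clka(clka):
    """Behavioral model of two toggle-configured J-K flip-flops.

    `clka` is a sequence of 0/1 levels of the double-frequency clock.
    Returns two lists (clk, clk90) sampled at the same instants.
    Assumptions: both outputs start at 0 and CLKA starts from a low
    level before the first sample.
    """
    clk_state = 0
    clk90_state = 0
    prev = 0
    clk, clk90 = [], []
    for level in clka:
        if prev == 0 and level == 1:
            clk_state ^= 1      # first flip-flop toggles on a rising edge of CLKA
        elif prev == 1 and level == 0:
            clk90_state ^= 1    # second flip-flop toggles on a falling edge of CLKA
        prev = level
        clk.append(clk_state)
        clk90.append(clk90_state)
    return clk, clk90
```

In this model, four periods of CLKA produce two periods of CLK, and CLK90 follows CLK by one half period of CLKA, which is one quarter period (90 degrees) of CLK.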
There are two primary disadvantages of the conventional approach shown in FIG. 2. First, each J-K flip-flop 156, 158 requires a great deal of circuitry to implement, thus increasing the size, complexity, and expense of memories and other devices using the J-K flip-flops 156, 158 to generate quadrature clock signals. Second, the use of a clock generator 154 of the type shown in FIG. 2 requires an input clock signal having twice the frequency of the quadrature clock signals CLK and CLK90 to toggle the flip-flops 156, 158. However, as the speed of memories and other devices requiring quadrature clock signals increases, it is increasingly difficult to couple a clock signal having such an extremely high frequency to a clock generator.
Although the foregoing discussion is directed to the need for improved clock generators used in command buffers of packetized DRAMs, similar problems exist in other memory devices, such as synchronous DRAMs, which must process control and other signals at a high rate of speed, as well as in other devices in which the operation of the device is synchronized to quadrature clock signals. Therefore, there is a need for a multi-phase clock generator that is relatively simple and yet does not require an input clock signal having twice the frequency of the clock signals provided by the clock generator.