The present invention relates generally to semiconductor memories and other integrated circuit devices, and is directed, more particularly, to synchronizing digital signals being transferred over buses interconnecting such devices.
Conventional computer systems include a processor (not shown) coupled to a variety of memory devices, including read-only memories (“ROMs”) which traditionally store instructions for the processor, and a system memory to which the processor may write data and from which the processor may read data. The processor may also communicate with an external cache memory, which is generally a static random access memory (“SRAM”). The processor also communicates with input devices, output devices, and data storage devices.
Processors generally operate at a relatively high speed. Processors such as the Pentium III® and Pentium 4® microprocessors are currently available that operate at clock speeds of at least 400 MHz. However, the remaining components of existing computer systems, with the exception of SRAM cache, are not capable of operating at the speed of the processor. For this reason, the system memory devices, as well as the input devices, output devices, and data storage devices, are not coupled directly to the processor bus. Instead, the system memory devices are generally coupled to the processor bus through a memory controller, bus bridge or similar device, and the input devices, output devices, and data storage devices are coupled to the processor bus through a bus bridge. The memory controller allows the system memory devices to operate at a clock frequency that is substantially lower than the clock frequency of the processor. Similarly, the bus bridge allows the input devices, output devices, and data storage devices to operate at a substantially lower frequency. Currently, for example, a processor having a 1 GHz clock frequency may be mounted on a motherboard having a 133 MHz clock frequency for controlling the system memory devices and other components.
Access to system memory is a frequent operation for the processor. The time required for the processor, operating, for example, at 1 GHz, to read data from or write data to a system memory device operating at, for example, 133 MHz, greatly slows the rate at which the processor is able to accomplish its operations. Thus, much effort has been devoted to increasing the operating speed of system memory devices.
System memory devices are generally dynamic random access memories (“DRAMs”). Initially, DRAMs were asynchronous and thus did not operate at even the clock speed of the motherboard. In fact, access to asynchronous DRAMs often required that wait states be generated to halt the processor until the DRAM had completed a memory transfer. However, the operating speed of asynchronous DRAMs was successfully increased through such innovations as burst and page mode DRAMs which did not require that an address be provided to the DRAM for each memory access. More recently, synchronous dynamic random access memories (“SDRAMs”) have been developed to allow the pipelined transfer of data at the clock speed of the motherboard. However, even SDRAMs are incapable of operating at the clock speed of currently available processors. Thus, SDRAMs cannot be connected directly to the processor bus, but instead must interface with the processor bus through a memory controller, bus bridge, or similar device. The disparity between the operating speed of the processor and the operating speed of SDRAMs continues to limit the speed at which processors may complete operations requiring access to system memory.
A solution to this operating speed disparity has been proposed in the form of a computer architecture known as a synchronous link architecture. In the synchronous link architecture, the system memory may be coupled to the processor either directly through the processor bus or through a memory controller. Rather than requiring that separate address and control signals be provided to the system memory, synchronous link memory devices receive command packets that include both control and address information. The synchronous link memory device then outputs or receives data on a data bus that may be coupled directly to the data bus portion of the processor bus.
An example of a computer system 10 using the synchronous link architecture is shown in FIG. 1. The computer system 10 includes a processor 12 having a processor bus 14 coupled through a memory controller 18 and system memory bus 23 to three packetized or synchronous link dynamic random access memory (“SLDRAM”) devices 16a-c. The computer system 10 also includes one or more input devices 20, such as a keypad or a mouse, coupled to the processor 12 through a bus bridge 22 and an expansion bus 24, such as an industry standard architecture (“ISA”) bus or a peripheral component interconnect (“PCI”) bus. The input devices 20 allow an operator or an electronic device to input data to the computer system 10. One or more output devices 30 are coupled to the processor 12 to display or otherwise output data generated by the processor 12. The output devices 30 are coupled to the processor 12 through the expansion bus 24, bus bridge 22 and processor bus 14. Examples of output devices 30 include printers and video display units. One or more data storage devices 38 are coupled to the processor 12 through the processor bus 14, bus bridge 22, and expansion bus 24 to store data in or retrieve data from storage media (not shown). Examples of storage devices 38 and storage media include fixed disk drives, floppy disk drives, tape cassettes and compact-disk read-only memory drives.
In operation, the processor 12 sends a data transfer command via the processor bus 14 to the memory controller 18, which, in turn, communicates with the memory devices 16a-c via the system memory bus 23 by sending the memory devices 16a-c command packets that contain both control and address information. Data is coupled between the memory controller 18 and the memory devices 16a-c through a data bus portion of the system memory bus 23. During a read operation, data is transferred from the SLDRAMs 16a-c over the memory bus 23 to the memory controller 18 which, in turn, transfers the data over the processor bus 14 to the processor 12. During a write operation, the processor 12 transfers write data over the processor bus 14 to the memory controller 18 which, in turn, transfers the write data over the system memory bus 23 to the SLDRAMs 16a-c. Although all the memory devices 16a-c are coupled to the same conductors of the system memory bus 23, only one memory device 16a-c at a time reads or writes data, thus avoiding bus contention on the memory bus 23. Bus contention is avoided because each of the memory devices 16a-c on the system memory bus 23 has a unique identifier, and the command packet contains an identifying code that selects only one of these devices.
The computer system 10 also includes a number of other components and signal lines that have been omitted from FIG. 1 in the interests of brevity. For example, as explained below, the memory devices 16a-c also receive a master clock signal to provide internal timing signals, a data clock signal clocking data into and out of the memory device 16, and a FLAG signal signifying the start of a command packet.
A typical command packet CA<0:39> for an SLDRAM is shown in FIG. 2 and is formed by 4 packet words CA<0:9>, each of which contains 10 bits of data. As will be explained in more detail below, each packet word CA<0:9> is applied on a command-address bus CA including 10 lines CA0-CA9. In FIG. 2, the four packet words CA<0:9> comprising a command packet CA<0:39> are designated PW1-PW4. The first packet word PW1 contains 7 bits of data identifying the packetized DRAM 16a-c that is the intended recipient of the command packet. As explained below, each of the packetized DRAMs is provided with a unique ID code that is compared to the 7 ID bits in the first packet word PW1. Thus, although all of the packetized DRAMs 16a-c will receive the command packet, only the packetized DRAM 16a-c having an ID code that matches the 7 ID bits of the first packet word PW1 will respond to the command packet.
The remaining 3 bits of the first packet word PW1 as well as 3 bits of the second packet word PW2 comprise a 6 bit command. Typical commands are read and write in a variety of modes, such as accesses to pages or banks of memory cells. The remaining 7 bits of the second packet word PW2 and portions of the third and fourth packet words PW3 and PW4 comprise a 20 bit address specifying a bank, row and column address for a memory transfer or the start of a multiple bit memory transfer. In one embodiment, the 20 bit address is divided into 3 bits of bank address, 10 bits of row address, and 7 bits of column address. Although the command packet shown in FIG. 2 is composed of 4 packet words PW1-PW4 each containing up to 10 bits, it will be understood that a command packet may contain a lesser or greater number of packet words, and each packet word may contain a lesser or greater number of bits.
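The field layout just described can be illustrated with a minimal Python sketch. It packs four 10-bit packet words into one 40-bit value and extracts the 7-bit ID, the 6-bit command, and the 20-bit address. The text fixes only the field widths and which packet words they fall in; the exact bit positions used here (PW1 as the most significant word, fields packed contiguously from the top, and the last 7 bits of PW4 left unused) are assumptions made for illustration, and all function names are hypothetical.

```python
PACKET_WORD_BITS = 10
WORDS_PER_PACKET = 4


def assemble_packet(words):
    """Concatenate packet words PW1..PW4 into a 40-bit integer (PW1 first)."""
    if len(words) != WORDS_PER_PACKET:
        raise ValueError("expected 4 packet words")
    packet = 0
    for word in words:
        if not 0 <= word < (1 << PACKET_WORD_BITS):
            raise ValueError("each packet word must fit in 10 bits")
        packet = (packet << PACKET_WORD_BITS) | word
    return packet


def decode_packet(packet):
    """Return (device_id, command, address) under the assumed bit layout."""
    device_id = (packet >> 33) & 0x7F   # assumed: top 7 bits of PW1
    command = (packet >> 27) & 0x3F     # assumed: 3 bits of PW1 + 3 bits of PW2
    address = (packet >> 7) & 0xFFFFF   # assumed: next 20 bits across PW2..PW4
    return device_id, command, address


def is_addressed_to(packet, my_id):
    """Model the ID comparison: only the device with a matching ID responds."""
    return decode_packet(packet)[0] == my_id
```

Under these assumptions, every device on the bus would run `is_addressed_to` on each incoming packet, and only the device whose ID register matches would pass the command on for decoding.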
The memory device 16a is shown in block diagram form in FIG. 3. Each of the memory devices 16a-c includes a clock generator circuit 40 that receives a command clock signal CCLK and generates a large number of other clock and timing signals to control the timing of various operations in the memory device 16a. The memory device 16a also includes a command buffer 46 and an address capture circuit 48 which receive an internal clock signal ICLK, a command packet CA<0:9> on a 10-bit command-address bus CA, and a terminal 52 receiving a FLAG signal. A memory controller (not shown) or other device normally transmits the command packet CA<0:9> to the memory device 16a in synchronism with the command clock signal CCLK. As explained above, the command packet CA<0:39>, which generally includes four 10-bit packet words PW1-PW4, contains control and address information for each memory transfer. The FLAG signal identifies the start of a command packet, and also signals the start of an initialization sequence. The command buffer 46 receives the command packet from the command-address bus CA, and compares at least a portion of the command packet to identifying data from an ID register 56 to determine if the command packet is directed to the memory device 16a or some other memory device 16b, c. If the command buffer 46 determines that the command is directed to the memory device 16a, it then provides the command to a command decoder and sequencer 60. The command decoder and sequencer 60 generates a large number of internal control signals to control the operation of the memory device 16a during a memory transfer.
The address capture circuit 48 also receives the command packet from the command-address bus CA and outputs a 20-bit address corresponding to the address information in the command packet. The address is provided to an address sequencer 64, which generates a corresponding 3-bit bank address on bus 66, a 10-bit row address on bus 68, and a 7-bit column address on bus 70. The row and column addresses are processed by row and column address paths, as will be described in more detail below.
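The division of the 20-bit address into bank, row, and column fields may be sketched as follows. The field widths (3, 10, and 7 bits) come from the description above; the ordering of the fields within the 20-bit value (bank in the most significant bits, column in the least significant) is an assumption, and the function name is hypothetical.

```python
def split_address(address):
    """Split a 20-bit address into (bank, row, column) fields.

    Assumed ordering: bank = bits 19-17, row = bits 16-7, column = bits 6-0.
    """
    if not 0 <= address < (1 << 20):
        raise ValueError("address must fit in 20 bits")
    bank = (address >> 17) & 0x7     # 3-bit bank address (bus 66)
    row = (address >> 7) & 0x3FF     # 10-bit row address (bus 68)
    column = address & 0x7F          # 7-bit column address (bus 70)
    return bank, row, column
```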
One of the problems of conventional DRAMs is their relatively low speed resulting from the time required to precharge and equilibrate circuitry in the DRAM array. The SLDRAM 16a shown in FIG. 3 largely avoids this problem by using a plurality of memory banks 80, in this case eight memory banks 80a-h. After a read from one bank 80a, the bank 80a can be precharged while the remaining banks 80b-h are being accessed. Each of the memory banks 80a-h receives a row address from a respective row latch/decoder/driver 82a-h. All of the row latch/decoder/drivers 82a-h receive the same row address from a predecoder 84 which, in turn, receives a row address from either a row address register 86 or a refresh counter 88 as determined by a multiplexer 90. However, only one of the row latch/decoder/drivers 82a-h is active at any one time as determined by bank control logic 94 as a function of a bank address from a bank address register 96.
The column address on bus 70 is applied to a column latch/decoder 100, which supplies I/O gating signals to an I/O gating circuit 102. The I/O gating circuit 102 interfaces with columns of the memory banks 80a-h through sense amplifiers 104. Data is coupled to or from the memory banks 80a-h through the sense amps 104 and I/O gating circuit 102 to a data path subsystem 108 which includes a read data path 110 and a write data path 112. The read data path 110 includes a read latch 120 that stores data from the I/O gating circuit 102.
In the memory device 16a shown in FIG. 3, 64 bits of data are stored in the read latch 120. The read latch then provides four 16-bit data words to an output multiplexer 122 that sequentially supplies each of the 16-bit data words to a read FIFO buffer 124. Successive 16-bit data words are clocked through the read FIFO buffer 124 in response to a clock signal RCLK generated by the clock generator 40. The FIFO buffer 124 sequentially applies the 16-bit data words to a driver circuit 128 which, in turn, applies the 16-bit data words to a data bus DQ forming part of the processor bus 14 (see FIG. 1). The FIFO buffer 124 also applies two data clock signals DCLK0 and DCLK1 to the driver circuit 128 which, in turn, applies the data clock signals DCLK0 and DCLK1 on respective data clock lines 132 and 133. The data clocks DCLK0 and DCLK1 enable a device, such as the memory controller 18, reading data on the data bus DQ to be synchronized with the data. Particular bits in the command portion of the command packet CA0-CA9 determine which of the two data clocks DCLK0 and DCLK1 is applied by the driver circuit 128. It should be noted that the data clocks DCLK0 and DCLK1 are differential clock signals, each including true and complementary signals, but for ease of explanation, only one signal for each clock is illustrated and described.
The write data path 112 includes a receiver buffer 140 coupled to the data bus 130. The receiver buffer 140 sequentially applies 16-bit data words from the data bus DQ to four input registers 142, each of which is selectively enabled by a signal from a clock generator circuit 144. The clock generator circuit 144 generates these enable signals responsive to the selected one of the data clock signals DCLK0 and DCLK1. The memory controller or processor determines which data clock DCLK0 or DCLK1 will be utilized during a write operation using the command portion of a command packet applied to the memory device 16a. As with the command clock signal CCLK and command packet, the memory controller or other device (not shown) normally transmits the data to the memory device 16a in synchronism with the selected one of the data clock signals DCLK0 and DCLK1. The clock generator 144 is programmed during initialization to adjust the timing of the clock signal applied to the input registers 142 relative to the selected one of the data clock signals DCLK0 and DCLK1 so that the input registers 142 can capture the write data at the proper times. In response to the selected data clock DCLK0 or DCLK1, the input registers 142 sequentially store four 16-bit data words and combine them into one 64-bit data word applied to a write FIFO buffer 148. The write FIFO buffer 148 is clocked by a signal from the clock generator 144 and an internal write clock WCLK to sequentially apply 64-bit write data to a write latch and driver 150. The write latch and driver 150 applies the 64-bit write data to one of the memory banks 80a-h through the I/O gating circuit 102 and the sense amplifiers 104.
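The width conversion performed in the read and write data paths can be illustrated with a short Python sketch: the read path breaks one 64-bit value into four 16-bit words for sequential transfer, and the write path collects four 16-bit words into one 64-bit value. The word ordering (most significant word transferred first) is an assumption, since the description does not state it, and the function names are hypothetical.

```python
WORD_BITS = 16
WORDS_PER_TRANSFER = 4
WORD_MASK = (1 << WORD_BITS) - 1


def split_read_latch(value64):
    """Read path: break a 64-bit latch value into four 16-bit words.

    Assumed order: most significant word is supplied first.
    """
    return [(value64 >> (WORD_BITS * i)) & WORD_MASK
            for i in reversed(range(WORDS_PER_TRANSFER))]


def combine_input_registers(words):
    """Write path: combine four sequentially stored 16-bit words into 64 bits."""
    if len(words) != WORDS_PER_TRANSFER:
        raise ValueError("expected four 16-bit words")
    value = 0
    for word in words:
        value = (value << WORD_BITS) | (word & WORD_MASK)
    return value
```

With the same word order on both paths, a value written and then read back passes through the two functions unchanged.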
As mentioned above, an important goal of the synchronous link architecture is to allow data transfer between a processor or memory controller and a memory device to occur at a significantly faster rate. However, as the rate of data transfer increases, it becomes more difficult to maintain synchronization of signals transmitted between the memory controller 18 and the memory device 16a. For example, as mentioned above, the command packet CA<0:39> is normally transmitted from the memory controller 18 to the memory device 16a in synchronism with the command clock signal CCLK, and the read and write data are normally transferred between the memory controller 18 and the memory device 16a in synchronism with the selected one of the data clock signals DCLK0 and DCLK1. However, because of unequal signal delays and other factors, the command packet CA<0:39> may not arrive at the memory device 16a in synchronism with the command clock signal CCLK, and write and read data may not arrive at the memory device 16a and memory controller 18, respectively, in synchronism with the selected one of the data clock signals DCLK0 and DCLK1. Moreover, even if these signals are actually coupled to the memory device 16a and memory controller 18 in synchronism with each other, they may lose synchronism once they are coupled to circuits within these respective devices. For example, internal signals require time to propagate to various circuitry in the memory device 16a, differences in the lengths of signal routes can cause differences in the times at which signals reach the circuitry, and differences in capacitive loading of signal lines can also cause differences in the times at which signals reach the circuitry. These differences in arrival times can become significant at high speeds of operation and eventually limit the operating speed of the memory device 16a and memory controller 18.
The problems associated with varying arrival times are exacerbated as timing tolerances become more restricted with higher data transfer rates. For example, if the internal clock ICLK derived from the command clock CCLK does not latch each of the packet words CA<0:9> comprising a command packet CA<0:39> at the proper time, errors in the operation of the memory device may result. Similarly, data errors may result during write operations if internal signals developed responsive to the data clocks DCLK0 and DCLK1 do not latch data applied on the data bus DQ at the proper time. During read operations, data errors may likewise result if internal signals in the memory controller 18 developed responsive to the data clock signals DCLK0 and DCLK1 from the memory device 16a do not latch read data applied on the data bus DQ at the proper time. Moreover, even if these respective clocks are initially synchronized, this synchronism may be lost over time during normal operation of the memory device 16a. Loss in synchronism may result from a variety of factors, including temperature variations in the environment in which the memory device 16a is operating, variations in the supply voltage applied to the memory device, and drift in operating parameters of components within the memory device.
One skilled in the art will understand that synchronization of the clock signals CCLK, DCLK0, and DCLK1 is being used to mean the adjusting of the timing of respective internal clock signals derived from these respective external clock signals so the internal clock signals can be used to latch corresponding digital signals at the proper times. For example, the command clock signal CCLK is synchronized when the timing of the internal clock signal ICLK relative to the command clock signal CCLK causes packet words CA<0:9> to be latched at the proper times.
To synchronize the command clock signal CCLK and the data clock signals DCLK0 and DCLK1 during write data operations, the memory controller 18 (FIG. 1) applies a test bit pattern and places the memory device 16a in a command and write data synchronization mode. During the synchronization mode, synchronization circuitry within the memory device 16a (not shown in FIG. 3) detects the applied bit pattern, places the device in the synchronization mode, and thereafter generates the necessary control signals to control components within the memory device to synchronize the clock signals CCLK, DCLK0, and DCLK1 from the controller 18. The data clock signals DCLK0 and DCLK1 must similarly be synchronized for read operations between the memory controller 18 and memory device 16a.
As mentioned above, an important goal of the synchronous link architecture is to allow data transfer between a processor and a memory device to occur at a significantly faster rate. It should be noted that the phrase “data transfer” as used herein includes all digital signals transferred to and from the memory device 16a, and thus includes signals on the CA and DQ busses as well as the FLAG signal. As the data transfer rate increases, it becomes more difficult to maintain the required timing between signals transmitted between the memory device 16a and the memory controller 18. For example, as mentioned above, the command packet CA<0:39> is normally transmitted to the memory device 16a in synchronization with the command clock signal CCLK, and the data is normally transmitted to the memory device 16a in synchronization with the selected one of the data clock signals DCLK0 and DCLK1. However, because of unequal signal delays and other factors, the command packet words CA<0:9> may not arrive at the memory device 16a in synchronization with the command clock signal CCLK, and the data packet words may not arrive at the memory device 16a in synchronization with the selected data clock signal DCLK0 or DCLK1. Moreover, even if these signals are actually coupled to the memory device 16a in synchronization with each other, this timing may be lost once they are coupled to circuits within the memory device. For example, internal signals require time to propagate to various circuitry in the memory device 16a, differences in the lengths of signal routes can cause differences in the times at which signals reach the circuitry, and differences in capacitive loading of signal lines can also cause differences in the times at which signals reach the circuitry. These differences in arrival times can become significant at high data transfer rates and eventually limit the operating speed of the packetized memory devices.
The problems associated with varying arrival times are exacerbated as timing tolerances become more restricted at higher data transfer rates. For example, if the internal clock ICLK derived from the command clock CCLK does not cause each of the packet words CA<0:9> comprising a command packet CA<0:39> to latch at the proper time, errors in the operation of the memory device may result. Thus, the timing or phase shift of the internal clock signal ICLK relative to the command clock signal CCLK must be adjusted such that the ICLK signal may be utilized to successfully latch each of the respective command signals CA<0>-CA<9> comprising a packet word CA<0:9>. This is true notwithstanding the varying arrival times of the respective command signals CA<0>-CA<9> within each packet word CA<0:9> relative to the ICLK signal.
Thus, for each of the clock signals CCLK, DCLK0, and DCLK1, the phase shift of respective internal clock signals derived from these respective external clock signals must be adjusted so the internal clock signals can be used to latch corresponding packet words at optimum times. For example, the phase shift of the internal clock signal ICLK relative to the command clock signal CCLK must be adjusted so that all command signals CA<0>-CA<9> in each packet word CA<0:9> are latched at the optimum time.
As the data transfer rate increases, the duration for which each signal CA<0>-CA<9> in a packet word CA<0:9> is valid decreases by a corresponding amount, as will be understood by one skilled in the art. More specifically, the data window or “eye” DE for each of the DQ<0>-DQ<15> signals decreases at higher data transfer rates. As understood by one skilled in the art, the data eye DE for each of the DQ<0>-DQ<15> signals defines the actual duration that each signal is valid after timing skew of the signal is considered. The timing skew of the DQ<0>-DQ<15> signals arises from a variety of timing errors such as loading on the lines of the DQ bus and the physical lengths of such lines. FIG. 4 is a timing diagram illustrating the data eyes DE for a number of the DQ<0>-DQ<15> signals. The solid lines indicate the ideal DQ<0>, DQ<1>, and DQ<15> signals, and the dashed lines indicate the worst case potential time skew for each of these signals. The data eyes DE of the DQ<0>, DQ<1>, and DQ<15> signals are defined by time intervals t0-t3, t1-t4, and t5-t7, respectively.
As data eyes DE of the applied signals DQ<0>-DQ<15> decrease at high data transfer rates, it is possible that one or more of these signals in each data packet word DQ<0:15> will have arrival times such that not all signals in a packet word are simultaneously valid at the memory device 16a, and thus cannot be successfully captured by the internal clock signal ICLK. For example, in FIG. 4, the data eye DE of the DQ<0> signal from times t0-t3 does not overlap the data eye of the DQ<15> signal from times t5-t7. In this situation, the signals DQ<0> and DQ<15> are not both valid at the memory device 16a at the same time so the packet word DQ<0:15> cannot be successfully captured responsive to the RCLK signal. The transition of the RCLK signal at time t2 could successfully capture the DQ<0> and DQ<1> signals, but not the DQ<15> signal, and, conversely, the transition of the RCLK signal at time t6 could successfully capture the DQ<15> signal but not the DQ<0> and DQ<1> signals, both of which have already gone invalid at time t6.
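The overlap condition described above amounts to intersecting time intervals: a single clock edge can capture a group of signals only if their data eyes share a common window. A minimal Python sketch follows. The times t0 through t7 are represented by the placeholder values 0 through 7, because only their ordering is given; the function names are hypothetical.

```python
def common_window(eyes):
    """Return the (start, end) interval in which every data eye is valid,
    or None if the eyes do not all overlap."""
    start = max(s for s, _ in eyes)   # latest opening edge
    end = min(e for _, e in eyes)     # earliest closing edge
    return (start, end) if start < end else None


def edge_captures_all(edge_time, eyes):
    """True if a clock edge at edge_time falls inside every data eye."""
    return all(s < edge_time < e for s, e in eyes)


# Placeholder values standing in for t0..t7 (ordering only, arbitrary units).
t = list(range(8))
eye_dq0 = (t[0], t[3])
eye_dq1 = (t[1], t[4])
eye_dq15 = (t[5], t[7])
```

With these placeholder values, `common_window([eye_dq0, eye_dq1])` yields the interval from t1 to t3, which contains t2, while `common_window([eye_dq0, eye_dq1, eye_dq15])` yields `None`, matching the situation in which no single clock transition captures all three signals.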
There is a need for synchronizing respective data clock signals and corresponding data packet signals during the transfer of read data between packetized memory devices and a memory controller. Although the foregoing discussion is directed to synchronizing clock signals in packetized memory devices like SLDRAMs, similar problems exist in other types of integrated circuits as well, including other types of memory devices.
According to one aspect of the present invention, a method adaptively adjusts respective timing offsets of a plurality of digital signals relative to a clock signal being output along with the digital signals to enable a circuit receiving the digital signals to successfully capture each of the digital signals responsive to the clock signal. The method includes storing in a respective storage circuit associated with each digital signal a corresponding phase command. The phase command defines a particular timing offset between the corresponding digital signal and the clock signal. The clock signal is output along with each digital signal having the timing offset defined by the corresponding phase command. The digital signals are captured responsive to the clock signal and evaluated to determine if each digital signal was successfully captured. A phase adjustment command is generated to adjust the value of each phase command. The operations of outputting the clock signal through generating a phase adjustment command are repeated for a plurality of phase adjustment commands for each digital signal. A phase command that causes the digital signal to be successfully captured is then selected, and the selected phase command is stored in the storage circuit associated with the digital signal.
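The method summarized above may be sketched in Python as a sweep over candidate phase commands for each digital signal. This is an illustrative model only. The callback `capture_ok(signal, phase)` is a hypothetical stand-in for outputting the signal with a given phase command, capturing it, and evaluating the result. The signals are swept one at a time for simplicity, and the choice of the middle of the longest run of passing phase commands is an assumption; the summary states only that a phase command resulting in successful capture is selected.

```python
def longest_passing_run(results):
    """Return (start, length) of the longest run of True values in results,
    or (None, 0) if no entry is True."""
    best_start, best_len = None, 0
    run_start = None
    for i, ok in enumerate(results):
        if ok:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    return best_start, best_len


def calibrate(num_signals, num_phases, capture_ok):
    """Sweep phase commands for each signal and store a passing one."""
    phase_registers = [0] * num_signals          # one storage circuit per signal
    for signal in range(num_signals):
        results = []
        for phase in range(num_phases):          # repeated phase adjustment
            phase_registers[signal] = phase      # store the trial phase command
            results.append(capture_ok(signal, phase))
        start, length = longest_passing_run(results)
        if start is None:
            raise RuntimeError(f"no passing phase command for signal {signal}")
        # Assumed policy: pick the middle of the passing window for margin.
        phase_registers[signal] = start + (length - 1) // 2
    return phase_registers
```

As a usage example, if a simulated first signal is captured successfully for phase commands 2 through 6 and a second for 5 through 7, out of 8 candidates, the sketch stores phase commands 4 and 6, respectively.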
According to another aspect of the present invention, a read synchronization circuit adaptively adjusts respective timing offsets of a plurality of digital signals applied on respective signal terminals and an external data clock signal to enable an external device to latch the digital signals responsive to the external data clock signal. The read synchronization circuit includes a plurality of latch circuits, each latch circuit including an input, an output coupled to a respective signal terminal, and a clock terminal. Each latch circuit stores a signal applied on the input and provides the stored signal on the signal terminal responsive to a clock signal applied on the clock terminal. A plurality of phase command registers store phase commands, with each register being associated with at least one of the latch circuits.
A clock generation circuit is coupled to the latch circuits and the phase command registers and generates a plurality of internal clock signals and the external data clock signal responsive to a read clock signal. Each internal clock signal and the external data clock signal has a respective phase shift relative to the read clock signal. The clock generation circuit selects one of the internal clock signals for each latch circuit in response to the associated phase command and applies the selected internal clock signal to the clock terminal of the latch circuit to place digital signals on the corresponding signal terminal with a timing offset determined by the phase shift of the selected internal clock signal.
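The selection of an internal clock signal by a phase command can be modeled with a small Python sketch. The number of internal clock signals, their even spacing across one read clock period, and the numeric values in the usage note are all assumptions for illustration; the summary does not specify them. Function names are hypothetical.

```python
def internal_clock_delays(period_ns, num_clocks):
    """Delays (relative to the read clock) of num_clocks internal clocks,
    assumed to be evenly spaced over one read clock period."""
    return [period_ns * k / num_clocks for k in range(num_clocks)]


def timing_offsets(phase_commands, delays, data_clock_delay_ns):
    """Timing offset of each latch output relative to the external data clock.

    Each phase command is used as an index selecting one internal clock;
    the offset is that clock's delay minus the data clock's delay.
    """
    return [delays[command] - data_clock_delay_ns for command in phase_commands]
```

For example, with an assumed 8 ns read clock period divided into 16 internal clocks and an external data clock delayed by 2 ns, phase commands 4 and 6 would place their signals 0 ns and 1 ns after the data clock edge, respectively.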
A control circuit is coupled to the clock generation circuit and the phase command registers and operates in response to a synchronization command to apply synchronization digital signals on the inputs of the latch circuits and to adjust the respective timing offsets between the external data clock signal and the synchronization digital signals output by each latch circuit by adjusting the respective values of the phase commands. The circuit stores final phase commands in each phase command register that allow the synchronization digital signals to be successfully captured responsive to the external data clock signal. The read synchronization circuit may be utilized in a variety of different types of integrated circuits, including packetized memory devices such as SLDRAMs, nonpacketized devices such as double-data-rate synchronous dynamic random access memories (DDR SDRAMs), and alternative memory architectures having alternative clocking topologies.