Computer systems use memory devices, such as dynamic random access memory (“DRAM”) devices, to store data that are accessed by a processor. These memory devices are normally used as system memory in a computer system. In a typical computer system, the processor communicates with the system memory through a processor bus and a memory controller. The memory devices of the system memory, typically arranged in memory modules having multiple memory devices, are coupled through a memory bus to the memory controller. The processor issues a memory request, which includes a memory command, such as a read command, and an address designating the location from which data or instructions are to be read. The memory controller uses the command and address to generate appropriate command signals as well as row and column addresses, which are applied to the system memory through the memory bus. In response to the commands and addresses, data are transferred between the system memory and the processor. The memory controller is often part of a system controller, which also includes bus bridge circuitry for coupling the processor bus to an expansion bus, such as a PCI bus.
In memory systems, high data bandwidth is desirable. Generally, bandwidth limitations are not related to the memory controllers since the memory controllers sequence data to and from the system memory as fast as the memory devices allow. One approach that has been taken to increase bandwidth is to increase the speed of the memory data bus coupling the memory controller to the memory devices. Thus, the same amount of information can be moved over the memory data bus in less time. However, despite increasing memory data bus speeds, a corresponding increase in bandwidth does not result. One reason for the non-linear relationship between data bus speed and bandwidth is the hardware limitations within the memory devices themselves. That is, the memory controller has to schedule all memory commands to the memory devices such that the hardware limitations are honored. Although these hardware limitations can be reduced to some degree through the design of the memory device, a compromise must be made because reducing the hardware limitations typically adds cost, power, and/or size to the memory devices, all of which are undesirable. Thus, given these constraints, although it is easy for memory devices to move “well-behaved” traffic at ever increasing rates, for example, sequential traffic to the same page of a memory device, it is much more difficult for the memory devices to resolve “badly-behaved” traffic, such as bouncing between different pages or banks of the memory device. As a result, the increase in memory data bus bandwidth does not always yield a corresponding increase in information bandwidth.
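The non-linear relationship between data bus speed and bandwidth described above can be illustrated with a minimal sketch. The two-state model below is an assumption made for illustration only: the function name, the parameters, and all numeric values are hypothetical and are not taken from any particular memory device. It treats each burst as costing a bus transfer time that shrinks as the bus speeds up, plus an average device-side penalty that does not.

```python
def effective_bandwidth(bus_mb_per_s, burst_bytes, page_miss_rate, miss_penalty_ns):
    """Sustained bandwidth in MB/s under a simple, hypothetical two-state model.

    bus_mb_per_s    -- peak data bus rate (1 MB taken as 10**6 bytes)
    burst_bytes     -- bytes moved per memory request
    page_miss_rate  -- fraction of requests that leave the open page (0.0 to 1.0)
    miss_penalty_ns -- device-side delay per page miss; independent of bus speed
    """
    # Time to move one burst across the bus at its peak rate, in nanoseconds.
    transfer_ns = burst_bytes / bus_mb_per_s * 1000.0
    # Average device-side overhead per burst; this term does not scale with the bus.
    overhead_ns = page_miss_rate * miss_penalty_ns
    # Bytes per nanosecond converted back to MB/s.
    return burst_bytes / (transfer_ns + overhead_ns) * 1000.0


# "Well-behaved" traffic (no page misses) tracks the bus speed.
well_behaved = effective_bandwidth(1600, 64, 0.0, 40)

# "Badly-behaved" traffic: half of the requests miss the open page.
slow_bus = effective_bandwidth(1600, 64, 0.5, 40)
fast_bus = effective_bandwidth(3200, 64, 0.5, 40)
print(well_behaved, slow_bus, fast_bus)
```

In this sketch, doubling the bus speed yields less than a doubling of sustained bandwidth whenever the page miss rate is non-zero, because the device-side penalty is unchanged.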
In addition to the limited bandwidth between processors and memory devices, the performance of computer systems is also limited by latency problems that increase the time required to read data from system memory devices. More specifically, when a memory device read command is coupled to a system memory device, such as a synchronous DRAM (“SDRAM”) device, the read data are output from the SDRAM device only after a delay of several clock periods. Therefore, although SDRAM devices can synchronously output burst data at a high data rate, the delay in initially providing the data can significantly slow the operating speed of a computer system using such SDRAM devices. Increasing the memory data bus speed can be used to help alleviate the latency issue. However, as with bandwidth, an increase in memory data bus speed does not yield a linear reduction of latency, for essentially the same reasons previously discussed.
Although increasing memory data bus speed has, to some degree, been successful in increasing bandwidth and reducing latency, other issues are raised by this approach. For example, as the speed of the memory data bus increases, loading on the memory bus needs to be decreased in order to maintain signal integrity since traditionally, there has only been wire between the memory controller and the memory slots into which the memory modules are plugged. Several approaches have been taken to address the memory bus loading issue. These include reducing the number of memory slots to limit the number of memory modules that contribute to the loading of the memory bus, adding buffer circuits on a memory module in order to provide sufficient fanout of control signals to the memory devices on the memory module, and providing multiple memory device interfaces on the memory module since there are too few memory module connectors on a single memory device interface. The effectiveness of these conventional approaches is, however, limited. A reason why these techniques were used in the past is that it was cost-effective to do so. However, when only one memory module can be plugged in per interface, it becomes too costly to add a separate memory interface for each memory slot. In other words, it pushes the system controller's package out of the commodity range and into the boutique range, thereby greatly adding cost.
One recent approach that allows for increased memory data bus speed in a cost effective manner is the use of multiple memory devices coupled to the processor through a memory hub. A computer system 100 shown in FIG. 1 uses a memory hub architecture. The computer system 100 includes a processor 104 for performing various computing functions, such as executing specific software to perform specific calculations or tasks. The processor 104 includes a processor bus 106 that normally includes an address bus, a control bus, and a data bus. The processor bus 106 is typically coupled to cache memory 108, which is typically static random access memory (“SRAM”). Finally, the processor bus 106 is coupled to a system controller 110, which is also sometimes referred to as a bus bridge. The system controller 110 serves as a communications path to the processor 104 for a variety of other components. For example, as shown in FIG. 1, the system controller 110 includes a graphics port that is typically coupled to a graphics controller 112, which is, in turn, coupled to a video terminal 114. The system controller 110 is also coupled to one or more input devices 118, such as a keyboard or a mouse, to allow an operator to interface with the computer system 100. Typically, the computer system 100 also includes one or more output devices 120, such as a printer, coupled to the processor 104 through the system controller 110. One or more data storage devices 124 are also typically coupled to the processor 104 through the system controller 110 to allow the processor 104 to store data or retrieve data from internal or external storage media (not shown). Examples of typical storage devices 124 include hard and floppy disks, tape cassettes, and compact disk read-only memories (CD-ROMs).
The system controller 110 includes a memory hub controller 128 that is coupled to the processor 104. The system controller 110 is further coupled over a high speed bi-directional or unidirectional system controller/hub interface 134 to several memory modules 130a-n. As shown in FIG. 1, the controller/hub interface 134 includes a downstream bus 154 and an upstream bus 156, which are used to couple data, address, and/or control signals away from or toward, respectively, the memory hub controller 128. Typically, the memory modules 130a-n are coupled in a point-to-point or daisy chain architecture such that the memory modules 130a-n are connected one to another in series. Thus, the system controller 110 is coupled to a first memory module 130a, with the first memory module 130a connected to a second memory module 130b, and the second memory module 130b coupled to a third memory module 130c, and so on in a daisy chain fashion. Each memory module 130a-n includes a memory hub 140 that is coupled to the system controller/hub interface 134, and is further coupled to a number of memory devices 148 through command, address and data buses, collectively shown as local memory bus 150. The memory hub 140 efficiently routes memory requests and responses between the memory hub controller 128 and the memory devices 148.
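The daisy chain arrangement described above can be sketched as follows. This is an illustrative model only: the function name and the use of integer positions (0 for the first memory module 130a, 1 for the second memory module 130b, and so on) are assumptions introduced here, and the sketch does not represent the actual logic of the memory hub 140. It shows only which hubs a request would pass through on the way downstream and which hubs the response would pass through on the way back upstream.

```python
def route_request(num_modules, target):
    """Return (downstream_path, upstream_path) as lists of hub positions.

    Position 0 is the module nearest the memory hub controller; higher
    positions are farther along the chain. Both names are hypothetical.
    """
    if not 0 <= target < num_modules:
        raise ValueError("target module is not present in the chain")
    # A request travels away from the controller through every hub up to
    # and including the target module's hub.
    downstream_path = list(range(target + 1))
    # The response retraces the same hubs in reverse, toward the controller.
    upstream_path = downstream_path[::-1]
    return downstream_path, upstream_path


# A read directed at the third module in a four-module chain.
down, up = route_request(4, 2)
print(down, up)  # [0, 1, 2] [2, 1, 0]
```

Because each link in this model connects only two neighbors, adding a module lengthens the path to the farthest module but does not add loading to the existing links.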
The memory devices 148 on the memory modules 130a-n are typically capable of operating at high clock frequencies in order to facilitate the relatively high speed operation of the overall memory system. Consequently, computer systems employing this architecture can also use the high-speed system controller/hub interface 134 to complement the high clock speeds of the memory devices 148. Additionally, with a memory hub based system, signal integrity can be maintained on the system controller/hub interface 134 since the signals are typically transmitted through multiple memory hubs 140 to and from the memory hub controller 128. Moreover, this architecture also provides for easy expansion of the system memory without concern for degradation in signal quality as more memory modules are added, such as occurs in conventional memory bus architectures.
Although the memory hub architecture shown in FIG. 1 provides improved memory system performance, the design of the hub memory system, and more generally, computer systems including such a memory hub architecture, becomes increasingly difficult. For example, in many hub based memory systems, the processor is coupled through a memory hub controller to each of several memory hubs via a high speed bus or link over which signals, such as command, address, or data signals, are transferred at a very high rate. The memory hubs are, in turn, coupled to several memory devices via buses that must also operate at a very high speed. However, as transfer rates increase, the time for which a signal represents valid information decreases. As commonly referenced by those ordinarily skilled in the art, the window or “eye” for when the signals are valid decreases at higher transfer rates. With specific reference to data signals, the “data eye” decreases. As understood by one skilled in the art, the data eye for each of the data signals defines the actual duration that each signal is valid after various factors affecting the signal are considered, such as timing skew, voltage and current drive capability, and the like. Timing skew of signals often arises from a variety of sources, such as loading on the lines of the bus, the physical lengths of such lines, and drifting operating conditions.
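The shrinking of the data eye at higher transfer rates can be illustrated with simple arithmetic. The sketch below assumes, for illustration only, that the eye is the bit period less a fixed timing skew and a fixed allowance for other uncertainties; the function name, the linear subtraction, and all numeric values are hypothetical.

```python
def data_eye_ps(transfer_rate_mtps, skew_ps, other_uncertainty_ps):
    """Approximate data eye width in picoseconds (hypothetical linear model).

    transfer_rate_mtps   -- transfers per second per signal line, in millions
    skew_ps              -- timing skew budget (loading, line length, drift)
    other_uncertainty_ps -- allowance for drive capability and similar factors
    """
    # One bit period (unit interval) in picoseconds.
    unit_interval_ps = 1e6 / transfer_rate_mtps
    # The eye is what remains once the fixed uncertainties are subtracted.
    return max(0.0, unit_interval_ps - skew_ps - other_uncertainty_ps)


# The same 500 ps of uncertainty consumes 20% of the bit period at
# 400 MT/s but 80% of it at 1600 MT/s.
print(data_eye_ps(400, 300, 200))   # 2000.0
print(data_eye_ps(1600, 300, 200))  # 125.0
```

Under these assumed values, a fourfold increase in transfer rate reduces the eye by a factor of sixteen, since the uncertainties do not shrink with the bit period.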
One approach to alleviating timing problems in memory devices is to use a delay-locked loop (DLL) to lock or align the receipt of read data from a memory device and a capture strobe signal used to latch the read data in a memory hub. More specifically, a read strobe signal is output by the memory devices along with read data signals. Although the timing relationship between the read strobe signal and the read data is generally fixed, the timing of when the read strobe signal and the read data are provided by the memory device to a memory hub may slowly drift in relation to a core clock domain used to synchronize operation of the memory hub and the memory device. The timing may slowly drift due to variations in the operating conditions, such as increasing operating temperature or voltage variations. In such case, the read strobe signal and read data may not be present in the memory hub at the proper time. To alleviate this problem, a DLL included in the memory device is used to maintain synchronization of the operation of the memory device and the memory controller. This is accomplished by the memory device by aligning its output strobe to an input clock signal that is sourced from the memory controller or provided by a common clock signal sourced to the memory controller and memory device. That is, as the timing between the memory device and memory hub begins to drift, the DLL can adjust the timing of internal clock signals of the memory device relative to the core clock signal, thereby “re-synchronizing” operation of the memory device and the memory hub. The DLL is thus effective in preventing substantial drifting of the read data strobe and the read data in relation to the core clock domain. As transfer rates increase, however, the timing specifications for the DLL become more stringent and therefore increasingly difficult to meet. DLL circuitry sufficient to accommodate such timing needs often consumes substantial power as well.
Furthermore, the amount of circuitry required to implement a suitable DLL can materially reduce the amount of space that could otherwise be used for memory device circuitry, thereby either increasing the cost or reducing the storage capacity of such memory devices.
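The re-synchronizing behavior of a DLL described above can be sketched as a simple feedback loop. The model below is an assumption made for illustration and is not the circuit of any particular memory device: the function name, the fixed-size delay tap, the constant drift per cycle, and the half-tap decision threshold are all hypothetical.

```python
def track_drift(drift_per_cycle_ps, cycles, tap_ps=10.0):
    """Follow a slow timing drift by switching fixed-size delay taps.

    Returns (compensation_ps, worst_residual_ps): the total delay adjustment
    applied and the largest strobe-to-clock error left after each update.
    """
    compensation_ps = 0.0
    accumulated_drift_ps = 0.0
    worst_residual_ps = 0.0
    for _ in range(cycles):
        # Operating conditions (temperature, voltage) move the strobe timing.
        accumulated_drift_ps += drift_per_cycle_ps
        # Phase comparison: how far the strobe sits from the reference clock.
        error_ps = accumulated_drift_ps - compensation_ps
        # Adjust by one tap once the error exceeds half a tap.
        if error_ps > tap_ps / 2:
            compensation_ps += tap_ps
        elif error_ps < -tap_ps / 2:
            compensation_ps -= tap_ps
        residual_ps = abs(accumulated_drift_ps - compensation_ps)
        worst_residual_ps = max(worst_residual_ps, residual_ps)
    return compensation_ps, worst_residual_ps


# 100 ps of uncorrected drift is held to within half of a 10 ps tap.
print(track_drift(1.0, 100))  # (100.0, 5.0)
```

In this sketch the residual error is bounded by the tap size, which suggests why tighter timing specifications call for finer delay steps and, in turn, more circuitry and power.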
There is accordingly a need for a system and method that avoids the need to precisely control the timing relationships between a memory hub clock domain and the receipt of read data signals at the memory hub in a manner that avoids the need for extensive DLL or DL circuitry.