1. Field of the Invention
The present invention generally relates to a computer system comprising a plurality of memory modules. More particularly, the invention relates to the elimination of etch-related skew resulting from clock signal fanout across multiple modules.
2. Background of the Invention
It often is desirable to include multiple processors in a single computer system. This is especially true for computationally intensive applications and applications that otherwise can benefit from having more than one processor simultaneously performing various tasks. It is not uncommon for a multi-processor system to have 2 or 4 or more processors working in concert with one another. Typically, each processor couples to at least one and perhaps three or four other processors. To further improve performance in multi-processor systems, system designers may implement a distributed memory system. In such a system, each processor is coupled to one or more memory devices, with every processor in the system capable of accessing data from any of the memory locations.
Many modern multi-processor systems rely on a core logic chipset to direct data traffic between processors, memory, and the outside world. A conventional core logic chipset includes, among other things, a memory controller and I/O interface circuitry. Older chipsets also controlled cache memory, but newer designs delegate this role to the processors to which the cache memories are connected. Modern core logic chipsets include a number of devices, each capable of transmitting data to and from processors or memory devices. For example, the Compaq 21264 Alpha processor has employed a core logic chipset that includes ASIC chips capable of fetching and transmitting 256-bit data bundles to and from SDRAM memory arrays. High-performance Alpha systems support up to 32 GB or more of main memory.
Physically implementing a large memory requires many memory boards and a large amount of module space. To conserve space, systems with large memories are usually built from multiple memory boards that connect to a main system board. This takes advantage of all three dimensions of the design space and yields a more compact system. In addition to occupying a large physical space, large memories also present a large fanout and a heavy load to the clock system. Fanout refers to the distribution of a clock signal, which often originates from a common clock source, to every CPU, ASIC, and memory device in the chipset. As more memory devices, namely memory boards, are added to the system, both the load on the clock source and the fanout increase.
Another disadvantage of adding memory boards to a computer system is that clock skew becomes more difficult to manage. Skew refers to the phase and timing misalignment of the clock signal as it is received at the numerous destination devices. Ideally, the clock transitions at the various devices occur at the same time, or within a specified window of time, to ensure synchronous, efficient operation of the system. One of the major contributors to skew is interconnect propagation delay. Skew between the clock signals arriving at two devices increases as the difference between their distances from the clock source increases. Thus, if a memory device is physically located farther from a clock source than a CPU, the clock signal will reach the CPU before reaching the memory device, and skew will result. If all the devices are located on the same layer of a printed wiring board (PWB), skew may be corrected by ensuring that clock etch runs are equal in length. However, as discussed above, modern systems are configured with multiple memory boards, and these memory boards are typically configured to accept several memory modules themselves. In such a system, the clock signals must travel across multiple PWBs (e.g., system board, memory board, memory module) before reaching the destination device.
FIG. 1, which shows a conventional multi-processor system with multiple memory boards 160 and Dual Inline Memory Modules (DIMMs) 170, graphically depicts this clock fanout problem. The system shown in FIG. 1 includes a system board 100, on which the CPUs 110 and core logic chips 120 are assembled. Also included on the system board is a frequency synthesizer 130 or other clock source. From this clock source, the clock signals must be fanned out to the various devices. Fanout devices 140, such as clock buffers or PLL clock drivers, are used to reproduce and distribute the incoming clock source to the various destination devices. It should be noted that FIG. 1 represents clock signals only and does not include data, command, or address paths between devices.
As discussed above, skew tends to be more problematic when clock signals are routed across multiple PWBs. Not only is there skew between the devices on the system board 100 and the individual memory devices 150, but there is also skew between memory devices 150 on different DIMMs 170. Even if clock trace lengths to all the memory devices 150 in the system can be matched, there is a non-negligible variation in the propagation constants of the different PWBs in the signal paths. The propagation constant of a given board expresses the clock delay per unit length of clock etch on that board, so the delay along a trace is the propagation constant multiplied by the trace length. This propagation constant may vary by as much as ±10% from board to board. Thus, even if identical clock traces are etched onto each of the multiple memory boards 160, the delays through those traces may differ by up to 20 percent between the boards 160. The same is true for the DIMMs 170, which are industry-standard devices manufactured to a common specification.
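To make the board-to-board effect concrete, the following minimal sketch models etch delay as propagation constant times etch length. The nominal constant of 200 ps per inch is an assumption (the text gives no nominal value), chosen so that a ±10% spread spans the roughly 40 ps per inch board-to-board difference cited for this variation; the function names are hypothetical.

```python
# Assumed nominal propagation constant in ps per inch of clock etch. The text
# does not give one; 200 ps/in is a hypothetical value chosen so that a
# +/-10% spread spans the ~40 ps/in board-to-board difference it cites.
NOMINAL_PS_PER_INCH = 200.0


def etch_delay_ps(length_in: float, ps_per_inch: float) -> float:
    """Clock delay along an etch: propagation constant times etch length."""
    return ps_per_inch * length_in


def board_to_board_skew_ps(length_in: float, tolerance: float = 0.10,
                           nominal_ps_per_inch: float = NOMINAL_PS_PER_INCH) -> float:
    """Worst-case skew between identical etches on two boards whose
    propagation constants sit at opposite ends of the +/- tolerance band."""
    fast = etch_delay_ps(length_in, nominal_ps_per_inch * (1.0 - tolerance))
    slow = etch_delay_ps(length_in, nominal_ps_per_inch * (1.0 + tolerance))
    return slow - fast


if __name__ == "__main__":
    length = 10.0  # inches; hypothetical trace etched identically on two boards
    skew = board_to_board_skew_ps(length)
    nominal = etch_delay_ps(length, NOMINAL_PS_PER_INCH)
    print(f"skew: {skew:.0f} ps, i.e. {100.0 * skew / nominal:.0f}% of the nominal delay")
```

Because the skew scales with etch length, matching trace lengths alone cannot remove it: under this model the skew is 20% of the nominal delay for any length, and 30 inches of etch gives 1.2 ns.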
In terms of actual numbers, the ±10% variation in propagation constant results in a possible difference of roughly 40 picoseconds per inch of clock etch between printed wiring boards. If two clock signals each travel 30 inches from source to destination and are routed such that they have no routing layer in common, an interconnect skew of up to 1.2 nanoseconds (30 in × 40 ps/in) develops between memory devices 150 on different DIMMs 170. This interconnect skew adds to the total skew from all contributors, part of which comes from the electrical components used to generate the clock. Given that processor clock speeds are increasing well beyond 100 and 200 MHz (i.e., 10 ns and 5 ns clock periods), this skew represents a large percentage of the clock period during which commands are executed: 12% of a 10 ns period and 24% of a 5 ns period. The problem naturally gets worse as clock frequencies increase. In general, it is desirable to limit the total of all skew contributors to less than 20% of the clock period to improve system performance.
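The skew-budget arithmetic in this paragraph can be checked with a few lines of Python. The 40 ps/in, 30-inch, 100/200 MHz and 20% figures come from the text; the function names are illustrative only.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def interconnect_skew_ns(delta_ps_per_inch: float, length_in: float) -> float:
    """Skew between two equal-length routes on boards whose propagation
    constants differ by delta_ps_per_inch and share no routing layer."""
    return delta_ps_per_inch * length_in / 1000.0


def skew_fraction(skew_ns: float, freq_mhz: float) -> float:
    """Skew expressed as a fraction of the clock period."""
    return skew_ns / clock_period_ns(freq_mhz)


# Desired limit on the total of all skew contributors, as a fraction of the period.
SKEW_BUDGET = 0.20

if __name__ == "__main__":
    skew = interconnect_skew_ns(40.0, 30.0)  # figures from the text: 1.2 ns
    for freq in (100.0, 200.0):
        frac = skew_fraction(skew, freq)
        status = "within" if frac < SKEW_BUDGET else "exceeds"
        print(f"{freq:.0f} MHz: period {clock_period_ns(freq):.0f} ns, "
              f"interconnect skew {100 * frac:.0f}% of period "
              f"({status} the 20% budget by itself)")
```

At 100 MHz the 1.2 ns of interconnect skew uses 12% of the period; at 200 MHz it uses 24%, already above the 20% budget before any skew from the clock-generation components is counted.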
An additional problem arises when different clock voltages are required at the various destination devices. For example, conventional DIMMs 170 use TTL voltage inputs for their source clock, while certain logic devices 120 or processors 110 use PECL voltage inputs. TTL signals typically swing between nominal levels of 0 and 3.3 volts, whereas PECL signals swing between 1.5 and 2.5 volts. In each case, the lower voltage represents a binary zero and the higher voltage a binary one. To use devices with different input voltage requirements together, translators convert one signal type to the other; the translator may be a PLL clock driver that both distributes the clock signal and translates its voltage levels. In general, a TTL clock yields larger skews than a PECL clock because of the large switching region of TTL logic. While the rest of the chipset 300 can benefit from low-skew PECL clocks, the clocks to the memory devices 150 must be translated from PECL to TTL voltage levels, and inserting a translator in the clock signal paths adds further delay to the clock system. An improved clock distribution system would preferably allow system designers to deliver PECL clock signals to the memory DIMMs, reducing signal-induced skew and eliminating the skew generated by the translator normally required to convert the clock signal to TTL voltage levels.
It is therefore desirable to develop a clock distribution scheme that eliminates the skew resulting both from differences in clock trace lengths and from differences in PWB signal propagation constants. The clock distribution system also preferably permits PECL clock signals to be delivered to the DIMMs. Such a scheme may advantageously allow reliable data transfer between devices while minimizing latency and skew and maximizing bandwidth. It may also indirectly improve the manufacturability of printed wiring boards and memory hardware by easing the requirement for equal-length clock paths.
The problems noted above are solved in large part by a clock distribution scheme for use in a system comprising a plurality of memory devices. The scheme may be implemented in a computer system comprising a system board on which a processor, at least one memory logic controller, and a clock source are installed. The system also includes a memory module, or DIMM, on which at least one memory device and one PLL clock driver are installed. The system board is configured to accept one or more DIMMs. The clock signal generated by the clock source on the system board is distributed to the various devices on the system board by a clock buffer tree, preferably over clock etch runs of equal length. The same clock signal is also propagated, over an etch of different length, to the memory device on the DIMM. The clock skew generated by these different etch lengths is removed by routing the feedback loop of the PLL clock driver from the DIMM to the system board and back to the clock driver on the DIMM. Because the PLL clock driver aligns the clock edge returning on its feedback input with the edge arriving at its reference input, its output is advanced by the delay of the feedback loop. The total length of etch for the feedback loop is substantially equal to the difference between the length of clock etch leading to the devices on the system board and the length of etch leading to the memory device on the DIMM. The portion of the feedback loop on the DIMM is substantially equal to the length of clock signal etch on the DIMM leading to the memory device.
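The compensation described above can be checked with a simple timing model. This is a sketch under stated assumptions, not the patented circuit: the PLL clock driver is treated as an ideal zero-delay buffer, each board has a single propagation constant, connector delay is ignored, the "clock signal etch on the DIMM leading to the memory device" is taken to include both the connector-to-PLL and PLL-to-memory segments, and all class, field, and function names, etch lengths, and propagation constants are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClockRoute:
    """Hypothetical etch lengths in inches; the names are illustrative only."""
    sb_to_device: float     # system board: fanout device -> system-board device
    sb_to_connector: float  # system board: fanout device -> DIMM connector
    dimm_to_pll: float      # DIMM: connector -> PLL clock driver reference input
    dimm_pll_to_mem: float  # DIMM: PLL clock driver output -> memory device


def feedback_lengths(r: ClockRoute) -> tuple[float, float]:
    """Return (DIMM portion, system-board portion) of the PLL feedback etch.

    Total = (clock path to the memory device) - (clock path to the
    system-board device); the DIMM portion matches the DIMM clock etch.
    """
    dimm_part = r.dimm_to_pll + r.dimm_pll_to_mem
    sb_part = r.sb_to_connector - r.sb_to_device
    if sb_part < 0:
        raise ValueError("sketch assumes the path to the DIMM is the longer one")
    return dimm_part, sb_part


def arrival_times_ps(r: ClockRoute, k_sb: float, k_dimm: float) -> tuple[float, float]:
    """Clock-edge arrival (ps) at the system-board device and the memory device.

    k_sb and k_dimm are the propagation constants (ps/in) of the two boards.
    The idealized PLL launches its output edge so that the edge returning on
    the feedback etch coincides with the edge at its reference input.
    """
    t_device = k_sb * r.sb_to_device
    t_ref = k_sb * r.sb_to_connector + k_dimm * r.dimm_to_pll
    dimm_fb, sb_fb = feedback_lengths(r)
    t_launch = t_ref - (k_dimm * dimm_fb + k_sb * sb_fb)  # output leads reference
    t_mem = t_launch + k_dimm * r.dimm_pll_to_mem
    return t_device, t_mem


if __name__ == "__main__":
    route = ClockRoute(sb_to_device=6.0, sb_to_connector=10.0,
                       dimm_to_pll=1.0, dimm_pll_to_mem=2.0)
    print("feedback etch (DIMM, system board):", feedback_lengths(route))
    for k_dimm in (162.0, 180.0, 198.0):  # assumed DIMM constants, ps/in
        t_dev, t_mem = arrival_times_ps(route, k_sb=180.0, k_dimm=k_dimm)
        print(f"k_dimm={k_dimm:.0f}: device {t_dev:.0f} ps, memory {t_mem:.0f} ps")
```

With these assumed numbers both devices see the edge at 1080 ps for every DIMM propagation constant tried. Algebraically, the DIMM terms cancel and the memory-device arrival reduces to k_sb times the system-board etch length.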
The balance of the feedback loop etch is placed on the system board for two reasons. First, it eliminates the skew caused by any difference in the lengths of the clock signal paths leading up to the memory module. Second, routing this portion of the loop on the system board gives it the same propagation constant, and therefore the same delay per unit length, as the portion of the clock signal path leading up to the memory module.
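The second reason can be illustrated with the same idealized timing model (zero-delay PLL, one propagation constant per board, connector delay ignored, hypothetical lengths and constants) by comparing where the balance of the feedback etch is routed. Routing the balance on the DIMM is a hypothetical counter-example for comparison, not something the text proposes, and the function name is illustrative.

```python
def memory_skew_ps(sb_to_device: float, sb_to_connector: float,
                   dimm_to_pll: float, dimm_pll_to_mem: float,
                   k_sb: float, k_dimm: float,
                   balance_on_system_board: bool) -> float:
    """Memory-device arrival minus system-board-device arrival, in ps.

    Lengths are in inches and k_sb / k_dimm in ps/in. The DIMM portion of
    the feedback etch equals the DIMM clock etch; the balance
    (sb_to_connector - sb_to_device) is routed either on the system board
    or, for comparison, on the DIMM. The PLL is idealized as zero-delay.
    """
    balance = sb_to_connector - sb_to_device
    k_balance = k_sb if balance_on_system_board else k_dimm
    feedback_delay = k_dimm * (dimm_to_pll + dimm_pll_to_mem) + k_balance * balance
    t_ref = k_sb * sb_to_connector + k_dimm * dimm_to_pll
    t_mem = t_ref - feedback_delay + k_dimm * dimm_pll_to_mem
    return t_mem - k_sb * sb_to_device


if __name__ == "__main__":
    for k_dimm in (162.0, 180.0, 198.0):  # assumed DIMM constants, ps/in
        on_sb = memory_skew_ps(6.0, 10.0, 1.0, 2.0, 180.0, k_dimm, True)
        on_dimm = memory_skew_ps(6.0, 10.0, 1.0, 2.0, 180.0, k_dimm, False)
        print(f"k_dimm={k_dimm:.0f} ps/in: balance on system board {on_sb:+.0f} ps, "
              f"on DIMM {on_dimm:+.0f} ps")
```

With these numbers the system-board routing yields 0 ps for every DIMM constant, while routing the balance on the DIMM leaves +72 ps and -72 ps at the two ends of the band; in this model the residual is (k_sb - k_dimm) times the balance length.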
Additionally, the phase-locked loop (PLL) clock driver on the memory module translates the clock signal from PECL to TTL voltage levels. This allows the clock signals to remain at PECL levels until they reach the memory module.
The clock distribution scheme may be extended to multiple boards and need not be limited to memory clock distribution systems.