Bandwidth and capacity of memory systems based on commodity dual-inline memory modules (DIMM) are severely limited by the parallel stub bus between the modules and the memory controller. In order to maintain signal integrity, the maximum number of DIMMs per channel had to be reduced with the market entrance of every new dynamic random access memory (DRAM) generation. Fully buffered DIMMs (FBDIMM) eliminate this limitation by replacing the parallel stub bus with serial, point-to-point links, with a repeater device (an advanced memory buffer (AMB)) residing on every FBDIMM. While solving the bandwidth-capacity problem, FBDIMM systems potentially increase the memory latency. Keeping the pass-through latency below 3 ns, combined with careful command sequencing, may alleviate the latency problem, as described in B. Ganesh et al., “Fully-Buffered DIMM Memory Architectures: Understanding, Mechanisms, Overhead and Scalings”, IEEE Int. Symp. on High Performance Computer Architecture, pp. 109-120, February 2007. The main barrier to the wide acceptance of FBDIMM, however, remains the high power consumption of the AMB. Current AMBs tend to consume more than 8 W, see, for example, Intel Corporation, “Intel 6400/6402 Advanced Memory Buffer Datasheet”, pp. 38-42, December 2006, with the high speed serial links alone dissipating 4 W, see H. Partovi et al., “Data Recovery and Retiming for the Fully Buffered DIMM 4.8 Gb/s Serial Links”, ISSCC Dig. Tech. Papers, pp. 336-337, February 2006. A significant reduction of the power consumption of the AMB, and most importantly of its high speed serial links, which deliver a combined bit rate of up to 115 Gb/s, remains a critical undertaking in the design of high bandwidth and high capacity memory systems.
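The combined bit rate quoted above is consistent with figures given later in this background, assuming that all 24 serial bit lanes of an AMB run at the 4.8 Gbit/s lane rate at the same time. The pairing of these two numbers is an inference made here for illustration and is not stated explicitly in the cited references:

```latex
24 \times 4.8\ \text{Gb/s} = 115.2\ \text{Gb/s} \approx 115\ \text{Gb/s}
```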
Requirements for FBDIMMs are described in detail in “FB-DIMM High Speed Differential PTP Link at 1.5V—Specification”, JEDEC, December 2005.
In this specification, a memory architecture is described which is based on very high speed serial links joining fully buffered DIMMs (FBDs) in a daisy chain arrangement to a host as illustrated in FIG. 1.
The basic functionality of an AMB is also described in more detail in the U.S. patent application Ser. No. 11/790,707 filed Apr. 27, 2007 entitled “PROGRAMMABLE ASYNCHRONOUS FIRST-IN-FIRST-OUT (FIFO) STRUCTURE WITH MERGING CAPABILITY”, which is incorporated herein by reference.
For the convenience of the reader, FIG. 1 from this patent application is reproduced here.
FIG. 1 shows a memory system 100 of the prior art, comprising a host 102 connected to a first FBDIMM 104 over serial links 106. If the memory system contains more than one FBDIMM (as shown in FIG. 1), the first FBDIMM 104 is connected to a second FBDIMM 108 over serial links 110. Additional FBDIMMs may be chained with serial links 112 in a daisy chain fashion, until a last FBDIMM 114 is reached. A clock buffer 116 distributes a reference clock signal to the host 102 and each of the FBDIMMs (104, 108, . . . , 114), over clock reference links 118. Each of the FBDIMMs (104, 108, . . . , 114) may include one or more memory devices (DRAMs 120) and an advanced memory buffer (AMB) 122.
Each of the serial links (106, 110, . . . , 112) comprises multiple upstream channels 124 (carrying formatted data frames towards the host 102) and downstream channels 126 (carrying formatted data frames and control information towards the last FBDIMM 114). The “channels” are also referred to as “lanes” or “bit lanes” indicating that each data frame is transmitted in multiple time slots bit-serially, and striped across the lanes of a link, a technique commonly employed in a number of high speed transmission protocols.
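The striping of a data frame across bit lanes may be sketched as follows. This is a minimal illustration only: the round-robin bit-to-lane mapping, the function names stripe and unstripe, and the lane count and frame length used in the usage example are assumptions made here, and do not reproduce the frame format of the JEDEC specification.

```python
def stripe(frame_bits, num_lanes):
    """Distribute a frame across lanes (hypothetical round-robin mapping).

    Bit i of the frame is sent on lane (i % num_lanes) in time slot
    (i // num_lanes), so each lane carries a bit-serial stream.
    """
    if num_lanes < 1:
        raise ValueError("num_lanes must be at least 1")
    if len(frame_bits) % num_lanes != 0:
        raise ValueError("frame length must be a multiple of the lane count")
    # Lane k carries bits k, k + num_lanes, k + 2 * num_lanes, ...
    return [frame_bits[k::num_lanes] for k in range(num_lanes)]


def unstripe(lanes):
    """Reassemble the frame from per-lane bit streams of equal length."""
    num_slots = len(lanes[0])
    if any(len(lane) != num_slots for lane in lanes):
        raise ValueError("all lanes must carry the same number of time slots")
    # Read one bit from every lane in each time slot, slot by slot.
    return [lanes[k][t] for t in range(num_slots) for k in range(len(lanes))]


# Usage with illustrative (assumed) sizes: a 120-bit frame on 10 lanes
# occupies 12 time slots per lane.
frame = [(i * 7) % 2 for i in range(120)]
lanes = stripe(frame, 10)
recovered = unstripe(lanes)
```

Because every lane carries part of every frame, the receiver can only reassemble a frame once the bits of all lanes for a given time slot are available, which is why skew between lanes matters.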
Writing of memory data is accomplished by transmitting the formatted frames over the downstream channels 126 of the serial links (106, 110, . . . , 112), from the host 102 through one or more AMBs 122 to the memory device (DRAM) 120 that is addressed. Reading of memory data is similarly accomplished by sending a read request from the host 102 through one or more AMBs 122 to the addressed memory device (DRAM) 120 over the downstream channels 126, and subsequently transmitting the memory data from the addressed memory device (DRAM) 120 through one or more AMBs 122 over the upstream channels 124 to the host 102.
It will be appreciated that the host 102 may communicate with a DRAM 120 on any FBDIMM, including the last FBDIMM 114, thus transmitting through a number of AMBs 122 in series. The required functions of the AMB 122 are described in the aforementioned JEDEC specification. They include:
retrieving and regenerating the serial downstream channels 126 to the next AMB 122 in the daisy chain;
retrieving and regenerating the serial bit streams upstream to the previous AMB 122 in the daisy chain, or to the host 102 as required;
converting received downstream data to parallel for interfacing to the DRAMs 120 located on the same FBDIMM;
converting parallel data from the DRAMs 120 located on the same FBDIMM, to serial format for transmitting upstream; and
merging the data from the DRAMs 120 located on the same FBDIMM, with the serial data received on the upstream channels 124 from other FBDIMMs (located further downstream), for transmission on the upstream channels 124 toward the host 102.
Given the high speed nature of the serial links, which may be running at a bit rate of 4.8 Gbit/s each, the physical constraints of signal transmission between devices, and the delays and variations within the devices themselves, one must expect skew between the bit lanes of each link and the reference clock 118. In addition, jitter and wander occur. To combat these effects, the design of the AMB 122 must include high speed clock alignment circuitry (to align the data edges of each lane with the reference clock) and First-In-First-Out (FIFO) buffers to absorb jitter and wander dynamically.
An approach for aligning the data edges of each lane with the reference clock 118 in the AMB 122 is to generate a separate clock for each lane, each separate clock being frequency aligned with the reference clock, but phase aligned with the data received on each respective lane. Implementations of an approach for generating phase aligned clocks, including a phase locked loop with adjustable phase shift, are described in U.S. patent application Ser. No. 11/216,952 filed on Aug. 31, 2005, US publication number 20070047689, entitled “Phase locked loop apparatus with adjustable phase shift”, Menolfi et al. The phase locked loop (PLL) apparatus with adjustable phase shift of Menolfi includes a voltage controlled oscillator (VCO) configured to generate multiple phase shifted output signals for sampling the serial data stream, and multiple phase detectors for determining the phase difference between the VCO and a selected phase of a reference clock. The phase is selected by enabling two of the phase detectors which are connected to two phases of the reference clock that differ by 45 degrees, and summing the outputs of the phase detectors. An intermediate phase can then be selected by varying the strength of each of the two phase detectors using two digital to analog converters that supply the operating currents of the two phase detectors, as described in the cited patent application of Menolfi.
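The selection of an intermediate phase by weighting two phase detectors may be illustrated with a simplified behavioral model. The model below is a sketch under stated assumptions and is not taken from Menolfi: each phase detector is assumed to have a linear characteristic, so that the loop settles where the weighted sum of the two detector outputs is zero; the converter codes are assumed to be complementary (steps - code and code); and the parameter gain_error_b is a hypothetical relative gain mismatch of the second converter.

```python
def interpolated_phase_deg(code, steps, phase_a_deg=0.0, phase_b_deg=45.0,
                           gain_error_b=0.0):
    """Lock phase of an idealized two-detector phase interpolator.

    Assumes linear phase detectors: the summed output
    w_a * (phi - phase_a) + w_b * (phi - phase_b) is zero at
    phi = (w_a * phase_a + w_b * phase_b) / (w_a + w_b).
    """
    if steps < 1 or not 0 <= code <= steps:
        raise ValueError("require steps >= 1 and 0 <= code <= steps")
    if gain_error_b <= -1.0:
        raise ValueError("gain_error_b must be greater than -1")
    weight_a = float(steps - code)                 # first converter current
    weight_b = float(code) * (1.0 + gain_error_b)  # second, with mismatch
    return (weight_a * phase_a_deg + weight_b * phase_b_deg) / (weight_a + weight_b)


def phase_steps_deg(steps, gain_error_b=0.0):
    """Phase increment between consecutive codes over the 45 degree span."""
    phases = [interpolated_phase_deg(c, steps, gain_error_b=gain_error_b)
              for c in range(steps + 1)]
    return [later - earlier for earlier, later in zip(phases, phases[1:])]


# Usage: compare matched converters with a hypothetical 10% gain mismatch.
ideal_steps = phase_steps_deg(16)
skewed_steps = phase_steps_deg(16, gain_error_b=0.10)
spread_deg = max(skewed_steps) - min(skewed_steps)
```

In this model, matched converters give equal phase increments, while a gain mismatch makes the increments unequal even though the end points of the range remain fixed, which is one way to picture a loss of linearity in the phase control.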
While Menolfi teaches a PLL with adjustable phase shift that could be embedded in the AMB 122 for aligning the data edges of each lane with the reference clock 118, the circuit includes features that may not be required in the AMB 122. At the same time, the power consumption of the circuit is high, considering that multiple PLLs are required for a large number of serial bit lanes, i.e., up to 24, and the circuit may not provide sufficient linearity in its phase control. The high power consumption of the circuit of Menolfi is due to technology constraints in providing high enough operating speed in the phase detectors over the range of currents of the digital to analog converters. Linearity of phase interpolation requires the two active digital to analog converters to be well matched at combinations of settings; linearity may also be affected by the change in current density in the phase detectors, where the current density varies over a large range and is dependent on the phase interpolator setting.
Because of these deficiencies, the PLL with adjustable phase shift according to Menolfi may not be suited for implementing a phase-adjustable PLL for use in the AMB.
Consequently, a new and improved PLL with adjustable phase shift needs to be developed to overcome the disadvantages of the prior art.