In a computer system, the central processor (CPU) accesses program information and data that are stored in a memory system. During the design phase, a computer systems architect selects a hierarchy of memories that differ in speed and capacity, which may comprise, for example, cache memory, main memory and secondary memory. Cache memory is typified by low latency, high bandwidth and high cost per bit, and may be integral to the CPU. Cache memory may be a semiconductor device, which may be, for example, SRAM (static random access memory). Main memory, which is also a semiconductor device and is typically a form of DRAM (dynamic random access memory), is used for less frequently accessed data and program information. At present, personal computers may have up to about 4 GB of DRAM, while high end servers may have about 16 GB or more of DRAM. Strategies such as using a plurality of memory controllers and computer cores may provide access to larger amounts of such memory; however, many computer bus systems have practical upper limits due to propagation time, bus loading, power consumption and the like. Larger amounts of data may be stored on mass storage: for example, magnetic disks, where a single disk may store a terabyte (TB) of data, FLASH memory disks (sometimes called solid state drives, or SSDs), or clusters of disks may be used. The access time for data stored on magnetic disks is significantly longer than that for data stored in main memory.
Large amounts of DRAM or other memory such as FLASH may be provided in memory appliances such as that described in U.S. patent application Ser. No. 11/405,083, filed on Apr. 17, 2006. To the extent that large memory arrays have a latency approaching that of conventional main memory, such memory arrays may be considered as similar to main memory and provide rapid access to large amounts of data that would have otherwise been stored in mass storage, such as rotating disk media.
Data is moved between memories, other devices and the central processor on pathways known as buses, which may be of various architectures, including parallel, serial, point-to-point, daisy chained, or multi-stub, as examples.
A data bus may be operated in a synchronous manner if the clock frequency for the transmission and reception of data is the same at all points in the system, and a known phase relationship exists between the data bits at each point where the data is to be sensed (e.g., received). However, considering the transmission of data in a parallel bus between two adjacent nodes, the phase relationship of the data bits in different lines changes, depending on the time-delay skew. In slow speed data transmission, and for short bus lengths, this may be tolerated, but in high speed data transmission, the data bits may be received in varying phase relationships to the system clock, and may be delayed by more than one clock interval, resulting in errors, or requiring de-skewing and phase alignment, typically at each memory node.
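The effect of skew exceeding one clock interval can be illustrated with a minimal Python sketch. The function names, the per-line delays and the clock period are hypothetical values chosen for illustration and do not describe any particular bus. The sketch detects only slips of whole clock intervals; phase variation within a single interval is not modeled.

```python
def arrival_intervals(line_delays_ps, clock_period_ps):
    """Return, for each bus line, the index of the clock interval in which
    a bit launched at time zero arrives (assumed integer picoseconds)."""
    return [delay // clock_period_ps for delay in line_delays_ps]


def needs_deskew(line_delays_ps, clock_period_ps):
    """True if bits launched together arrive in different clock intervals."""
    return len(set(arrival_intervals(line_delays_ps, clock_period_ps))) > 1


# Illustrative values only: three lines, 1000 ps clock period.
delays_ps = [300, 900, 1400]
print(arrival_intervals(delays_ps, 1000))  # third line slips by one interval
print(needs_deskew(delays_ps, 1000))
```

With these assumed delays, the bit on the third line is sampled one clock interval later than the bits on the other two lines, which is the condition that would require de-skewing at the receiving node.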
This problem may be mitigated by transmitting the clock and data on each of the lines, and recovering the clock for each channel at a node. The recovered clock, however, differs in time delay with respect to the system clock from line to line. Also, when the clock is transmitted along with the data, transmission of data, or at least an idle data pattern, may be required so as to maintain the synchronization of the clock for each line.
Alternatively, the data may be recovered at each node by accumulating the data for each line in a buffer, determining the time delay adjustment needed to compensate for the data skew, and reconstructing the data received at each node prior to acting on the data (where the word data is understood to include in-band commands such as read, write, and the like, as well as information, which may include instructions, that is to be written to, or read from, memory). In order to make the skew-compensation adjustment, the amount of data that must be buffered may be up to the number of clock cycles of skew that may accumulate along the bus. The overall delay in transmitting data from one end of the bus to the other is then the sum of the maximum skews of the individual node-to-node links along the data path. An example of this type of bus is the FB-DIMM (fully-buffered DIMM), which is the subject of a JEDEC standard.
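The buffering approach can be sketched as follows, under the assumption that every node fully re-aligns the lines before forwarding the data. The function names and the delay values (in clock cycles) are hypothetical and are used only to illustrate the arithmetic; they are not taken from the FB-DIMM standard.

```python
def deskew_depths(line_delays):
    """Extra buffering (in clock cycles) each line needs so that all lines
    align with the slowest line of this node-to-node link."""
    slowest = max(line_delays)
    return [slowest - delay for delay in line_delays]


def accumulated_deskew_delay(hops):
    """Sum, over all node-to-node links, of the maximum skew of each link.
    Assumes each node de-skews completely before retransmitting."""
    return sum(max(delays) - min(delays) for delays in hops)


# Illustrative per-line delays for two consecutive links.
hops = [[1, 3, 2], [0, 2, 2]]
print(deskew_depths(hops[0]))          # buffering per line at the first node
print(accumulated_deskew_delay(hops))  # total added delay along the path
```

In this sketch the first node delays the three lines by 2, 0 and 1 cycles respectively, and the two links together add 4 cycles, which shows how the buffering requirement grows with the number of nodes traversed.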
A bus may also be operated as a multi-drop bus, where the data is transmitted from a sending end (such as a memory controller) and received at a target memory module, for example, the third memory module along a linear bus. The module may be a dual-in-line memory module (DIMM), as is known in the art, and the maximum total skew may be equal to that of the specific bus line having the longest transmission delay. The transmission delays result from differences in trace lengths for the individual data lines, and the differences in trace lengths may include the traces on a mother board as well as the traces on the circuit card containing the memory module. The total skew may limit the length of the bus or the signaling speed.
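The relationship between trace length differences and total skew can be illustrated with a short sketch. The function names, the trace lengths, and the propagation delay of 7 ps per millimeter are assumptions chosen for illustration; the actual propagation delay depends on the circuit board material and trace geometry.

```python
def total_trace_lengths_mm(board_mm, module_mm):
    """Per-line trace length: mother board trace plus memory module trace."""
    return [board + module for board, module in zip(board_mm, module_mm)]


def bus_skew_ps(board_mm, module_mm, ps_per_mm=7):
    """Skew between the longest and shortest line, in picoseconds.
    ps_per_mm is an assumed, illustrative propagation delay."""
    totals = total_trace_lengths_mm(board_mm, module_mm)
    return (max(totals) - min(totals)) * ps_per_mm


# Illustrative lengths for three data lines reaching one module.
board = [100, 110, 105]
module = [20, 25, 15]
print(total_trace_lengths_mm(board, module))
print(bus_skew_ps(board, module))
```

Comparing the resulting skew with the bit period at the intended signaling rate gives a rough indication of whether the bus length or the signaling speed would need to be limited.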
As described in U.S. application Ser. No. 11/405,083, the effect of skew between the lines of a bus operating, for example, in a serial point-to-point manner, or in a branching manner, may be mitigated by suitably exchanging the logical assignment of the data lanes to specific bus lines, depending on the amount of skew experienced in each bus line between communicating nodes, such that, at the node at the intended receiving end of the bus, the skew is minimized, known, or controlled. As such, the amount of correction needed for skew compensation at the receiving node where the data is to be acted upon may be reduced, with a concomitant reduction in device complexity and power consumption.
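The idea of exchanging the assignment of logical lanes to bus lines can be illustrated with a minimal sketch. The greedy rule used here (at each link, the lane that has accumulated the most delay is placed on the fastest line) is an assumption made for illustration and is not represented as the method of the referenced application; the function names and delay values are likewise hypothetical.

```python
def skew(delays):
    """Difference between the slowest and fastest lane."""
    return max(delays) - min(delays)


def fixed_assignment_delays(hops):
    """Accumulated delay per lane when lane i always uses line i."""
    return [sum(hop[i] for hop in hops) for i in range(len(hops[0]))]


def exchanged_assignment(hops):
    """Greedy lane exchange: at each link, put the most-delayed lane on the
    fastest line. Returns (per-link lane-to-line maps, accumulated delays)."""
    n = len(hops[0])
    accumulated = [0] * n
    assignments = []
    for line_delays in hops:
        lanes = sorted(range(n), key=lambda lane: accumulated[lane], reverse=True)
        lines = sorted(range(n), key=lambda line: line_delays[line])
        mapping = [0] * n
        for lane, line in zip(lanes, lines):
            mapping[lane] = line
            accumulated[lane] += line_delays[line]
        assignments.append(mapping)
    return assignments, accumulated


# Illustrative delays: two links, four lines each, identical skew pattern.
hops = [[1, 2, 3, 4], [1, 2, 3, 4]]
print(skew(fixed_assignment_delays(hops)))  # skew adds up link by link
maps, delays = exchanged_assignment(hops)
print(maps, skew(delays))                   # skew cancels at the receiver
```

With a fixed assignment the skew of the two links adds, whereas the exchanged assignment reverses the lane order on the second link so that the accumulated delays are equal at the receiving node, leaving little or no correction to be performed there.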