This invention relates generally to computer memory, and more particularly to providing read clock sharing between memory devices.
Contemporary high performance computing main memory systems are generally composed of memory devices, which are connected to one or more processors via one or more memory control elements. These memory devices are generally located on a memory card module and connected through a module connector to a motherboard.
FIG. 1 depicts a contemporary system composed of an integrated processor chip 100, which contains one or more processor elements and an integrated memory controller 110. In the configuration depicted in FIG. 1, multiple independent cascade interconnected memory interface busses 106 are logically aggregated to operate in unison, supporting a single independent access request at a higher bandwidth, with data and error detection/correction information distributed or “striped” across the parallel busses and associated devices. The memory controller 110 attaches to four narrow/high speed point-to-point memory busses 106, each of which connects one of the memory controller's interface channels to a cascade interconnect memory subsystem 103 (or memory module, e.g., a dual in-line memory module or “DIMM”) that includes at least a hub device 104 and one or more memory devices 109. The system depicted in FIG. 1 has “n” ranks. Typically, these “n” ranks share the common narrow/high speed busses and are not accessed simultaneously. Thus, the data signal pins of those ranks are connected directly to common signal lines and shared in a time-multiplexed manner. Read clock signals, however, cannot be connected in the same way: a read clock is driven continuously by each chip (e.g., memory device), whereas a data signal is driven by a given chip only for the short period during which that chip is being accessed. Some systems further enable operation when only a subset of the memory busses 106 is populated with memory subsystems 103. In this case, the one or more populated memory busses 108 may operate in unison to support a single access request.
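The difference between time-multiplexed data lines and free-running read clocks can be illustrated with a small model. The sketch below is hypothetical (the function names and the access schedule are illustrative, not from this disclosure); it counts how many ranks drive one shared module line in each cycle, assuming a data line is driven only by the rank being accessed and a read clock is driven by every rank in every cycle.

```python
from typing import List, Optional


def drivers_per_cycle(schedule: List[Optional[int]], num_ranks: int,
                      free_running: bool) -> List[int]:
    """Count how many ranks drive one shared module line in each cycle.

    schedule[i] is the rank accessed in cycle i, or None when the bus is idle.
    Hypothetical model: a data line is driven only by the accessed rank,
    while a free-running read clock is driven by every rank in every cycle.
    """
    counts = []
    for active in schedule:
        if free_running:
            counts.append(num_ranks)  # every rank drives its read clock continuously
        elif active is None:
            counts.append(0)  # bus idle: no rank drives the data line
        else:
            counts.append(1)  # only the accessed rank drives data
    return counts


def shareable(counts: List[int]) -> bool:
    """A line can be wired directly to all ranks only if at most one rank drives it at a time."""
    return all(c <= 1 for c in counts)


schedule = [0, 1, None, 1]  # hypothetical access sequence on a dual-rank module
data = drivers_per_cycle(schedule, num_ranks=2, free_running=False)
clock = drivers_per_cycle(schedule, num_ranks=2, free_running=True)
print(data, shareable(data))    # [1, 1, 0, 1] True
print(clock, shareable(clock))  # [2, 2, 2, 2] False
```

Under these assumptions the data line never has more than one driver, so it can be shared; the read clock always has one driver per rank, which would cause contention on a shared wire.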
FIG. 2 depicts a memory structure with cascaded memory modules 103 and unidirectional busses 106. One of the functions provided by the hub devices 104 in the memory modules 103 in the cascade structure is a re-drive function to send signals on the unidirectional busses 106 to other memory modules 103 or to the memory controller 110. FIG. 2 includes the memory controller 110 and four memory modules 103, on each of two memory busses 106 (a downstream memory bus with 24 wires and an upstream memory bus with 25 wires), connected to the memory controller 110 in either a direct or cascaded manner. The memory module 103 next to the memory controller 110 is connected to the memory controller 110 in a direct manner. The other memory modules 103 are connected to the memory controller 110 in a cascaded manner. Each memory module 103 may include one or more ranks of memory devices 109. Although not shown in this figure, the memory controller 110 may be integrated in the processor 100 and may connect to more than one memory bus 106 as depicted in FIG. 1.
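The re-drive role of the hub devices 104 in the cascade of FIG. 2 can be sketched as follows. This is a simplified, hypothetical model (the function name and the rule that a request travels only as far as its target module are assumptions, not details of this disclosure): module 0 is the module connected directly to the memory controller, and each hub between the controller and the target module forwards the request downstream and re-drives the returned data upstream.

```python
from typing import List, Tuple


def redrive_hubs(target: int, num_modules: int = 4) -> Tuple[List[int], List[int]]:
    """Return the hubs that re-drive traffic for an access to module `target`.

    Module 0 connects directly to the memory controller; modules
    1..num_modules-1 are cascaded behind it. Simplifying assumption:
    a request is forwarded only as far as the target module.
    """
    if not 0 <= target < num_modules:
        raise ValueError("target module out of range")
    # Downstream: each hub between the controller and the target forwards the request.
    downstream = list(range(target))
    # Upstream: read data is re-driven back through the same hubs in reverse order.
    upstream = list(range(target - 1, -1, -1))
    return downstream, upstream


for m in range(4):
    down, up = redrive_hubs(m)
    print(f"module {m}: downstream via hubs {down}, upstream via hubs {up}")
```

In this model the directly connected module needs no re-drive, while each additional cascaded module adds one re-drive stage in each direction.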
Current dynamic random access memory (DRAM) devices generally use a strobe to perform data reads. For future DRAM devices, such as double data rate four (DDR4) devices, a free-running read clock is preferred over a strobe: inter-symbol interference within a strobe, caused by its preamble, reduces the timing margin, and this reduction becomes significant as the data rate increases (e.g., beyond three gigabytes per second).
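One way to see why a fixed timing loss matters more at higher data rates is to compare it with the bit time (unit interval), which shrinks as the rate rises. The sketch below assumes, purely for illustration, that the preamble-induced inter-symbol interference costs a fixed number of picoseconds; the 25 ps value and the function names are hypothetical and are not taken from this disclosure or any DRAM specification. Rates are expressed as per-pin transfers per second (GT/s).

```python
def unit_interval_ps(rate_gtps: float) -> float:
    """Bit time (unit interval) in picoseconds for a per-pin rate in GT/s."""
    return 1000.0 / rate_gtps


def margin_loss_fraction(penalty_ps: float, rate_gtps: float) -> float:
    """Share of the unit interval consumed by a fixed timing penalty."""
    return penalty_ps / unit_interval_ps(rate_gtps)


PREAMBLE_ISI_PS = 25.0  # hypothetical, illustrative penalty only

for rate in (0.8, 1.6, 3.2):
    frac = margin_loss_fraction(PREAMBLE_ISI_PS, rate)
    print(f"{rate} GT/s: UI = {unit_interval_ps(rate):.1f} ps, "
          f"penalty = {frac:.1%} of UI")
```

Under this simplification, doubling the data rate halves the unit interval, so the same absolute penalty consumes twice the share of the timing margin.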
However, read clocks from different ranks cannot share the same signal lane and connector pin, because every device continuously transmits its read clock toward the memory controller; that is, a read clock is driven by all of the chips (e.g., memory devices) in all of the ranks at all times. Therefore, using a read clock instead of a strobe may increase the total connector pin count, particularly for multi-rank memory modules. Typically, one read clock is provided per four or eight data signals, so in a single-rank memory module the total number of read clock pins is one quarter or one eighth of the total number of data pins. Because a read clock signal cannot be shared between ranks, the total number of read clock pins on a memory module increases in proportion to the number of ranks. For example, assume that a single-rank memory module has eighteen memory devices and each memory device has four data signals. In this case, the module requires 18×4=72 data pins and 18×4/4=18 read clock pins (the same as the number of chips, because each chip has one read clock). A dual-rank memory module still requires 72 data pins, because data signals share module pins between the two ranks. However, it requires 18×2=36 read clock pins, because read clocks cannot be shared between ranks, so the read clock of every memory device must be brought out to its own module pin. Thus, a multi-rank memory module requires more read clock pins as the number of ranks increases, whereas a bus or connector pin for a strobe can be shared between two or four memory devices in different ranks.
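The pin-count arithmetic in the example above can be reproduced with a short calculation. The function below is a hypothetical helper written for illustration; it encodes only the two rules stated in the text: data pins are shared between ranks, and read clock pins, one per group of four (or eight) data signals, are not.

```python
def module_pin_counts(devices_per_rank: int, data_bits_per_device: int,
                      ranks: int, data_bits_per_read_clock: int = 4) -> dict:
    """Data and read clock connector pins for a memory module.

    Data pins are shared between ranks (time-multiplexed), so they do not grow
    with the rank count; free-running read clocks cannot be shared, so each
    rank adds its own set of read clock pins.
    """
    data_pins = devices_per_rank * data_bits_per_device
    if data_pins % data_bits_per_read_clock:
        raise ValueError("data width must be a multiple of the read clock group size")
    clocks_per_rank = data_pins // data_bits_per_read_clock
    return {"data_pins": data_pins, "read_clock_pins": clocks_per_rank * ranks}


for ranks in (1, 2, 4):
    print(ranks, module_pin_counts(18, 4, ranks))
```

For eighteen x4 devices this gives 72 data pins in every case, with 18, 36, and 72 read clock pins for one, two, and four ranks respectively, matching the single-rank and dual-rank figures in the text.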
It would be highly desirable to be able to utilize a read clock without increasing a connector pin count for multi-rank memory modules.