The present disclosure generally relates to memory systems and, more particularly, to command delay balancing in daisy-chained memory devices.
Memory devices are widely used in many electronic products and computers to store data. A memory device is a semiconductor electronic device that includes a number of memory chips, each chip storing a portion of the total data. The chips themselves contain a large number of memory cells, with each cell storing a bit of data. The memory chips may be part of a DIMM (dual in-line memory module) or a PCB (printed circuit board) containing many such memory chips. In the discussion hereinbelow, the terms “memory device”, “memory module” and “DIMM” are used synonymously. A processor or memory controller may communicate with the memory devices in the system to perform memory read/write and testing operations. FIG. 1 illustrates a prior art arrangement 10 showing signal communication between a memory controller 11 and a plurality of memory devices (DIMMs) 12, 18, and 24 over a parallel memory bus 30 (also known as a “stub bus”). For ease of discussion and illustration, only three memory devices (DIMM 0 (12), DIMM 1 (18), and DIMM N−1 (24)) are shown in FIG. 1 out of a total of N memory devices that are controlled by, and communicate with, the memory controller 11. Also for ease of discussion, each DIMM in FIG. 1 is shown to contain the same number N of DRAM (Dynamic Random Access Memory) memory chips. For example, memory module 12 contains a DRAM memory bank 14 having N DRAM chips 16, whereas memory module 18 contains a memory bank 20 having N DRAM chips 22, and so on. However, each DIMM in FIG. 1 may contain a different number of memory chips or DRAMs. It is noted here that the terms “DRAM chip,” “memory chip”, “data storage and retrieval element,” and “memory element” are used synonymously hereinbelow.
Each memory chip 16, 22, 28 may include a plurality of pins (not shown) located outside of the chip for electrically connecting the chip to other system devices through the DIMM on which the chip resides. Some of those pins (not shown) may constitute memory address pins or an address bus, data pins or a data bus, and control pins or a control bus. Additional constructional details of a memory chip (e.g., one of the chips 16) are not relevant here and, hence, are not presented. Those of ordinary skill in the art will readily recognize that memory chips 16, 22, and 28 of FIG. 1 are not intended to be a detailed illustration of all of the features of a typical memory chip. Numerous peripheral devices or circuits (not shown) may typically be provided on a DIMM along with the corresponding memory chips for writing data to and reading data from the memory cells (not shown) in the chips. Likewise, constructional details of a DIMM (e.g., the DIMMs 12, 18, and 24) in FIG. 1 are omitted for ease of illustration. In practice, each DIMM may be connected to the parallel bus 30 via appropriate DIMM connectors (not shown) to allow signal flow between the DIMM and the controller 11.
In the parallel bus implementation 10 of FIG. 1, the memory controller 11 sends address and/or control signals over the address/control bus portion (not shown) of the parallel bus 30 and transfers data to/from the DIMMs over the data bus portion (not shown) of the parallel bus 30. The parallel bus 30 is a signal transfer bus that includes address and control lines (both of which are unidirectional) as well as data lines (which are bidirectional), some or all of which are connected to each DIMM in the system and are used to perform memory data transfer operations (i.e., data transmission and reception operations) between the memory controller 11 and the respective DIMMs 12, 18, 24. The memory controller 11 may determine the modes of operation of a memory module (or DIMM). The control signals (not shown) from the memory controller 11 may include, for example, a chip select (CS_N) signal, a row address strobe (RAS_N) signal, a column address strobe (CAS_N) signal, a write enable (WE_N) signal, row/column address (A) signals, a data mask (DM) signal, a termination control (ODT_N) signal, and a set of single-ended or differential data strobes (RDQS/RDQS#/DQS/DQS#). These control signals are transmitted on the control lines or control bus (not shown) portion of the parallel bus 30 to perform data transfer operations at selected memory cells in the appropriate memory chips (DRAMs). The “width” (i.e., number of lines) of the address, data and control buses may differ from one memory configuration to another.
It is observed that in the parallel bus configuration 10 of FIG. 1, each memory module 12, 18, 24 is directly connected to the memory controller 11 via the parallel bus 30. In other words, the memory controller 11 is connected to each memory module (DIMM) in parallel, so every signal output from the controller 11 reaches each memory module in parallel. While such an arrangement may be easier to implement and may provide a “wider” memory bus, the penalty is a limited signaling speed on the bus 30: in modern implementations of the parallel bus 30, the signaling speed caps at about 800 MHz. Further, in the parallel bus configuration, the delay encountered at the slowest DIMM governs the overall delay in data transfer operations. The parallel bus configuration may therefore not be suitable where memory data transfer operations must be signaled in the GHz range to take advantage of the processing power of modern, faster memory chips and controllers.
FIG. 2 illustrates an alternative configuration 32 where memory modules (DIMMs) 34, 40, and 44 are connected to a memory controller 33 in a daisy-chained configuration. As before, only three of the memory modules (out of a total of N modules) are illustrated in FIG. 2 for the sake of simplicity. Also for the sake of clarity, in FIG. 2, a connector for a memory module (the DIMM connector) is identified with the same reference numeral as that of the corresponding memory module. Similar to the embodiment of FIG. 1, each DIMM in FIG. 2 contains a corresponding DRAM memory bank with a plurality of memory chips or DRAM chips therein. For example, DIMM 0 (34) is shown to contain a memory bank 36 with N DRAM chips 38. For the sake of clarity, the other memory banks (e.g., memory banks 42 and 46) in FIG. 2 are not shown with their corresponding memory chips.
In the daisy-chained configuration 32 of FIG. 2, each DIMM connector 34, 40, 44 has a pair of “downlink” terminals and a pair of “uplink” terminals. Each pair of downlink terminals includes a downlink-in terminal (DL_In) and a downlink-out terminal (DL_Out). Similarly, each pair of uplink terminals includes an uplink-in terminal (UL_In) and an uplink-out terminal (UL_Out). The daisy-chained configuration is a serial signal transfer mechanism, as opposed to the parallel mechanism shown in FIG. 1. Thus, a memory module receives a signal from the memory controller 33 on the downlink channel (comprising all the downlink terminals 48A-48C in the configuration 32), whereas a signal to the memory controller 33 is transmitted on the uplink channel (which includes all the uplink terminals 50A-50C in the configuration 32). Signals are serially propagated from one memory module to another via signal “hops.” For example, a command broadcast from the memory controller 33 to all of the DIMMs 34, 40, 44 is first received at the DL_In terminal 48A of DIMM 0 (34), which, in turn, forwards that command to DIMM 1 (40) via its DL_Out terminal 48B, which is also connected to the DL_In terminal of DIMM 40. This completes one command “hop”. DIMM 40, in turn, repeats the command on its DL_Out terminal 48C, and a second command “hop” delivers it to the next DIMM downstream. Thus, after a total of N−1 “hops”, the command reaches the last or farthest DIMM (here, DIMM 44) in the memory channel (which consists of all memory modules connected to the memory controller 33 in the daisy-chained configuration 32). Similarly, N−1 “hops” may be needed for a response to the command from the last or farthest DIMM 44 to reach the memory controller 33.
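The hop counting just described can be illustrated with a short Python sketch. It is a hypothetical model (the function name `broadcast_trace` and the 0-based DIMM indexing are illustrative assumptions), and it tracks only hop counts, not timing:

```python
def broadcast_trace(num_dimms: int) -> list[tuple[int, int]]:
    """Follow a broadcast command down the downlink channel.

    Returns (dimm_index, hops) pairs, where dimm_index 0 is the DIMM wired
    directly to the controller and `hops` counts the DIMM-to-DIMM forwards
    needed before that DIMM sees the command.
    """
    trace = []
    hops = 0
    for dimm_index in range(num_dimms):
        trace.append((dimm_index, hops))  # command arrives on this DIMM's DL_In
        hops += 1  # the DIMM repeats it out of DL_Out: one more hop downstream
    return trace


for dimm_index, hops in broadcast_trace(3):
    print(f"DIMM {dimm_index}: reached after {hops} hop(s)")
```

With three DIMMs, the trace shows DIMM 2 (DIMM 44 in FIG. 2) being reached after two hops, i.e., N−1 hops for the farthest DIMM.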
It is noted here that the term “command” is used herein to refer to address, data, and/or control signals transmitted from the memory controller 33 (e.g., during a data write operation or during a memory module testing operation) to one or more DIMMs in the system 32. On the other hand, the term “response” is used herein to refer to a data or status signal (e.g., during a data read operation or during a memory test operation) generated by a DIMM in reply to the command received from the memory controller 33 and sent to the memory controller 33.
As is seen from FIG. 2, in a daisy-chained memory configuration, the memory controller 33 is directly connected to only one of the DIMM modules (i.e., the memory module 34 in FIG. 2), as opposed to all of the memory modules as in the parallel bus configuration of FIG. 1. Thus, one disadvantage of serial daisy-chaining is that a defect or malfunction at one of the memory modules may prevent further “downstream” propagation of the command from the memory controller 33. Despite this disadvantage, the daisy-chained configuration 32 offers significant benefits, including, for example, very high-speed signal propagation (in the multi-GHz range) and more control over individual DIMMs' data transfer operations, so signaling in the daisy-chained configuration 32 can be significantly faster than in the parallel configuration 10. As described with reference to FIG. 2, each DIMM in the daisy-chained configuration acts as a “repeater” of the signal for the next DIMM, whether downstream (connected to the DL_Out terminal) or upstream (connected to the UL_Out terminal). The downlink and uplink channels are extremely fast, narrow, unidirectional (one-way) signal buses that carry encoded signal packets (containing memory address, data, and/or control information from the memory controller 33), which are decoded by the receiving DIMM. The downlink channel carries signals in one direction, whereas the uplink channel carries different signals in the opposite direction. In the daisy-chained configuration 32 of FIG. 2, a signal must therefore travel through “hops” whether it is broadcast from the memory controller 33 to all of the DIMMs in the memory channel or addressed to only a single DIMM in the memory channel. That is, any signal from the memory controller 33 reaches the desired/destination DIMM(s) only after one hop through each intervening DIMM.
It is noted here that the term “daisy-chained configuration” is used herein to refer to a high-speed, serial bus configuration and, more particularly, to a serial bus configuration linking a plurality of electronic devices (e.g., memory modules 34, 40, 44 in FIG. 2) with a controller thereof (e.g., the memory controller 33 in FIG. 2) using unidirectional signal transfer links, where the set of links or terminals (the downlinks) carrying signals out of the controller is different from the set of links (the uplinks) that carries the signals to the controller.
From the foregoing discussion, it is seen that in the daisy-chained configuration 32 of FIG. 2, a signal encounters varying amounts of delay before reaching a destination DIMM or the memory controller 33. For example, the DIMM 44 may receive a signal transmitted from the memory controller 33 only after a delay that includes the time consumed by the N−1 hops needed for the signal to reach DIMM 44. In the case of DIMM 40, on the other hand, the signal may be delayed only by the time taken to complete a single hop (through DIMM 34). Likewise, the delay for a response generated by a DIMM to reach the memory controller 33 varies with the “depth” of the memory channel. For example, a response generated by DIMM 0 (34) may reach the memory controller without any “hops”, whereas a response from the DIMM 44 may need to go through N−1 hops before reaching the memory controller 33. Thus, the amount of delay may vary linearly with the physical proximity of a memory module 34, 40, 44 to the memory controller 33 (i.e., the farther the memory module, the higher the delay), and may also vary linearly with the total number of memory modules in the memory channel (i.e., the greater the number of memory modules serially connected to the controller 33 in the daisy-chained manner, the higher the delay for the farther modules).
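The linear dependence on position and on channel depth can be made concrete with a small sketch. It assumes, purely for illustration, a fixed number of clock cycles per hop that is the same on the downlink and the uplink, and it ignores DRAM processing time; the names `HOP_CYCLES`, `round_trip_cycles` and `worst_case_cycles` are hypothetical:

```python
HOP_CYCLES = 1  # assumption: clock cycles per hop, equal on downlink and uplink


def round_trip_cycles(dimm_index: int, hop_cycles: int = HOP_CYCLES) -> int:
    """Command-plus-response propagation delay for the DIMM at dimm_index
    (0 = closest to the controller), ignoring DRAM processing time."""
    downlink = dimm_index * hop_cycles  # command crosses dimm_index hops
    uplink = dimm_index * hop_cycles    # response crosses the same hops back
    return downlink + uplink


def worst_case_cycles(num_dimms: int, hop_cycles: int = HOP_CYCLES) -> int:
    """Round trip to the farthest DIMM, which grows with channel depth."""
    return round_trip_cycles(num_dimms - 1, hop_cycles)


print([round_trip_cycles(i) for i in range(3)])   # [0, 2, 4]
print([worst_case_cycles(n) for n in (2, 4, 8)])  # [2, 6, 14]
```

Both outputs grow in equal steps: the first with the DIMM's distance from the controller, the second with the number of DIMMs on the channel.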
It is seen from the above discussion that in the daisy-chained configuration 32 of FIG. 2, a command from the memory controller 33 may be processed by different DIMMs at different times because of the inherent command propagation delay through “hops.” Similarly, responses from different DIMMs may arrive at the controller 33 at different times, again because of the delays through “hops.” In the embodiment of FIG. 2, the command delay (or command propagation delay, i.e., the total delay for a command or signal from the memory controller 33 to reach the farthest DIMM 44) must be accounted for along with the response delay (or response propagation delay, i.e., the total delay for a response from the farthest DIMM 44 to reach the memory controller 33) to ensure that a response from any DIMM in the system 32 reaches the memory controller 33 at the same time. This effect may be called “delay levelization”: the memory controller 33 need not wait for varying amounts of time to receive responses from various DIMMs in the system 32. Instead, the memory controller 33 need only wait a fixed, predetermined time to expect a reply from any DIMM in the system 32. Thus, from the memory controller's perspective, only a single, fixed delay exists between sending a command and receiving a response, irrespective of the depth of the memory channel or the physical proximity of a DIMM to the memory controller 33. This aspect is similar in principle to the latency in the parallel bus configuration of FIG. 1, where, as noted before, the delay of the slowest DIMM may govern the latency experienced by the controller 11 between a command and the receipt of its response from a DIMM in the system 10. In the daisy-chained configuration 32 of FIG. 2, it is similarly desirable that the controller 33 be freed from making latency determinations on a case-by-case basis for each DIMM. Instead, the delay may be “levelized” so that the controller 33 may receive (or “expect”) a response from any DIMM 34, 40, 44 at the same time.
FIG. 3 illustrates a prior art methodology to achieve delay levelization in the daisy-chained memory channel of FIG. 2. In FIG. 3, the constructional details to achieve delay levelization are illustrated for only one of the DIMMs (i.e., DIMM 1 (40)) in the system 32 of FIG. 2; a similar configuration may be present on each DIMM 34, 40, 44 in the system 32. The DIMM 40 in FIG. 3 is shown to include a DIMM-specific response delay unit 52, in which a programmable delay can be stored. The amount of delay to be programmed in the delay unit 52 may primarily depend on three factors: (1) the physical proximity of the DIMM 40 to the memory controller 33, (2) the total number of DIMMs in the daisy-chained configuration 32, and (3) the sum of the command propagation delay to the farthest DIMM in the system (e.g., the DIMM 44 in FIG. 2) and the response propagation delay from the farthest DIMM to the memory controller 33. For simplicity of illustration, assume that there are only three DIMMs (DIMMs 34, 40, and 44) in the system 32 of FIG. 2 and that there is one clock cycle of “hop-related” delay for each of the command and response propagations at each DIMM in the system 32 (except the farthest DIMM 44, as discussed below). That is, it is assumed that it takes one clock cycle to propagate a command signal to the next downstream DIMM over the downlink channel, and also one clock cycle to propagate a response signal to the next upstream DIMM over the uplink channel (i.e., the uplink and downlink delays are symmetrical). In that case, ignoring the very small signal processing delays of the DRAM memory bank 42 (to process a command and to generate a response), the delay unit 52 in FIG. 3 may be programmed to appropriately delay transmission of the response (which may contain the data to be read) that the memory chips in the memory bank 42 generate in reply to the command from the memory controller 33.
In the present example, the amount of delay to be programmed in the delay unit 52 equals [T*(N−1)/P] clock cycles, where “T” is the total “hop-related” clock cycle delay at a DIMM (except the farthest DIMM 44, as discussed below) including the delays to propagate a command to the next “downstream” DIMM and a response to the next “upstream” DIMM in the daisy chain (T=2 in the present example), “N” is the total number of DIMMs in the system (here, N=3), and “P” is the physical proximity of the DIMM to the memory controller 33 (e.g., P=1 for the first or closest DIMM 34, P=2 for the second downstream DIMM 40, and so on). Therefore, in the case of DIMM 1 (40), the value of delay to be programmed in the unit 52 is equal to 2 clock cycles, whereas the value of delay to be stored in the corresponding delay unit (not shown) in the DIMM 0 (34) is 4 clock cycles. In case of the farthest DIMM (i.e., the DIMM 44 in FIG. 2), the value of programmable delay may be zero because T=0 for the farthest DIMM.
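The example values can be cross-checked with a short sketch that derives each delay from the hop model rather than from the bracketed expression: a DIMM at position P (1-based, as above) must make up the difference between the farthest DIMM's round trip, T·(N−1), and its own, T·(P−1), i.e., T·(N−P) cycles. For N = 3 and T = 2 this gives the 4, 2 and 0 cycles stated above. The function name `levelizing_delay` is hypothetical, and the default T = 2 is taken from the example:

```python
def levelizing_delay(position: int, num_dimms: int, t_cycles: int = 2) -> int:
    """Cycles to program into the response delay unit of the DIMM at 1-based
    `position` so that its response returns with the same latency as the
    farthest DIMM's. `t_cycles` is the per-DIMM hop delay T (one downlink
    hop plus one uplink hop)."""
    if not 1 <= position <= num_dimms:
        raise ValueError("position must be in 1..num_dimms")
    farthest_round_trip = t_cycles * (num_dimms - 1)
    own_round_trip = t_cycles * (position - 1)
    return farthest_round_trip - own_round_trip  # = t_cycles * (num_dimms - position)


print([levelizing_delay(p, 3) for p in (1, 2, 3)])  # [4, 2, 0]
```

Deriving the setting as "farthest round trip minus own round trip" makes the zero delay of the farthest DIMM fall out directly, without treating that DIMM as a special case.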
It is seen from the foregoing that the levelization discussed with reference to FIGS. 2 and 3 allows the memory controller 33 to receive a response from any memory module in the daisy-chained configuration 32 at the same time. Because the appropriate delays at each DIMM in the system 32 compensate for the time consumed in propagating command and response signals to and from the farthest DIMM in the daisy chain, the timing of a response does not depend on the physical proximity of the responding DIMM to the controller 33. That is, the controller 33 “expects” and receives the response after a fixed delay has elapsed from the transmission of the command over the downlink channel, regardless of whether the command is sent to a single DIMM or broadcast to all DIMMs in the system. For example, if a command is sent at time “t”, then in the case of the previous example, the memory controller 33 receives a response 4 clock cycles after “t”, regardless of whether the command is sent to DIMM 0 (34) or to DIMM N−1 (44).
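A short sketch can check the fixed-latency claim end to end under the same assumptions as the example (one clock cycle per hop in each direction, DRAM processing time ignored). The function `response_arrival` is hypothetical; for each DIMM it adds the downlink hops, the programmed response delay, and the uplink hops:

```python
HOP = 1  # assumption from the example: one clock cycle per hop, each direction


def response_arrival(send_cycle: int, dimm_index: int, num_dimms: int) -> int:
    """Cycle at which the controller receives the response of the DIMM at
    dimm_index (0 = closest), with its response delay unit programmed for
    levelization and DRAM processing time ignored."""
    at_dimm = send_cycle + dimm_index * HOP              # downlink hops
    programmed = 2 * HOP * (num_dimms - 1 - dimm_index)  # delay unit setting
    return at_dimm + programmed + dimm_index * HOP       # uplink hops


t = 10
arrivals = [response_arrival(t, i, 3) for i in range(3)]
print(arrivals)  # [14, 14, 14]: t + 4 for every DIMM
```

The DIMM index cancels out of the sum, which is exactly why the controller can wait a single, fixed number of cycles.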
In FIG. 3, dotted lines illustrate how a signal propagates within the DIMM 40. For example, a command signal appearing at the DL_In terminal 48B would propagate directly to the DL_Out terminal 48C to be sent to the next downstream DIMM. That command signal would also be sent to the DRAM memory bank 42 for processing (e.g., writing data to memory cells). Similarly, a response signal appearing at the UL_In terminal 50C from the adjacent downstream DIMM would be propagated directly to the UL_Out terminal 50B. The DIMM 40 may merge its own response (appropriately delayed through the delay unit 52 as discussed hereinbefore) with the signal received at the UL_In terminal 50C, so as to send its response along with the previous DIMM's response to the next DIMM in the uplink channel.
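The repeat-and-merge behavior on the uplink can be sketched abstractly. The text does not specify the packet encoding, so this sketch assumes, for illustration only, that an uplink frame is a Python dict mapping DIMM index to that DIMM's response, and that levelization has already aligned all responses to one broadcast command into the same frame; `uplink_frame` is a hypothetical name:

```python
def uplink_frame(responses_by_dimm: dict[int, str]) -> dict[int, str]:
    """Build the frame the controller receives on the uplink for one
    levelized broadcast command. Processing runs from the farthest DIMM
    toward the controller: each DIMM repeats whatever arrived on UL_In and
    merges in its own (already delayed) response before driving UL_Out."""
    frame: dict[int, str] = {}  # nothing arrives on the farthest DIMM's UL_In
    for dimm_index in sorted(responses_by_dimm, reverse=True):
        incoming = frame                                 # contents seen on UL_In
        frame = dict(incoming)                           # repeated to UL_Out...
        frame[dimm_index] = responses_by_dimm[dimm_index]  # ...plus own response
    return frame


print(uplink_frame({0: "ok", 1: "ok", 2: "ok"}))
```

In this model the controller receives one frame carrying every DIMM's response, each DIMM having added its own entry on top of what it repeated from UL_In.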
Despite streamlining or “normalizing” the delivery of responses from DIMMs to the memory controller 33, the embodiment of FIG. 3 still leaves the memory controller 33 unable to predict when a command will be executed by a specific DIMM. Especially for some DRAM operations, it may be desirable for the memory controller 33 to predict when addressee DIMMs will execute commands, so that the controller 33 can control the memory system's power consumption (or power profile) more easily and/or with greater certainty. For example, some DRAM operations, such as a “Refresh” command, may consume a lot of power. In the embodiment of FIG. 3, the memory controller 33 may spread the DIMM-specific refresh commands out over time to avoid drawing too much system power, i.e., to avoid sudden surges in power consumption when two or more DIMMs execute their refresh commands simultaneously. Thus, in the case of only three DIMMs (e.g., DIMMs 34, 40, 44), the memory controller 33 may send a refresh command to the farthest DIMM 44 on the first clock cycle, a second refresh command to the middle DIMM 40 on the second clock cycle, and a third refresh command to the closest DIMM 34 on the third clock cycle. However, despite such spreading out of refresh commands, the hop delays may cause two or more DIMMs (e.g., DIMMs 40 and 44) to end up executing the refresh command at the same time, which may not be preferable. Even if such simultaneous processing of the refresh command is tolerated, it may still be desirable for the memory controller to “know” when the commands will be processed by recipient DIMMs.
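The refresh example can be checked with a few lines, assuming (as in the FIG. 3 example) one clock cycle per downlink hop and, as a further simplification, that a DIMM begins executing a command on the cycle it receives it. Indices 0, 1 and 2 stand for DIMMs 34, 40 and 44; `execution_cycle` is a hypothetical helper:

```python
HOP = 1  # assumption: one clock cycle per downlink hop, as in the FIG. 3 example


def execution_cycle(send_cycle: int, dimm_index: int) -> int:
    """Cycle at which DIMM `dimm_index` (0 = closest) receives, and is assumed
    to start executing, a command sent at `send_cycle`; processing time ignored."""
    return send_cycle + dimm_index * HOP


# Staggered refresh schedule from the text: farthest DIMM first.
schedule = {2: 1, 1: 2, 0: 3}  # DIMM index -> controller send cycle
executions = {dimm: execution_cycle(sent, dimm) for dimm, sent in schedule.items()}
print(executions)  # {2: 3, 1: 3, 0: 3}
```

In this idealized model every refresh lands on cycle 3, so the staggered schedule does not separate the refreshes at all; the collision between DIMMs 40 and 44 noted above is one instance of it.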
Therefore, it is desirable to devise a system wherein, in addition to predicting when a response from a DIMM will be received, the memory controller can effectively predict when a command it sends will be executed by the addressee DIMM. With such an ability to predict command execution timing, the memory controller can efficiently control the power profile of all the DRAM devices (or memory modules) on a daisy-chained memory channel.