Most electronic memory systems permit two or more memory modules to be connected to each memory port of a memory controller. This feature allows a memory system manufacturer to connect one memory module to each memory controller port, while still allowing a memory system owner to later upgrade the memory system by adding at least one additional memory module to each memory controller port.
However, in memory systems having a high rate of data signaling or other restrictive signaling requirements, only a single memory module is permitted to be connected to each memory controller port. This is sometimes called a point-to-point connection topology, or a port-per-module memory system. When a memory system is constrained in this fashion, and when it is still necessary to allow the memory system to be upgraded at least once after its initial manufacture, then problems can arise when the memory capacities of the initial memory module and the additional memory module(s) do not match.
To illustrate these problems, it is useful to first describe memory systems having point-to-point memory module connections wherein only a single memory module is employed or multiple memory modules of matching memory capacity are employed. For example, FIG. 1 illustrates two such memory systems having point-to-point memory module connections. More particularly, FIG. 1A shows a memory system 10 with one memory module 12 connected to a first port 14 of a memory controller 16. FIG. 1B shows a memory system 20 with the first memory module 12 connected to the first port 14 of the memory controller 16, and a second memory module 28 connected to a second port 30 of the memory controller 16.
The memory modules 12 and 28 in FIGS. 1A and 1B are divided into ranks (rows) of memory components (MEM) 32. The number of ranks is denoted NR, and may vary from module to module. Note that in some memory systems the point-to-point connection constraint may extend to the memory component as well as the memory module. In such a case, the number of ranks NR is limited to one.
The memory modules 12 and 28 in FIGS. 1A and 1B are also divided into slices (columns) of memory components (MEM) 32. The number of slices is denoted NS, and may also vary from module to module. However, the number of slices NS times the number of data-type signals per slice Ndq is a constant (NDQ=NS*Ndq), determined by the number of data-type signals at a memory controller port NDQ.
The notion of “slice” is used to distinguish address-type signals “A” from data-type signals “QD”. The data-type signals (QD) from a slice of a memory controller port are only connected to a corresponding slice of each rank of memory components in a memory module. The address-type signals (A) are connected to all slices of each rank of memory components in a memory module. The address-type signals (A) can usually fan-out to more memory components than can data-type signals for several reasons including: [1] the signaling rate of address-type signals (A) is typically lower than data-type signals, and [2] address-type signals (A) are typically unidirectional (flowing from memory controller to memory components) and data-type signals (QD) are typically bi-directional (flowing in one direction at one time and flowing in the opposite direction at another time).
In addition to memory components (MEM) 32, each memory module 12 and 28 also contains some form of termination structure (T) 22 at the end of each signal wire. This is typically some sort of resistor component, and is typically required due to high signaling rates in a memory system.
Other connection topologies within a memory module are also possible, and will be described in detail below. The topologies shown in FIG. 1 are representative of these other connection topologies, and are used as an example to illustrate the problem arising from the need to upgrade memory systems with point-to-point connections between memory controller and memory module(s).
FIG. 2 shows the internal detail of the memory component (MEM) 32 that is used in the memory modules of FIG. 1. The address-type signals (A) typically comprise row signals (ARCLK/ARSTROBE, AREN, OPR, ABR, and AR) and column signals (ACCLK/ACSTROBE, ACEN, OPC, ABC, and AC). The data-type signals (QD) typically comprise read signals (QEN, QCLK, QSTROBE, and Q) and write signals (DEN, DCLK, DSTROBE, D and DM). Both the address-type signals (A) and the data-type signals (QD) are used to control access to 2^Nb banks of memory core 34.
ARCLK/ARSTROBE is a timing signal which is used to indicate when other row signals carry valid information. Such a timing signal is usually called a “clock” or “strobe” signal. AREN is a control signal which is optionally present. It is an “enable” signal that can indicate when the valid information carried by other row signals is to be used or ignored by the memory component (MEM) 32. OPR is a set of signals (a set of Nopr wires) that is used to indicate what type of row operation is to take place. ABR is a set of signals (a set of Nb wires) that is used to indicate the bank address for a row operation. AR is a set of signals (a set of Nr wires) that is used to indicate the row address for a row operation. Three row decode blocks 36 are provided which include storage elements (registers and/or latches) and logic that are needed to provide row control signals to the memory core 34 at the appropriate time.
ACCLK/ACSTROBE is a timing signal which is used to indicate when other column signals carry valid information. Such a timing signal is usually called a “clock” or “strobe” signal. ACEN is a control signal which is optionally present. It is an “enable” signal that can indicate when the valid information carried by other column signals is to be used or ignored by the memory component. OPC is a set of signals (a set of Nopc wires) that is used to indicate what type of column operation is to take place. ABC is a set of signals (a set of Nb wires) that is used to indicate the bank address for a column operation. AC is a set of signals (a set of Nc wires) that is used to indicate the column address for a column operation. Three column decode blocks 38 are provided which include storage elements (registers and/or latches) and logic that are needed to provide column control signals to the memory core 34 at the appropriate time.
Note that in some memory components, some of the above sets of signals could share the same wires. However, these signals are shown in FIG. 2 in unshared form for purposes of descriptive clarity.
There are two principal types of row operation: activate and precharge. When an activate operation is indicated, one of the 2^Nr rows of the 2^Nb banks of the memory core 34 is selected by row drivers 40 of the memory core 34. (2^Nc*M*Ndq) bits of the selected row are then sensed and latched by column sense amplifiers 42 of the memory core 34. When a precharge operation is indicated, the column sense amplifiers 42, row drivers 40, and other circuitry of the memory core 34 are returned to a precharged state to await the next activate operation.
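As a rough numerical illustration of the activate operation, the following Python sketch computes the counts implied by the address widths Nb, Nr, and Nc and by the parameters M and Ndq. The function name and the parameter values are hypothetical and are chosen only for illustration; they do not describe any particular memory component.

```python
def activate_geometry(nb: int, nr: int, nc: int, m: int, ndq: int) -> dict:
    """Return the counts implied by the address widths (illustrative only)."""
    return {
        "banks": 2 ** nb,                 # 2^Nb banks in the memory core
        "rows_per_bank": 2 ** nr,         # 2^Nr rows selectable by AR
        "columns_per_row": 2 ** nc,       # 2^Nc columns selectable by AC
        "bits_per_column": m * ndq,       # M sets of Ndq bits
        # Bits sensed and latched by the column sense amplifiers per activate.
        "bits_sensed_per_activate": (2 ** nc) * m * ndq,
    }


# Hypothetical example values, not taken from the description above.
geometry = activate_geometry(nb=2, nr=12, nc=6, m=8, ndq=16)
```

With these assumed values, a single activate operation latches 64 columns of 128 bits each.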
There are two principal types of column operation: read and write. When a read operation is indicated, one of the 2^Nc columns of the 2^Nb banks of the memory core 34 is selected, and (M*Ndq) bits of the selected column are transferred to a multiplexer 44. This data is grouped into “M” sets of “Ndq” bits. The multiplexer 44, which performs a parallel-to-serial conversion on the data, transfers “Ndq” bits at a time (repeated “M” separate times) to “Ndq” data output pins (Q). QCLK and QSTROBE are timing signals which are asserted and generated, respectively, to indicate when the data output pins (Q) carry valid information. Note that QCLK is typically supplied by an external source, but could be generated inside the memory component (MEM) 32. It could also be synthesized internally from one of the other timing signals. QSTROBE is typically generated inside the memory component (MEM) 32 in response to QCLK. A memory component might have both QCLK and QSTROBE present, or it might have only one of the signals present, or it might have neither present. In the last case, a timing signal for output data is typically synthesized from other timing signals present in the memory component (MEM) 32.
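The parallel-to-serial conversion performed on read data, and the inverse serial-to-parallel conversion performed on write data, can be sketched in Python as follows. This is a minimal model under stated assumptions: bits are represented as a flat list of integers, and the first set of Ndq bits is assumed to be transferred in the first time interval. The function names are hypothetical.

```python
from typing import List


def parallel_to_serial(column_bits: List[int], m: int, ndq: int) -> List[List[int]]:
    """Split M*Ndq column bits into M successive transfers of Ndq bits each."""
    if len(column_bits) != m * ndq:
        raise ValueError("expected exactly M*Ndq bits")
    # Assumption: set 0 is driven onto the Ndq pins first, then set 1, and so on.
    return [column_bits[i * ndq:(i + 1) * ndq] for i in range(m)]


def serial_to_parallel(transfers: List[List[int]]) -> List[int]:
    """Reassemble M successive Ndq-bit transfers into one M*Ndq-bit group."""
    return [bit for transfer in transfers for bit in transfer]


# Illustrative values: M = 2 transfers on Ndq = 4 pins.
column = [1, 0, 0, 1, 1, 1, 0, 0]
transfers = parallel_to_serial(column, m=2, ndq=4)
restored = serial_to_parallel(transfers)
```

Here `transfers` holds the two 4-bit groups placed on the pins in successive time intervals, and `restored` equals the original column data.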
QEN is a control signal which is optionally present. It is an “enable” signal that can indicate whether the memory component (MEM) 32 is to drive output data onto the data output pins (Q).
When a write operation is indicated, one of the 2^Nc columns of the 2^Nb banks of the memory core 34 is selected, and (M*Ndq) bits are received at a first demultiplexer 46. This data is grouped into “M” sets of “Ndq” bits. The first demultiplexer 46, which performs a serial-to-parallel conversion on the data, receives “Ndq” bits at a time (repeated “M” separate times) from “Ndq” data input pins (D). DCLK and DSTROBE are timing signals which are asserted to indicate when the data input pins (D) carry valid information. Note that these timing signals are typically supplied from an external source. They could also be synthesized internally from one of the other timing signals. A memory component might have both DCLK and DSTROBE present, or it might have only one of the signals present, or it might have neither present. In the last case, a timing signal for input data is typically synthesized from other timing signals present in the memory component (MEM) 32.
DEN is a control signal which is optionally present. It is an “enable” signal that can indicate whether the memory component (MEM) 32 is to receive input data from the data input pins (D).
The “DM” pins carry “Ndm” signals which supply mask information for the write operation. These signals are treated like the write data signals from a timing perspective, passing through a second demultiplexer 48 and undergoing a serial-to-parallel conversion. The (M*Ndm) mask signals are passed to the memory core 34 along with the (M*Ndq) data signals and control which of the data bits are written to the selected (M*Ndq) storage cells of the column sense amplifier and eventually to the corresponding storage cells of the selected row of the selected bank.
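The effect of the mask signals on a write operation can be illustrated with the following Python sketch. The description above does not state how mask signals map onto data bits or which polarity suppresses a write, so both are assumptions here: each mask signal is assumed to cover Ndq/Ndm adjacent data bits of the same time interval, and a mask value of 1 is assumed to suppress the write. The function name is hypothetical.

```python
from typing import List


def masked_write(stored: List[int], data: List[List[int]],
                 mask: List[List[int]], ndq: int, ndm: int) -> List[int]:
    """Apply M transfers of Ndq data bits, gated by M transfers of Ndm mask bits."""
    bits_per_mask = ndq // ndm  # assumption: each mask bit covers an equal group
    result = list(stored)
    for interval, (data_bits, mask_bits) in enumerate(zip(data, mask)):
        for bit in range(ndq):
            # Assumption: mask value 1 suppresses the write, 0 allows it.
            if mask_bits[bit // bits_per_mask] == 0:
                result[interval * ndq + bit] = data_bits[bit]
    return result


# Illustrative values: M = 2, Ndq = 4, Ndm = 2 (one mask bit per 2 data bits).
old_cells = [0] * 8
new_cells = masked_write(
    old_cells,
    data=[[1, 1, 1, 1], [1, 1, 1, 1]],
    mask=[[0, 1], [1, 0]],
    ndq=4,
    ndm=2,
)
```

In this example only the first half of the first transfer and the second half of the second transfer reach the storage cells.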
Note that the signals carried on input (D) and output (Q) pins are usually carried on the same wires (i.e., the QD data lines shown). However, they are shown separately in FIG. 2 for purposes of descriptive clarity. Also note that some of the other timing and control signals could also share the same wires. Again, however, these signals are shown in FIG. 2 in unshared form for purposes of descriptive clarity. In any event, as previously indicated, all the signals associated with the input (D) and output (Q) pins share the same topology (i.e., connecting from a slice of a memory controller port to corresponding slices of ranks of memory components in a memory module).
With the basic point-to-point connection topology memory systems of FIG. 1 now having been fully described, it is now appropriate to describe the problems which can arise when port-per-module memory systems having point-to-point memory module connections employ memory modules of differing memory capacity. To describe these problems, it is useful to describe several alternative port-per-module memory systems having point-to-point memory module connections wherein multiple memory modules of differing memory capacity are employed.
FIG. 3 illustrates the simplest alternative port-per-module memory system called an exclusive port-per-module memory system. In this alternative, a memory request is directed to either of two memory controller ports. There is no attempt to operate the two memory controller ports simultaneously. This alternative has the advantage that the performance of the memory system does not depend upon the relative sizes and presence of the memory modules. The disadvantage of this alternative is that the memory system has underutilized resources (the memory controller ports and memory modules) relative to a memory system which is able to operate memory modules simultaneously.
In FIG. 3, there are five cases shown: a first memory module 62 of capacity “1x” only in FIG. 3A; the first memory module 62 and a second memory module 64 with memory capacities of “1x”/“1x”, respectively, in FIG. 3B; the first memory module 62 and a second memory module 66 with memory capacities of “1x”/“2x”, respectively, in FIG. 3C; the first memory module 62 and a second memory module 68 with memory capacities of “1x”/“4x”, respectively, in FIG. 3D; and the first memory module 62 and a second memory module 70 with memory capacities of “1x”/“8x”, respectively, in FIG. 3E. In each case, a memory controller 50 comprises a read multiplexer 52 that selects read data from one of two memory controller ports 58 and 60, and two drivers 54 and 56 that transmit write data to one of two memory controller ports 58 and 60. Also, in each case, the unified memory space presented by the memory controller 50 to the rest of the system consists of the larger memory space in the lower addresses, and the smaller memory space in the upper addresses. In the case of FIG. 3E with the “1x”/“8x” memory modules, the “8x” memory module 70 occupies the low 2^(NA+3) words. The “1x” memory module 62 occupies the high 2^NA words.
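The unified address map of the exclusive port-per-module system can be sketched in Python as follows, reading the module sizes in the FIG. 3E case as 2^(NA+3) words for the “8x” module and 2^NA words for the “1x” module. The function name, the module labels, and the example value of NA are hypothetical; the sketch only models which module a word address falls in, not the controller hardware.

```python
from typing import Tuple


def decode_exclusive(address: int, na: int, log2_ratio: int) -> Tuple[str, int]:
    """Map a unified word address to (module, word address within that module)."""
    large_words = 2 ** (na + log2_ratio)  # larger module, lower addresses
    small_words = 2 ** na                 # smaller module, upper addresses
    if not 0 <= address < large_words + small_words:
        raise ValueError("address outside the unified memory space")
    if address < large_words:
        return ("large", address)
    return ("small", address - large_words)


# Illustrative "1x"/"8x" case with an assumed NA = 4 (16-word "1x" module).
last_large = decode_exclusive(127, na=4, log2_ratio=3)
first_small = decode_exclusive(128, na=4, log2_ratio=3)
last_small = decode_exclusive(143, na=4, log2_ratio=3)
```

Because only one port is selected per request, the decode result also identifies which of the two memory controller ports is active for that request.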
Each addressable word is ND bits in size, where ND=M*NDQ, and NDQ=NS*Ndq. NDQ is the number of QD signal wires per rank, and ND is the number of bits transferred serially in “M” successive time intervals on the NDQ wires. NS is the number of slices (memory components) per rank, and Ndq is the number of QD signal wires per slice (memory component).
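The relationships ND=M*NDQ and NDQ=NS*Ndq can be checked with a short Python sketch. The function names and the numeric values are hypothetical and serve only as a worked example of the arithmetic.

```python
def slices_per_rank(ndq_port: int, ndq_slice: int) -> int:
    """NS = NDQ / Ndq: slices needed to fill all QD wires of one port."""
    if ndq_port % ndq_slice != 0:
        raise ValueError("NDQ must be a whole multiple of Ndq")
    return ndq_port // ndq_slice


def word_size_bits(ns: int, ndq_slice: int, m: int) -> int:
    """ND = M * NDQ, where NDQ = NS * Ndq."""
    ndq_port = ns * ndq_slice
    return m * ndq_port


# Illustrative values: a 64-wire port built from 16-wire components, M = 8.
ns = slices_per_rank(ndq_port=64, ndq_slice=16)
nd = word_size_bits(ns=ns, ndq_slice=16, m=8)
```

With these assumed values, four slices make up each rank and each addressable word is 512 bits.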
FIG. 4 illustrates a second alternative port-per-module memory system called an independent port-per-module memory system. In this system, there are two sets of address, read data, and write data signals between a memory controller 74 and the rest of the system (not shown). These two sets of signals are appended with a “u” or “v” to distinguish them. They are connected to memory request sources in the system (e.g., central processing unit, graphics unit, I/O unit, etc), and they permit two simultaneous memory requests to be performed.
In FIG. 4, there are five cases shown: a first memory module 86 of capacity “1x” only in FIG. 4A; the first memory module 86 and a second memory module 88 with memory capacities of “1x”/“1x”, respectively, in FIG. 4B; the first memory module 86 and a second memory module 90 with memory capacities of “1x”/“2x”, respectively, in FIG. 4C; the first memory module 86 and a second memory module 92 with memory capacities of “1x”/“4x”, respectively, in FIG. 4D; and the first memory module 86 and a second memory module 94 with memory capacities of “1x”/“8x”, respectively, in FIG. 4E. In each case, the memory controller 74 comprises two address multiplexers 76u and 76v, two read data multiplexers 78u and 78v, and two write data multiplexers 80u and 80v. The address multiplexers 76u and 76v have address queues 82u and 82v, respectively, and the write data multiplexers 80u and 80v have write data queues 84u and 84v, respectively, for accumulating memory request addresses and write data, as described in detail below.
This second alternative port-per-module memory system is called an independent port-per-module memory system because two memory module spaces are accessed independently. Typically, a high order address bit of the Au and Av address buses is used to select between the first and the second memory modules. Each memory request on the “u” and “v” buses is steered to the queue for the appropriate memory module.
In the case of FIG. 4E with the “1x”/“8x” memory modules, the second memory module 94 will typically receive eight times as many memory requests (per unit of time) as the first memory module 86 if the requests are evenly distributed across the memory spaces. This is the reason for the queues, since they permit memory requests to the more-dense memory module to be accumulated until each less frequent memory request for the less-dense memory module is received. This ensures that the memory system achieves the best possible performance level, but doesn't fix the fundamental problem of an uneven request rate to the two mismatched memory modules.
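The uneven request rate can be illustrated with a small Python model that steers evenly distributed requests into per-module queues. Several simplifications are assumed: the larger module is placed in the lower addresses (as in the FIG. 3 address map), one request is issued to every word address exactly once, and each port completes one request per time slot. The function and variable names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable


def steer_requests(addresses: Iterable[int], na: int,
                   log2_ratio: int) -> Dict[str, Deque[int]]:
    """Place each request in the queue of the module that holds its address."""
    large_words = 2 ** (na + log2_ratio)
    queues: Dict[str, Deque[int]] = {"large": deque(), "small": deque()}
    for address in addresses:
        if address < large_words:
            queues["large"].append(address)
        else:
            queues["small"].append(address - large_words)
    return queues


# Illustrative "1x"/"8x" case with an assumed NA = 4.
na = 4
queues = steer_requests(range(2 ** (na + 3) + 2 ** na), na=na, log2_ratio=3)
large_count = len(queues["large"])
small_count = len(queues["small"])

# Assumption: each port serves one request per slot, so the busier port
# determines how many slots are needed to drain both queues.
slots_needed = max(large_count, small_count)
port_utilization = (large_count + small_count) / (2 * slots_needed)
```

Under these assumptions the larger module receives eight requests for every one received by the smaller module, and the two ports together are busy only about 56% of the time.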
Some applications may be able to guarantee that the numbers of memory requests per unit of time to each memory module are reasonably balanced. This may be possible by placing more frequently accessed code and data structures in the less-dense memory module. If this is not possible, then a system with two mismatched memory modules (e.g., 1x/8x) might have lower performance than a system with two matched modules (e.g., 1x/1x) even though there is more memory in the mismatched system. This is very undesirable, since it is expected that if the amount of memory is increased in a system, the performance will increase.
FIG. 5 illustrates a third alternative port-per-module memory system called a lockstep port-per-module memory system. As in the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 there are two sets of read data and write data signals between a memory controller 96 and the rest of the system (not shown). These two sets of signals are appended with a “u” or “v” to distinguish them. They are connected to memory request sources in the system (e.g., central processing unit, graphics unit, I/O unit, etc), and they permit two simultaneous memory requests to be performed.
However, unlike the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 there is only a single address bus “A” between the memory controller 96 and the rest of the system (not shown).
In FIG. 5, there are five cases shown: a first memory module 114 of capacity “1x” only in FIG. 5A; the first memory module 114 and a second memory module 116 with memory capacities of “1x”/“1x”, respectively, in FIG. 5B; the first memory module 114 and a second memory module 118 with memory capacities of “1x”/“2x”, respectively, in FIG. 5C; the first memory module 114 and a second memory module 120 with memory capacities of “1x”/“4x”, respectively, in FIG. 5D; and the first memory module 114 and a second memory module 122 with memory capacities of “1x”/“8x”, respectively, in FIG. 5E. In each case, the memory controller 96 comprises address decode logic 98, a read data buffer 100, a read data multiplexer 102, a read data driver 104, a write data buffer 106, a write data multiplexer 108, and two write data drivers 110 and 112.
This third alternative memory system of FIG. 5 is called a lockstep port-per-module memory system because each memory request is made to two memory modules in lockstep (i.e., simultaneously). The Ru read data and Wu write data are steered from/to the QD1 data bus of a first memory controller port 124, and the Rv read data and Wv write data are steered from/to the QD2 data bus of a second memory controller port 126. This permits memory requests to be completed at the maximum possible rate as long as there are equal amounts of memory in each memory module. However, if the memory modules are mismatched, the performance will drop. This can be best seen in the case of FIG. 5E with the “1x/8x” memory modules 114 and 122, respectively. When the memory space above the 2^NA address is accessed, memory locations will only be available in the second memory module 122. For a read operation, it will be necessary to access two memory locations sequentially in the second memory module 122 and steer them to the Ru and Rv buses. For a write operation, it will be necessary to steer the Wu and Wv buses to the second memory module 122 for two sequential accesses. As a result, the upper memory space can only be accessed at half the rate of the lower memory space. As in the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 it is possible that adding memory to the system may cause its performance to be lowered.
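The bandwidth penalty of the lockstep arrangement can be estimated with the following Python sketch. It assumes that accesses are spread evenly over the whole memory space, that each port delivers one word per time slot, and that capacities are expressed in multiples of the “1x” module. The function name is hypothetical, and the result is a rough average rather than a measured figure.

```python
def lockstep_average_rate(small_units: int, large_units: int) -> float:
    """Average words per time slot when every word is accessed once."""
    # Words present in both modules are accessed in lockstep: two per slot.
    matched_words = 2 * min(small_units, large_units)
    # Remaining words exist only in the larger module: one per slot.
    unmatched_words = abs(large_units - small_units)
    slots = matched_words / 2 + unmatched_words
    return (matched_words + unmatched_words) / slots


matched_rate = lockstep_average_rate(1, 1)     # "1x"/"1x" case
mismatched_rate = lockstep_average_rate(1, 8)  # "1x"/"8x" case
```

Under these assumptions the matched “1x”/“1x” system averages two words per slot, while the “1x”/“8x” system averages 1.125 words per slot even though it holds four and a half times as much memory.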
In view of the foregoing, it would be desirable to provide at least one technique for increasing bandwidth in port-per-module memory systems having mismatched memory modules which overcomes the above-described inadequacies and shortcomings in an efficient and cost-effective manner.