The present invention relates generally to memory systems and, more particularly, to techniques for increasing bandwidth in port-per-module memory systems having mismatched memory modules.
Most electronic memory systems permit two or more memory modules to be connected to each memory port of a memory controller. This feature allows a memory system manufacturer to connect one memory module to each memory controller port, while still allowing a memory system owner to later upgrade the memory system by adding at least one additional memory module to each memory controller port.
However, in memory systems having a high rate of data signaling or other restrictive signaling requirements, only a single memory module is permitted to be connected to each memory controller port. This is sometimes called a point-to-point connection topology, or a port-per-module memory system. When a memory system is constrained in this fashion, and when it is still necessary to allow the memory system to be upgraded at least once after its initial manufacture, then problems can arise when the memory capacities of the initial memory module and the additional memory module(s) do not match.
To illustrate these problems, it is useful to first describe memory systems having point-to-point memory module connections wherein only a single memory module is employed or multiple memory modules of matching memory capacity are employed. For example, FIG. 1 illustrates two such memory systems having point-to-point memory module connections. More particularly, FIG. 1A shows a memory system 10 with one memory module 12 connected to a first port 14 of a memory controller 16. FIG. 1B shows a memory system 20 with the first memory module 12 connected to the first port 14 of the memory controller 16, and a second memory module 28 connected to a second port 30 of the memory controller 16.
The memory modules 12 and 28 in FIGS. 1A and 1B are divided into ranks (rows) of memory components (MEM) 32. The number of ranks is denoted NR, and may vary from module to module. Note that in some memory systems the point-to-point connection constraint may extend to the memory component as well as the memory module. In such a case, the number of ranks NR is limited to one.
The memory modules 12 and 28 in FIGS. 1A and 1B are also divided into slices (columns) of memory components (MEM) 32. The number of slices is denoted Ns, and may also vary from module to module. However, the number of slices Ns times the number of data-type signals per slice Ndq is a constant (NDQ=Ns*Ndq), determined by the number of data-type signals at a memory controller port NDQ.
The notion of a "slice" is used to distinguish address-type signals "A" from data-type signals "QD". The data-type signals (QD) from a slice of a memory controller port are only connected to a corresponding slice of each rank of memory components in a memory module. The address-type signals (A) are connected to all slices of each rank of memory components in a memory module. The address-type signals (A) can usually fan out to more memory components than can data-type signals for several reasons, including: [1] the signaling rate of address-type signals (A) is typically lower than that of data-type signals, and [2] address-type signals (A) are typically unidirectional (flowing from memory controller to memory components), whereas data-type signals (QD) are typically bi-directional (flowing in one direction at one time and in the opposite direction at another time).
In addition to memory components (MEM) 32, each memory module 12 and 28 also contains some form of termination structure (T) 22 at the end of each signal wire. This is typically some sort of resistor component, and is typically required due to high signaling rates in a memory system.
Other connection topologies within a memory module are also possible, and will be described in detail below. The topologies shown in FIG. 1 are representative of these other connection topologies, and are used as an example to illustrate the problem arising from the need to upgrade memory systems with point-to-point connections between memory controller and memory module(s).
FIG. 2 shows the internal detail of the memory component (MEM) 32 that is used in the memory modules of FIG. 1. The address-type signals (A) typically comprise row signals (ARCLK/ARSTROBE, AREN, OPR, ABR, and AR) and column signals (ACCLK/ACSTROBE, ACEN, OPc, ABC, and AC). The data-type signals (QD) typically comprise read signals (QEN, QCLK, QSTROBE, and Q) and write signals (DEN, DCLK, DSTROBE, D, and DM). Both the address-type signals (A) and the data-type signals (QD) are used to control access to 2^Nb banks of memory core 34.
ARCLK/ARSTROBE is a timing signal which is used to indicate when other row signals carry valid information. Such a timing signal is usually called a "clock" or "strobe" signal. AREN is a control signal which is optionally present. It is an "enable" signal that can indicate when the valid information carried by other row signals is to be used or ignored by the memory component (MEM) 32. OPR is a set of signals (a set of Nopr wires) that is used to indicate what type of row operation is to take place. ABR is a set of signals (a set of Nb wires) that is used to indicate the bank address for a row operation. AR is a set of signals (a set of Nr wires) that is used to indicate the row address for a row operation. Three row decode blocks 36 are provided which include storage elements (registers and/or latches) and logic that are needed to provide row control signals to the memory core 34 at the appropriate time.
ACCLK/ACSTROBE is a timing signal which is used to indicate when other column signals carry valid information. Such a timing signal is usually called a "clock" or "strobe" signal. ACEN is a control signal which is optionally present. It is an "enable" signal that can indicate when the valid information carried by other column signals is to be used or ignored by the memory component (MEM) 32. OPc is a set of signals (a set of Nopc wires) that is used to indicate what type of column operation is to take place. ABC is a set of signals (a set of Nb wires) that is used to indicate the bank address for a column operation. AC is a set of signals (a set of Nc wires) that is used to indicate the column address for a column operation. Three column decode blocks 38 are provided which include storage elements (registers and/or latches) and logic that are needed to provide column control signals to the memory core 34 at the appropriate time.
Note that in some memory components, some of the above sets of signals could share the same wires. However, these signals are shown in FIG. 2 in unshared form for purposes of descriptive clarity.
There are two principal types of row operation: activate and precharge. When an activate operation is indicated, one of the 2^Nr rows of one of the 2^Nb banks of the memory core 34 is selected by row drivers 40 of the memory core 34. The (2^Nc*M*Ndq) bits of the selected row are then sensed and latched by column sense amplifiers 42 of the memory core 34. When a precharge operation is indicated, the column sense amplifiers 42, row drivers 40, and other circuitry of the memory core 34 are returned to a precharged state to await the next activate operation.
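By way of illustration only, the sizes implied by the address widths above can be worked out with a short Python sketch. The function name `core_geometry` and the example widths are hypothetical and are not taken from FIG. 2. The total capacity figure is derived here on the assumption that each of the 2^Nb banks holds 2^Nr rows of 2^Nc columns.

```python
def core_geometry(nb: int, nr: int, nc: int, m: int, ndq: int) -> dict:
    """Derive memory core sizes from address widths (illustrative only)."""
    banks = 1 << nb                 # 2^Nb banks
    rows_per_bank = 1 << nr         # 2^Nr rows in each bank
    columns_per_row = 1 << nc       # 2^Nc columns in each row
    column_bits = m * ndq           # bits moved by one read or write
    row_bits = columns_per_row * column_bits   # bits sensed by one activate
    capacity_bits = banks * rows_per_bank * row_bits  # assumed uniform banks
    return {
        "banks": banks,
        "rows_per_bank": rows_per_bank,
        "columns_per_row": columns_per_row,
        "column_bits": column_bits,
        "row_bits": row_bits,
        "capacity_bits": capacity_bits,
    }


# Hypothetical widths: Nb=2, Nr=10, Nc=6, M=8, Ndq=8.
geometry = core_geometry(nb=2, nr=10, nc=6, m=8, ndq=8)
print(geometry["row_bits"])       # bits latched per activate operation
print(geometry["capacity_bits"])  # total bits in the component
```

With these assumed widths, one activate operation latches 4096 bits into the column sense amplifiers, and a single column operation then moves 64 of them.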
There are two principal types of column operation: read and write. When a read operation is indicated, one of the 2^Nc columns of one of the 2^Nb banks of the memory core 34 is selected, and (M*Ndq) bits of the selected column are transferred to a multiplexer 44. This data is grouped into "M" sets of "Ndq" bits. The multiplexer 44, which performs a parallel-to-serial conversion on the data, transfers "Ndq" bits at a time (repeated "M" separate times) to "Ndq" data output pins (Q). QCLK and QSTROBE are timing signals which are asserted and generated, respectively, to indicate when the data output pins (Q) carry valid information. Note that QCLK is typically supplied by an external source, but could be generated inside the memory component (MEM) 32. It could also be synthesized internally from one of the other timing signals. QSTROBE is typically generated inside the memory component (MEM) 32 in response to QCLK. A memory component might have both QCLK and QSTROBE present, or it might have only one of the signals present, or it might have neither present. In the last case, a timing signal for output data is typically synthesized from other timing signals present in the memory component (MEM) 32.
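The parallel-to-serial conversion described above, and its inverse used on the write path, can be sketched in Python as follows. This is a minimal model only: the function names are hypothetical, bits are represented as list elements, and the order in which the "M" sets are sent (lowest-indexed bits first) is an assumption, since the text does not specify it.

```python
def serialize(column_bits: list, m: int, ndq: int) -> list:
    """Split M*Ndq column bits into M transfers of Ndq bits each."""
    if len(column_bits) != m * ndq:
        raise ValueError("expected exactly M*Ndq bits")
    # Assumed ordering: transfer i carries bits [i*Ndq, (i+1)*Ndq).
    return [column_bits[i * ndq:(i + 1) * ndq] for i in range(m)]


def deserialize(transfers: list) -> list:
    """Reassemble M transfers of Ndq bits into one M*Ndq-bit column."""
    return [bit for transfer in transfers for bit in transfer]


# Hypothetical sizes: M=4 transfers on Ndq=2 pins.
column = [1, 0, 1, 1, 0, 0, 1, 0]
transfers = serialize(column, m=4, ndq=2)
print(transfers)               # four groups of two bits
print(deserialize(transfers))  # the original eight bits
```

The round trip shows that a narrow pin interface of Ndq wires can carry a wider internal column access, at the cost of M successive time intervals.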
QEN is a control signal which is optionally present. It is an "enable" signal that can indicate whether the memory component (MEM) 32 is to drive output data onto the data output pins (Q).
When a write operation is indicated, one of the 2^Nc columns of one of the 2^Nb banks of the memory core 34 is selected, and (M*Ndq) bits are received at a first demultiplexer 46. This data is grouped into "M" sets of "Ndq" bits. The first demultiplexer 46, which performs a serial-to-parallel conversion on the data, receives "Ndq" bits at a time (repeated "M" separate times) from "Ndq" data input pins (D). DCLK and DSTROBE are timing signals which are asserted to indicate when the data input pins (D) carry valid information. Note that these timing signals are typically supplied from an external source. They could also be synthesized internally from one of the other timing signals. A memory component might have both DCLK and DSTROBE present, or it might have only one of the signals present, or it might have neither present. In the last case, a timing signal for input data is typically synthesized from other timing signals present in the memory component (MEM) 32.
DEN is a control signal which is optionally present. It is an "enable" signal that can indicate whether the memory component (MEM) 32 is to receive input data from the data input pins (D).
The "DM" pins carry "Ndm" signals which supply mask information for the write operation. These signals are treated like the write data signals from a timing perspective, passing through a second demultiplexer 48 and undergoing a serial-to-parallel conversion. The (M*Ndm) mask signals are passed to the memory core 34 along with the (M*Ndq) data signals and control which of the data bits are written to the selected (M*Ndq) storage cells of the column sense amplifier and eventually to the corresponding storage cells of the selected row of the selected bank.
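The effect of the mask signals on a write can be illustrated with the following Python sketch. Two details are assumptions made for the example and are not stated in the text: each mask signal is taken to cover Ndq/Ndm adjacent data bits, and a mask value of 1 is taken to suppress the write. The function name `masked_write` is hypothetical.

```python
def masked_write(stored: list, data: list, mask: list,
                 ndq: int, ndm: int) -> list:
    """Return the sense-amplifier contents after a masked write.

    stored, data: M*Ndq bits each; mask: M*Ndm bits.
    Assumption: mask bit i guards data bits [i*g, (i+1)*g), g = Ndq // Ndm,
    and a mask bit of 1 means "do not write".
    """
    if ndq % ndm != 0:
        raise ValueError("Ndq must be a multiple of Ndm in this sketch")
    group = ndq // ndm
    if len(stored) != len(data) or len(data) != len(mask) * group:
        raise ValueError("inconsistent data and mask lengths")
    result = list(stored)
    for i, mask_bit in enumerate(mask):
        if not mask_bit:  # unmasked group: new data replaces old
            lo = i * group
            result[lo:lo + group] = data[lo:lo + group]
    return result


# Hypothetical sizes: M=1, Ndq=8, Ndm=2, so each mask bit guards 4 data bits.
old = [0] * 8
new = [1] * 8
print(masked_write(old, new, mask=[0, 1], ndq=8, ndm=2))
```

In this example only the first group of four bits is overwritten. The second group keeps its previous contents because its mask bit is set.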
Note that the signals carried on input (D) and output (Q) pins are usually carried on the same wires (i.e., the QD data lines shown). However, they are shown separately in FIG. 2 for purposes of descriptive clarity. Also note that some of the other timing and control signals could also share the same wires. Again, however, these signals are shown in FIG. 2 in unshared form for purposes of descriptive clarity. In any event, as previously indicated, all the signals associated with the input (D) and output (Q) pins share the same topology (i.e., connecting from a slice of a memory controller port to corresponding slices of ranks of memory components in a memory module).
With the basic point-to-point connection topology memory systems of FIG. 1 now having been fully described, it is now appropriate to describe the problems which can arise when port-per-module memory systems having point-to-point memory module connections employ memory modules of differing memory capacity. To describe these problems, it is useful to describe several alternative port-per-module memory systems having point-to-point memory module connections wherein multiple memory modules of differing memory capacity are employed.
FIG. 3 illustrates the simplest alternative port-per-module memory system called an exclusive port-per-module memory system. In this alternative, a memory request is directed to either of two memory controller ports. There is no attempt to operate the two memory controller ports simultaneously. This alternative has the advantage that the performance of the memory system does not depend upon the relative sizes and presence of the memory modules. The disadvantage of this alternative is that the memory system has underutilized resources (the memory controller ports and memory modules) relative to a memory system which is able to operate memory modules simultaneously.
In FIG. 3, there are five cases shown: a first memory module 62 of capacity "1x" only in FIG. 3A; the first memory module 62 and a second memory module 64 with memory capacities of "1x"/"1x", respectively, in FIG. 3B; the first memory module 62 and a second memory module 66 with memory capacities of "1x"/"2x", respectively, in FIG. 3C; the first memory module 62 and a second memory module 68 with memory capacities of "1x"/"4x", respectively, in FIG. 3D; and the first memory module 62 and a second memory module 70 with memory capacities of "1x"/"8x", respectively, in FIG. 3E. In each case, a memory controller 50 comprises a read multiplexer 52 that selects read data from one of two memory controller ports 58 and 60, and two drivers 54 and 56 that transmit write data to one of two memory controller ports 58 and 60. Also, in each case, the unified memory space presented by the memory controller 50 to the rest of the system consists of the larger memory space in the lower addresses, and the smaller memory space in the upper addresses. In the case of FIG. 3E with the "1x"/"8x" memory modules, the "8x" memory module 70 occupies the low 2^(NA+3) words. The "1x" memory module 62 occupies the high 2^NA words.
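The unified address map just described, with the larger module in the lower addresses and the smaller module above it, can be sketched as a simple address decoder in Python. The function name, the returned labels, and the small value of NA used in the example are hypothetical.

```python
def decode_exclusive(addr: int, na: int, ratio_log2: int) -> tuple:
    """Map a unified word address to (module, local address).

    The larger module holds 2^(NA + ratio_log2) words at the low addresses.
    The smaller "1x" module holds the next 2^NA words.
    """
    large_words = 1 << (na + ratio_log2)
    small_words = 1 << na
    if 0 <= addr < large_words:
        return ("large", addr)
    if addr < large_words + small_words:
        return ("small", addr - large_words)
    raise ValueError("address outside the unified memory space")


# Hypothetical "1x"/"8x" case with NA=4: 128 words low, 16 words high.
print(decode_exclusive(0, na=4, ratio_log2=3))
print(decode_exclusive(127, na=4, ratio_log2=3))
print(decode_exclusive(128, na=4, ratio_log2=3))
```

Because only one port is selected per request in the exclusive system, the decoder returns a single module for every address.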
Each addressable word is ND bits in size, where ND=M*NDQ, and NDQ=Ns*Ndq. NDQ is the number of QD signal wires per rank, and ND is the number of bits transferred serially in "M" successive time intervals on the NDQ wires. Ns is the number of slices (memory components) per rank, and Ndq is the number of QD signal wires per slice (memory component).
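As a worked example of these relationships, the following Python sketch computes NDQ and ND for two hypothetical module organizations. The function name and the numeric values are illustrative assumptions.

```python
def port_geometry(ns: int, ndq: int, m: int) -> dict:
    """Compute the port width NDQ and the word size ND."""
    n_dq_port = ns * ndq   # NDQ = Ns * Ndq, QD wires per rank
    n_d = m * n_dq_port    # ND = M * NDQ, bits per addressable word
    return {"NDQ": n_dq_port, "ND": n_d}


# Two hypothetical modules that fit the same 32-wire controller port:
# four slices of 8 wires each, or two slices of 16 wires each.
print(port_geometry(ns=4, ndq=8, m=8))
print(port_geometry(ns=2, ndq=16, m=8))
```

Both organizations give NDQ=32 and ND=256, which illustrates that the number of slices may vary from module to module while the product Ns*Ndq remains fixed by the controller port.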
FIG. 4 illustrates a second alternative port-per-module memory system called an independent port-per-module memory system. In this system, there are two sets of address, read data, and write data signals between a memory controller 74 and the rest of the system (not shown). These two sets of signals are appended with a "u" or "v" to distinguish them. They are connected to memory request sources in the system (e.g., central processing unit, graphics unit, I/O unit, etc.), and they permit two simultaneous memory requests to be performed.
In FIG. 4, there are five cases shown: a first memory module 86 of capacity "1x" only in FIG. 4A; the first memory module 86 and a second memory module 88 with memory capacities of "1x"/"1x", respectively, in FIG. 4B; the first memory module 86 and a second memory module 90 with memory capacities of "1x"/"2x", respectively, in FIG. 4C; the first memory module 86 and a second memory module 92 with memory capacities of "1x"/"4x", respectively, in FIG. 4D; and the first memory module 86 and a second memory module 94 with memory capacities of "1x"/"8x", respectively, in FIG. 4E. In each case, the memory controller 74 comprises two address multiplexers 76u and 76v, two read data multiplexers 78u and 78v, and two write data multiplexers 80u and 80v. The address multiplexers 76u and 76v have address queues 82u and 82v, respectively, and the write data multiplexers 80u and 80v have write data queues 84u and 84v, respectively, for accumulating memory request addresses and write data, as described in detail below.
This second alternative port-per-module memory system is called an independent port-per-module memory system because two memory module spaces are accessed independently. Typically, a high order address bit of the Au and Av address buses is used to select between the first and the second memory modules. Each memory request on the "u" and "v" buses is steered to the queue for the appropriate memory module.
In the case of FIG. 4E with the "1x"/"8x" memory modules, the second memory module 94 will typically receive eight times as many memory requests (per unit of time) as the first memory module 86 if the requests are evenly distributed across the memory spaces. This is the reason for the queues, since they permit memory requests to the more-dense memory module to be accumulated until each less frequent memory request for the less-dense memory module is received. This ensures that the memory system achieves the best possible performance level, but does not fix the fundamental problem of an uneven request rate to the two mismatched memory modules.
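The uneven request rate can be made concrete with a small Python model. It is a sketch under stated assumptions: one request is issued to every word of the unified space (an evenly distributed load), each port completes one request per time slot, the larger module sits in the lower addresses, and steering is done by an address comparison rather than by a single address bit. The names `steer` and `slots` are hypothetical.

```python
from collections import deque


def steer(addresses, na: int, ratio_log2: int) -> dict:
    """Place each request address into the queue for its module."""
    boundary = 1 << (na + ratio_log2)  # first address of the "1x" module
    queues = {"large": deque(), "small": deque()}
    for addr in addresses:
        queues["large" if addr < boundary else "small"].append(addr)
    return queues


# Hypothetical "1x"/"8x" case with NA=4: 128 + 16 = 144 words in total.
na, ratio_log2 = 4, 3
total_words = (1 << (na + ratio_log2)) + (1 << na)
queues = steer(range(total_words), na, ratio_log2)

# With one request per port per slot, the busier port sets the finish time.
slots = max(len(queues["large"]), len(queues["small"]))
utilization = total_words / (2 * slots)

print(len(queues["large"]), len(queues["small"]))  # requests per module
print(slots, utilization)
```

Under these assumptions the larger module receives 128 of the 144 requests, so the two ports together are busy for only about 56 percent of the available slots, whereas two matched modules would keep both ports fully occupied.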
Some applications may be able to guarantee that the numbers of memory requests per unit of time to each memory module are reasonably balanced. This may be possible by placing more frequently accessed code and data structures in the less-dense memory module. If this is not possible, then a system with two mismatched memory modules (e.g., 1x/8x) might have lower performance than a system with two matched modules (e.g., 1x/1x), even though there is more memory in the mismatched system. This is very undesirable, since it is expected that if the amount of memory in a system is increased, the performance will increase.
FIG. 5 illustrates a third alternative port-per-module memory system called a lockstep port-per-module memory system. As in the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 there are two sets of read data and write data signals between a memory controller 96 and the rest of the system (not shown). These two sets of signals are appended with a "u" or "v" to distinguish them. They are connected to memory request sources in the system (e.g., central processing unit, graphics unit, I/O unit, etc.), and they permit two simultaneous memory requests to be performed.
However, unlike the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 there is only a single address bus "A" between the memory controller 96 and the rest of the system (not shown).
In FIG. 5, there are five cases shown: a first memory module 114 of capacity "1x" only in FIG. 5A; the first memory module 114 and a second memory module 116 with memory capacities of "1x"/"1x", respectively, in FIG. 5B; the first memory module 114 and a second memory module 118 with memory capacities of "1x"/"2x", respectively, in FIG. 5C; the first memory module 114 and a second memory module 120 with memory capacities of "1x"/"4x", respectively, in FIG. 5D; and the first memory module 114 and a second memory module 122 with memory capacities of "1x"/"8x", respectively, in FIG. 5E. In each case, the memory controller 96 comprises address decode logic 98, a read data buffer 100, a read data multiplexer 102, a read data driver 104, a write data buffer 106, a write data multiplexer 108, and two write data drivers 110 and 112.
This third alternative memory system of FIG. 5 is called a lockstep port-per-module memory system because each memory request is made to two memory modules in lockstep (i.e., simultaneously). The Ru read data and Wu write data are steered from/to the QD1 data bus of a first memory controller port 124, and the Rv read data and Wv write data are steered from/to the QD2 data bus of a second memory controller port 126. This permits memory requests to be completed at the maximum possible rate as long as there are equal amounts of memory in each memory module. However, if the memory modules are mismatched, the performance will drop. This can be best seen in the case of FIG. 5E with the "1x"/"8x" memory modules 114 and 122, respectively. When the memory space above the 2^NA address is accessed, memory locations will only be available in the second memory module 122. For a read operation, it will be necessary to access two memory locations sequentially in the second memory module 122 and steer them to the Ru and Rv buses. For a write operation, it will be necessary to steer the Wu and Wv buses to the second memory module 122 for two sequential accesses. As a result, the upper memory space can only be accessed at half the rate of the lower memory space. As in the second alternative memory system of FIG. 4, in the third alternative memory system of FIG. 5 it is possible that adding memory to the system may cause its performance to be lowered.
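The halved access rate in the upper memory space can be quantified with the following Python sketch. It assumes that each module supplies one word per time slot, that every lockstep request moves two words (one for the "u" bus and one for the "v" bus), and that requests sweep the whole address space evenly. The function names are hypothetical.

```python
def lockstep_slots(request_addr: int, na: int) -> int:
    """Time slots needed by one lockstep request.

    Below 2^NA both modules respond in parallel (1 slot). At or above
    2^NA only the larger module has data, so two sequential accesses
    are needed (2 slots).
    """
    return 1 if request_addr < (1 << na) else 2


def average_slots(na: int, ratio: int) -> float:
    """Average slots per request over an even sweep of a 1x/ratio-x system."""
    small_words = 1 << na
    large_words = ratio * small_words
    paired_requests = small_words                      # served by both modules
    upper_requests = (large_words - small_words) // 2  # two words each
    total_slots = paired_requests + 2 * upper_requests
    return total_slots / (paired_requests + upper_requests)


print(average_slots(na=4, ratio=1))  # matched "1x"/"1x" modules
print(average_slots(na=4, ratio=8))  # mismatched "1x"/"8x" modules
```

Under these assumptions a matched pair averages one slot per request, while the "1x"/"8x" pair averages roughly 1.8 slots per request, which is consistent with the observation that adding memory can lower performance.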
In view of the foregoing, it would be desirable to provide at least one technique for increasing bandwidth in port-per-module memory systems having mismatched memory modules which overcomes the above-described inadequacies and shortcomings in an efficient and cost effective manner.
According to the present invention, techniques for increasing bandwidth in port-per-module memory systems having mismatched memory modules are provided. In one exemplary embodiment, the techniques are realized through a memory component having a memory core for storing data therein. The memory component comprises a first set of interface connections for providing access to the memory core, and a second set of interface connections for providing access to the memory core. The memory component also comprises memory access circuitry for selecting between a first mode wherein a first portion of the memory core is accessible through the first set of interface connections and a second portion of the memory core is accessible through the second set of interface connections, and a second mode wherein both the first portion and the second portion of the memory core are accessible through the first set of interface connections.
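The two modes described above can be pictured with a small behavioral model in Python. This is only a sketch for orientation and does not represent the circuitry of any embodiment: the class name, the use of two equal-sized portions, and the rule that the upper half of the address range selects the second portion in the second mode are all assumptions introduced for the example.

```python
class TwoModeComponent:
    """Behavioral model of a memory core reachable through two interfaces."""

    def __init__(self, words_per_portion: int):
        self.n = words_per_portion
        # Two non-overlapping portions of the memory core.
        self.portions = [[0] * self.n, [0] * self.n]
        self.mode = 1

    def set_mode(self, mode: int) -> None:
        if mode not in (1, 2):
            raise ValueError("mode must be 1 or 2")
        self.mode = mode

    def _locate(self, interface: int, addr: int) -> tuple:
        if self.mode == 1:
            # First mode: interface k reaches only portion k.
            if interface not in (1, 2) or not 0 <= addr < self.n:
                raise ValueError("invalid access in first mode")
            return interface - 1, addr
        # Second mode: the first interface reaches both portions.
        if interface != 1 or not 0 <= addr < 2 * self.n:
            raise ValueError("invalid access in second mode")
        return addr // self.n, addr % self.n  # assumed address split

    def write(self, interface: int, addr: int, value: int) -> None:
        portion, offset = self._locate(interface, addr)
        self.portions[portion][offset] = value

    def read(self, interface: int, addr: int) -> int:
        portion, offset = self._locate(interface, addr)
        return self.portions[portion][offset]


component = TwoModeComponent(words_per_portion=4)
component.write(1, 0, 11)    # first mode: first interface, first portion
component.write(2, 0, 22)    # first mode: second interface, second portion
component.set_mode(2)
print(component.read(1, 0))  # first portion through the first interface
print(component.read(1, 4))  # second portion through the first interface
```

In the model, data written through the second interface in the first mode remains reachable through the first interface once the second mode is selected.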
In accordance with other aspects of this exemplary embodiment of the present invention, the first portion and the second portion of the memory core are subsets of the entire memory core. Also, the first portion and the second portion of the memory core do not overlap.
In accordance with further aspects of this exemplary embodiment of the present invention, the first and second sets of interface connections provide access to the memory core so as to read data from the memory core and write data to the memory core. Also, the memory access circuitry includes multiplexers for steering data between the first and second sets of interface connections and the memory core. Further, the memory access circuitry also includes decode logic for controlling the multiplexers.
In accordance with still further aspects of this exemplary embodiment of the present invention, the memory access circuitry further includes bypass circuitry for transferring information from the first set of interface connections to the second set of interface connections, and vice versa.
In another exemplary embodiment, the techniques are realized through a method for accessing a memory core of a memory component, wherein the memory component has first and second sets of interface connections for providing access to the memory core. The method comprises receiving at least one memory access control signal, and then decoding the at least one memory access control signal so as to select between a first mode wherein a first portion of the memory core is accessible through the first set of interface connections and a second portion of the memory core is accessible through the second set of interface connections, and a second mode wherein both the first portion and the second portion of the memory core are accessible through the first set of interface connections.
In accordance with other aspects of this exemplary embodiment of the present invention, the first portion and the second portion of the memory core are subsets of the entire memory core. Also, the first portion and the second portion of the memory core do not overlap.
In accordance with further aspects of this exemplary embodiment of the present invention, the memory core is accessible during the first and second modes to read data from the memory core and write data to the memory core. Also, the memory core is accessible during the first and second modes through a multiplexing stage.
In accordance with still further aspects of this exemplary embodiment of the present invention, the method also comprises accessing the first portion of the memory core through the first set of interface connections and the second portion of the memory core through the second set of interface connections during the first mode.
In accordance with still further aspects of this exemplary embodiment of the present invention, the method also comprises accessing both the first portion and the second portion of the memory core through only the first set of interface connections during the second mode.
In accordance with still further aspects of this exemplary embodiment of the present invention, the method also comprises transferring data from the first set of interface connections to the second set of interface connections, and vice versa.
The present invention will now be described in more detail with reference to exemplary embodiments thereof as shown in the appended drawings. While the present invention is described below with reference to preferred embodiments, it should be understood that the present invention is not limited thereto. Those of ordinary skill in the art having access to the teachings herein will recognize additional implementations, modifications, and embodiments, as well as other fields of use, which are within the scope of the present invention as disclosed and claimed herein, and with respect to which the present invention could be of significant utility.