1. Field of the Invention
The invention relates to a memory arrangement which contains a plurality of Random Access Memory (RAM) chips having a respective multiplicity z of memory cells.
2. Description of the Related Art
The memory cells in a RAM chip, which is subsequently also referred to as a “RAM” for short, are usually arranged in matrix form in rows and columns. Selective access to a memory cell for the purpose of writing or reading a data item is effected by activating a word line associated with the relevant row on the basis of a row address and connecting a bit line associated with the relevant column to a bidirectional data port on the RAM. This connection is set up using a data line network containing amplifiers and switches which can be selectively activated on the basis of a column address.
RAMs are normally designed such that, in each access clock cycle, not just a single memory cell but a group of m memory cells can be selected simultaneously, in order to write or read m data bits in parallel. To this end, the addresses and the data line network are designed such that, in response to a column address, m bit lines are simultaneously connected to m data connections on the data port of the RAM via the data line network. With this memory organization, each column address therefore selects an entire cell group in the row determined by the row address.
The number m, that is to say the size (cardinality) of the disjoint cell groups and hence the bit width of the data passing through the data port, is preferably a power of 2; m-values of 4, 8 and 16 are currently usual. Many RAMs, particularly DRAMs, are configured during manufacture such that the m-value can be selected or set in order to operate the RAM optionally in 4-bit, 8-bit or 16-bit mode.
To produce RAM data storage with a large storage capacity and/or with a high data throughput, it is usual practice to combine a plurality k of RAM chips, which are respectively integrated on a chip and are designed or set for the same bit width m, to produce one module on a board. In the prior art, all k chips are simultaneously accessed in parallel mode in order to write or read a packet of k data groups, each of which comprises m parallel data items, during each access operation. To this end, the module has a central data port for n=m*k parallel bits and a central n-bit parallel register (the symbol * represents a multiplication sign here and below). The data ports on the k chips are connected to the central register (which serves as a data buffer between the central n-bit module port and the RAM chips) in parallel via a respective associated m-bit data bus.
An example of the design of a known memory module having n=64 data connections is shown in the top part of FIG. 1 in the appended drawings. The bottom part of this figure shows the diagrams for the data transfer in this module for a burst length r=1 and a burst length r=4. In FIG. 1, as in the other figures of the drawing, elements of the same type have been denoted by the same abbreviations (letters or combinations of letters), usually followed by a serial number in order to distinguish them. Where the text of the description contains a collection comprising a plurality of elements of the same type in a set, the numbers which follow have been placed in square brackets [], with a colon “:” between two numbers representing the word “to” and a comma “,” representing the word “and”. By way of example, “data groups d[1,5]-1” is to be read as “data groups d1-1 and d5-1”.
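The bracket notation defined above can be illustrated with a short sketch. The function name `expand` and the regular expression are illustrative assumptions and are not part of the described arrangement; the sketch only handles a single bracketed collection per label.

```python
import re


def expand(label: str) -> list[str]:
    """Expand the bracket shorthand, e.g. 'd[1,5]-1' -> ['d1-1', 'd5-1']."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]*)\]([^\[\]]*)", label)
    if match is None:
        return [label]  # no bracketed collection, nothing to expand
    prefix, body, suffix = match.groups()
    numbers = []
    for part in body.split(","):      # a comma is read as "and"
        if ":" in part:               # a colon is read as "to"
            first, last = (int(x) for x in part.split(":"))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))
    return [f"{prefix}{n}{suffix}" for n in numbers]


print(expand("d[1,5]-1"))  # data groups d1-1 and d5-1
print(expand("D[1:8]"))    # RAM chips D1 to D8
```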
The known memory module from FIG. 1 contains a “rank” comprising k=8 RAM chips D[1:8], which are each set to a bit width of m=8, in order to write or read data packets with a bit width n=k*m=64 via the central module data port DP. The data are transmitted between the data port DP and the RAMs D[1:8] via a central buffering data register DR. Each of the RAMs D[1:8] is connected to the data register DR in parallel via an associated one of eight 8-bit data buses DB. The data interface of each RAM, that is to say the junction to the respective associated data bus, usually contains a local data buffer in the form of an m-bit parallel register (not shown).
Usually, the data packets which have been input or output at the data port DP are sent or received by a controller (not shown), which also delivers control signals to the input port SP of a control signal register SR. These control signals comprise all the necessary signals for command and time-control for the operating cycles within the RAMs and also control bits (“selection bits”) for addressing eight cell groups in the rank, specifically a respective one in each of the eight RAMs D[1:8] in the rank, for each 64-bit data packet. For the example shown in FIG. 1, it is assumed that the RAMs D[1:8] each contain z=2^27 memory cells, split over 4=2^2 memory banks B. To select a cell group of m=8=2^3 cells within a RAM, 24 address bits are therefore required. The selection bits delivered by the controller comprise a total of 25 bits, namely an additional bit in order to put the rank in standby.
This total of 25 selection bits is allocated by the control signal register SR as follows:
    1 rank selection bit, which is applied to the “Chip Select” connections C of all the RAMs D[1:8] via a line DS in order to blanket-select the RAMs (that is to say to put the entire rank on standby) by means of the logic value “1” of this bit, whereas the logic value “0” of this bit means that the rank is “not selected”;
    2 bank address bits for selecting between 2^2 banks within the RAM;
    12 row address bits for selecting between 2^12 row addresses within the bank;
    10 column address bits for selecting between 2^10 disjoint column groups of m=8 columns each and hence between 2^10 cell groups of m=8 cells each for each row address.
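The bit budget above can be checked with a short sketch. The field widths are taken from the description; the packing order (rank, bank, row, column from the most significant bit downwards) and the function name `split_selection` are assumptions made only for illustration, since the description does not specify how the 25 bits are ordered.

```python
Z_CELLS = 2**27          # memory cells per RAM chip (z)
M = 8                    # cells per cell group (bit width m)
RANK_BITS, BANK_BITS, ROW_BITS, COLUMN_BITS = 1, 2, 12, 10

# 2^27 cells in groups of 2^3 give 2^24 cell groups, hence 24 address bits.
cell_groups = Z_CELLS // M
address_bits = cell_groups.bit_length() - 1   # exact log2 for a power of two
print(address_bits)                           # 24 = 2 + 12 + 10


def split_selection(word: int) -> tuple[int, int, int, int]:
    """Split a 25-bit selection word into (rank, bank, row, column).

    The bit ordering is a hypothetical choice for this sketch.
    """
    column = word & ((1 << COLUMN_BITS) - 1)
    row = (word >> COLUMN_BITS) & ((1 << ROW_BITS) - 1)
    bank = (word >> (COLUMN_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    rank = (word >> (COLUMN_BITS + ROW_BITS + BANK_BITS)) & ((1 << RANK_BITS) - 1)
    return rank, bank, row, column


# Rank selected, bank 3, row 5, column group 7.
word = (1 << 24) | (3 << 22) | (5 << 10) | 7
print(split_selection(word))
```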
The 24 address bits for addressing banks, rows and columns within the RAMs are applied to the RAMs D[1:8] via an address bus AB. The address bus AB usually contains just 14 address lines, namely 2 lines for the bank address bits and 12 further lines, via which the 12 row address bits are transmitted first. The 10 column address bits are then transmitted via 10 assigned lines out of these 12.
The 25 transmitted selection bits arrive at the usual access control device A in each RAM, which sets up the read or write connection between the selected cell group and the data bus DB of the relevant RAM in a known manner. The lines for transmitting the other control signals from the control signal register SR to the RAMs are not shown in the figure, so as not to make the drawing too complicated. The central control signal register SR and the access control devices A in the RAMs thus together form the “selection device” for the memory cell access.
Since the individual RAM chips D[1:8] are arranged at a physical distance from one another, the data buses DB between the data register DR and the various chips are not all of the same length, which means that delay time differences arise on account of the differences in distance. The same applies to the control lines between the chips and the control signal register SR. The result of this is that, after the start of a read access operation, the 8-bit data groups from the various chips do not all arrive at the data register DR simultaneously but rather at staggered times, which has disadvantageous consequences. The pattern of this time stagger is dependent on the specific physical arrangement of the parts of the module.
The module shown in FIG. 1 is a “dual inline memory module” (DIMM) with a rank in which the two halves, with four RAM chips each, are arranged symmetrically with respect to the transmission/reception block SE, which contains the registers DR and SR for the data and control signals. This means that two RAM chips are always at the same distance away from the transmission/reception block SE. Following the start command for access, a certain control signal delay time elapses before the control signals and selection bits transmitted by the control signal register SR have arrived at the two physically closest chips D[1,5] in order to initiate the actual read operation. A certain RAM response time then elapses before the data from the selected memory cells are available on the data connections of the chip and can be retrieved. Following retrieval, another data delay time elapses before the data have reached the data register DR via the associated data bus DB. The sum of these three time periods, that is to say the total loop delay for the “round trip” through the loop, which is routed from the transmission/reception block SE via the closest RAM chips D[1,5] back to the transmission/reception block SE, is subsequently labelled τ1:
τ1=loop delay via D[1,5].
The greater the distance between the RAM chips and the transmission/reception block SE, the longer it takes before the read data arrive at the data register DR in the transmission/reception block SE following the start command, because the control-signal and data delay times become longer as the distance increases (only the RAM response time does not change). For the chip pairs D[2,6], D[3,7] and D[4,8], increasingly longer loop delays are therefore obtained on the basis of the following definition:
τ1+τ2=loop delay via D[2,6],
τ1+τ2+τ3=loop delay via D[3,7],
τ1+τ2+τ3+τ4=loop delay via D[4,8].
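The cumulative nature of these loop delays can be sketched as a running sum. The numeric values below are hypothetical and in arbitrary time units; the description gives no concrete figures for τ1 to τ4, and the function name `loop_delays` is an illustrative assumption.

```python
from itertools import accumulate

CHIP_PAIRS = ["D[1,5]", "D[2,6]", "D[3,7]", "D[4,8]"]


def loop_delays(tau: list[int]) -> dict[str, int]:
    """Map each chip pair to its loop delay.

    tau holds [τ1, τ2, τ3, τ4]: τ1 is the loop delay via the closest pair,
    and each further entry is the extra delay of the next pair outwards.
    """
    return dict(zip(CHIP_PAIRS, accumulate(tau)))


# Hypothetical values: τ1=10, τ2=2, τ3=3, τ4=4 (arbitrary units).
for pair, delay in loop_delays([10, 2, 3, 4]).items():
    print(pair, delay)
```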
The bottom part of FIG. 1 shows timing diagrams for the time-staggered arrival of the data at the register DR. Each read data group transmitted from a RAM chip via the associated data bus DB to the register DR and comprising m parallel bits is shown by a box, the length of which indicates the “bit length” τd of the data. The bit length τd is the length of time from the start of the leading edge to the end of the trailing edge of a data pulse.
The left-hand timing diagram in FIG. 1 illustrates the case in which a single n-bit packet (burst length r=1) is read. At time t0, the start command for the read access operation is given on the control signal register SR. The data register DR first of all, after the loop delay τ1 at time t1, receives the 8-bit data groups d1 and d5 from the two closest chips D1 and D5; after a further delay τ2, the data groups d2 and d6 arrive from the chips D2 and D6 at time t2, followed after a further delay τ3 by the data groups d3 and d7 from the chips D3 and D7 at time t3, and finally after a further delay τ4 by the last data groups d4 and d8 in the packet from the two most distant chips D4 and D8 at time tp. Only then, but no later than at time tp+τd, have all the data received last been validly loaded into the register DR, and all the data groups d[1:8] can be forwarded to the data port DP in parallel as a 64-bit packet. An additional waiting time
Tx=τ2+τ3+τ4
between the arrival of the first data group and the arrival of the last data group therefore arises for a read access operation.
The aforementioned additional waiting time Tx does not change at all when a burst comprising a plurality of successive 64-bit packets is read on the memory module within a read cycle after the start command, as illustrated in the right-hand timing diagram in FIG. 1 for the case of a burst length of r=4. The first n-bit packet, comprising the first m-bit data groups d[1:8]-1 (that is to say data groups d1-1, d2-1, . . . , d8-1), has arrived at the data register DR fully only after the loop delay τ1 plus the additional waiting time Tx=τ2+τ3+τ4 at time tp1.
When the burst clock rate has been set to the fastest possible value 1/τd, as shown in FIG. 1, the next three packets d[1:8]-2, d[1:8]-3, d[1:8]-4 in the burst arrive at the respective destination at time intervals of τd. The total time Tb from the arrival of the first data to the end of the burst at the reception location is thus at least equal to
Tb=Tx+4*τd,
or, generally for any number k of RAM chips in the memory module and for any burst length r:
Tb=Tx+r*τd,
where Tx is the delay time difference between the data buses on the closest of all k chips and the data buses on the furthest of all k chips.
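As a worked illustration of the relationship Tb=Tx+r*τd, the following sketch computes the minimum burst time for the two burst lengths shown in FIG. 1. The delay values are hypothetical and in arbitrary time units, and the function name `burst_time` is an assumption for illustration only.

```python
def burst_time(extra_delays: list[int], r: int, tau_d: int) -> int:
    """Return the minimum burst time Tb = Tx + r*τd.

    extra_delays holds the additional delays beyond τ1 (τ2, τ3, τ4 in the
    FIG. 1 example); their sum is the additional waiting time Tx.
    """
    t_x = sum(extra_delays)
    return t_x + r * tau_d


# Hypothetical values: τ2=τ3=τ4=1 and bit length τd=2 (arbitrary units).
extra = [1, 1, 1]
print(burst_time(extra, r=1, tau_d=2))  # single packet: Tx + τd
print(burst_time(extra, r=4, tau_d=2))  # burst of four: Tx + 4*τd
```

With these assumed values, Tx contributes the same fixed amount to both cases, so its relative share of Tb is largest for short bursts.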
After a read cycle has started, it is thus always necessary to wait for the time period Tb in total before the next read cycle or a subsequent write cycle can be started. The additional waiting time Tx thus limits the speed at which individual read cycles on the memory module can follow one another or at which a write cycle can follow a read cycle.
U.S. Pat. No. 6,396,766 B1 discloses a memory arrangement in which the central register is split into two sub-registers, each of the two sub-registers being connected to a portion of the RAM chips, with the sub-registers being arranged relative to the connected RAM chips such that essentially the same data bus lengths are ensured between the RAM chips and the respective sub-register, in order to reduce read and write delays on account of different data bus lengths.
U.S. Pat. No. 6,330,636 B1 also discloses a memory arrangement operating in burst mode.