Signaling rate advances continue to outpace core access time improvement in dynamic random access memories (DRAMs), leading to memory devices and subsystems that output ever larger amounts of data per access in order to meet peak data transfer rates. In many cases, the increased data output is achieved through simple extension of the output burst length, that is, the number of data transmissions executed in succession to output data retrieved from a given location within the memory core. FIGS. 1A-1C illustrate this approach within a prior-art memory system 100 formed by memory controller 101 and memory module 103. Memory module 103 includes two sets of memory devices, shown in grouped form as memory devices A and memory devices B, with all the memory devices coupled to a shared command/address path (CA), shared clock line (CLK) and shared chip-select line (CS), and with memory devices A coupled to a first set of data lines, DQ-A, and memory devices B coupled to a second set of data lines, DQ-B. Referring to FIG. 1B, the memory controller initiates a memory access by outputting a row activation command (ACT) and column access command (RD) and associated row and column address values onto the command/address path during successive cycles, 0 and 1, of a clock signal (Clk), asserting the chip-select signal (i.e., CS=1) during both clock cycles. All the memory devices (i.e., memory devices A and memory devices B) respond to assertion of the chip-select signal by sampling the command/address path during clock cycles 0 and 1 to receive the row activation command and the column access command, collectively referred to herein as a memory access command. Thereafter, each of the A and B memory devices responds to the memory access command received during clock cycles 0 and 1 by activating the address-specified row within the memory core, then retrieving read data from the address-specified column within the activated row.
Accordingly, some time, TRD, after receipt of the memory access command, the read data values retrieved within each of the A and B memory devices are output in parallel data burst sequences (i.e., with each transmission within the burst sequence being consecutively numbered, 0-3) on data lines DQ-A and DQ-B during clock cycles 4 and 5. By this operation, memory access transactions may be pipelined so that the memory access command for a given transaction is transmitted simultaneously with data transmission for a previously-transmitted memory access command. Because the data burst length (sometimes called “prefetch”) matches the memory access command length, the command/address and data path resources may be fully utilized during periods of peak data transfer.
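The pipelining described above may be illustrated with a small timing model. The following is a minimal sketch only: the function name `schedule` and its parameters are hypothetical, and the default values (a two-cycle memory access command, a two-cycle data burst, and data beginning four clock cycles after the start of the command) are assumptions read from the description of FIG. 1B rather than from any device specification.

```python
def schedule(num_transactions, cmd_cycles=2, burst_cycles=2, read_latency=4):
    """Return (command_slots, data_slots) for back-to-back read transactions.

    Each slot is an inclusive (first_cycle, last_cycle) pair. A new command
    is issued every max(cmd_cycles, burst_cycles) clock cycles so that
    neither the command/address path nor the data lines are double-booked.
    """
    spacing = max(cmd_cycles, burst_cycles)
    cmd_slots, data_slots = [], []
    for i in range(num_transactions):
        start = i * spacing
        cmd_slots.append((start, start + cmd_cycles - 1))
        data_start = start + read_latency
        data_slots.append((data_start, data_start + burst_cycles - 1))
    return cmd_slots, data_slots


cmds, data = schedule(3)
print(cmds)  # [(0, 1), (2, 3), (4, 5)]
print(data)  # [(4, 5), (6, 7), (8, 9)]
```

Under these assumptions, the third command occupies clock cycles 4 and 5, the same cycles in which data for the first command is transmitted, and neither path has an idle cycle once the pipeline is full.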
FIG. 1C illustrates a timing arrangement that may result as the data and command signaling rates are doubled relative to the core access time. As shown, the amount of data retrieved from the memory core is doubled in order to meet the increased bandwidth of the data interface, thereby extending the data burst length by an additional two clock cycles (i.e., as shown by transmissions 4-7 during clock cycles 6 and 7) on data lines DQ-A and DQ-B and doubling the granularity of the memory access. The extended burst length and resulting increased access granularity produce two potentially undesirable effects. First, because the trend in a number of data processing applications is toward finer-grained memory access, the increased data burst length may result in retrieval and transmission of a substantial amount of unneeded data, wasting power and increasing thermal loading within the memory devices. Additionally, utilization of the command/address and data path resources is thrown out of balance as the extended burst length prevents memory access commands from being transmitted back-to-back (i.e., in successive pairs of clock cycles) and thus results in periods of non-use on the command/address path.
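The two effects may be quantified with simple arithmetic. The sketch below is illustrative only: the function names `path_utilization` and `wasted_fraction` are hypothetical, the cycle counts are taken from the descriptions of FIGS. 1B and 1C, and the case of a requester needing only four of eight transmissions is an assumed example rather than a measured workload.

```python
def path_utilization(cmd_cycles, burst_cycles):
    """Return (command/address utilization, data utilization) at peak rate.

    Commands can be issued no faster than once per
    max(cmd_cycles, burst_cycles) clock cycles; otherwise successive
    data bursts would collide on the data lines.
    """
    spacing = max(cmd_cycles, burst_cycles)
    return cmd_cycles / spacing, burst_cycles / spacing


def wasted_fraction(needed_transmissions, burst_transmissions):
    """Fraction of a burst that carries data the requester did not need."""
    used = min(needed_transmissions, burst_transmissions)
    return 1 - used / burst_transmissions


# Two-cycle command, two-cycle burst (as in FIG. 1B): both paths fully used.
print(path_utilization(2, 2))  # (1.0, 1.0)

# Two-cycle command, four-cycle burst (as in FIG. 1C): command path idle half the time.
print(path_utilization(2, 4))  # (0.5, 1.0)

# Assumed example: only 4 of the 8 transmissions are actually needed.
print(wasted_fraction(4, 8))  # 0.5
```

With the four-cycle burst, the command/address path is idle for two of every four clock cycles, and a requester needing only the original four-transmission granularity discards half of each burst.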