Memory system performance plays an important role in the overall performance of a computer processing system. One proposed memory system architecture varies memory module data transfer granularity by partitioning a memory module into independent portions called data threads. Each data thread transfers data in response to thread-specific commands, providing a threaded data transfer granularity that is finer than the aggregate data transfer granularity of the module (typically 64 bytes). One variant of the proposed module threading architecture employs a buffer circuit on each memory module to isolate the memory devices on the module from a primary data bus coupled to a memory controller. Because each buffer presents only a single electrical load to the primary bus, the buffered architecture allows greater memory capacity along the primary bus without a corresponding increase in parasitic loading.
Although threaded buffered modules provide signal integrity benefits by minimizing loading on the primary data bus, the buffer circuitry generally introduces additional read latency. For computing systems that employ "critical word first" policies, in which the requested word is delivered first so that the processor can resume execution without waiting for the full block of data to load, this added read latency can significantly increase processor wait times.
Thus, a need exists to reduce read latency in buffered modules that employ module threading.