The invention relates generally to computer system memory read operations and more particularly, but not by way of limitation, to a method and apparatus for reducing memory read latency.
Memory read latency, defined as the time lag between when a device issues a request to memory for data and the moment the device begins to receive that data, is a key indicator of a computer system's performance. The lower the latency, generally the better the performance. A naive approach to minimizing memory read latency would be to immediately transfer, to the requesting device, each quantum of data as it is provided by the memory. One reason this approach is not taken in modern computer systems is that memory access operations are mediated by system controllers having internal data transfer paths wider than those of either the memory or the requesting device. To obtain high data transfer rates, these controllers aggregate data received from, for example, system memory before forwarding it to a requesting device. The act of aggregation creates a latency that otherwise would not be present. This latency is particularly problematic for processor initiated memory read operations because the processor is the computer system's primary computational engine: while the processor is waiting for data, no useful work is being performed.
Referring to FIG. 1, prior art computer system 100 includes processor 102, system memory 104, system controller 106 (incorporating processor interface 108, memory interface 110 and primary bus interface 112), processor bus 114 coupling processor 102 to processor interface 108, memory bus 116 coupling system memory 104 to memory interface 110 and primary bus 118 coupling other system devices 120 (e.g., network interface adapters and/or a secondary bus bridge circuit and components coupled thereto) to processor 102 and system memory 104 via primary bus interface 112.
In many current systems such as computer system 100, processor bus 114, memory bus 116 and primary bus 118 are 64-bit structures (another common primary bus width is 32 bits). At the same time, system controller 106 may utilize 128-bit (or greater) internal data transfer paths. Because of this, data received from system memory 104 during a memory read operation is aggregated by memory interface 110 before being forwarded to a destination interface and, ultimately, to the requesting device. For example, if processor 102 initiates a memory read request for a 32-byte block of data (a common size for a cache line), then after memory interface 110 receives the first 8 bytes (64 bits) from system memory 104, it waits until it receives the second 8 bytes before sending the entire 16-byte unit to processor interface 108 and, ultimately, processor 102.
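The aggregation delay described above can be illustrated with a minimal sketch. It assumes a simplified model, not stated in the source, in which memory delivers one bus-width transfer ("beat") per time step and an aggregated unit is forwarded as soon as its last beat arrives; the function name and parameters are hypothetical and serve only as illustration.

```python
def forward_beats(block_bytes: int, bus_bytes: int, path_bytes: int) -> list[int]:
    """Return the beat number at which each aggregated unit can be forwarded.

    Simplified model (an assumption): memory supplies one bus-width beat per
    time step, and a unit is forwarded once enough beats have arrived to
    fill the controller's internal data transfer path.
    """
    if path_bytes % bus_bytes or block_bytes % path_bytes:
        raise ValueError("sizes must divide evenly in this simplified model")
    beats_per_unit = path_bytes // bus_bytes   # beats needed to fill one unit
    total_beats = block_bytes // bus_bytes     # beats in the whole read
    return list(range(beats_per_unit, total_beats + 1, beats_per_unit))


# 32-byte cache line, 64-bit (8-byte) memory bus, 128-bit (16-byte) internal path.
aggregated = forward_beats(32, 8, 16)
# Naive forwarding: each 8-byte quantum is passed on as it arrives.
naive = forward_beats(32, 8, 8)
```

Under these assumptions, the first data leaves the memory interface after the second beat when aggregating, versus after the first beat when each quantum is forwarded immediately; the difference of one beat is the added latency discussed here.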
The delay or latency caused by the act of aggregating successive data units received from system memory can result in processor stalls, thereby reducing the operational/computational efficiency of the computer system. Thus, it would be beneficial to provide techniques (methods and apparatus) to reduce memory read latency in a computer system.