A computer system typically includes a processor, a chipset, a main memory, and a number of peripheral components. The processor further includes a cache memory. Although data is generally stored in the main memory, a copy of the data currently needed is usually stored in the cache memory to allow fast access to the data. A single data entry stored in the cache memory is commonly referred to as a cacheline. The size of a cacheline varies among different computer systems.
Data may be transferred between the processor, the main memory, and the peripheral components within a computer system via components of the chipset. Typically, data is transferred between the main memory and other components within the computer system via a memory controller hub of the chipset. For large inbound read transactions targeting the main memory, the memory controller hub usually breaks a request for data into smaller reads. Each read retrieves one cacheline of data from the main memory, and the data it returns is typically referred to as a read completion. For example, a 64-Byte cacheline system completes a request for 512 Bytes of data in eight read completions (512 / 64 = 8), each carrying one cacheline, i.e. 64 Bytes. A request is completed in cacheline quantities because processors and memory controllers typically operate on cacheline quantities.
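The splitting step can be sketched in Python. This is a minimal illustration, not the memory controller hub's actual logic: the function name `split_into_cacheline_reads` and the treatment of requests that do not start on a cacheline boundary are assumptions of this sketch.

```python
from typing import List, Tuple

# Assumption: a 64-Byte cacheline system, as in the example above.
CACHELINE_SIZE = 64


def split_into_cacheline_reads(address: int, length: int,
                               line_size: int = CACHELINE_SIZE) -> List[Tuple[int, int]]:
    """Hypothetical helper: break one inbound read request into per-cacheline reads.

    Returns (address, size) pairs; each pair models one read completion.
    A request that does not start or end on a cacheline boundary yields
    partial first/last reads (an assumption of this sketch).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    reads = []
    end = address + length
    current = address
    while current < end:
        # Next cacheline boundary strictly after `current`.
        boundary = (current // line_size + 1) * line_size
        chunk_end = min(boundary, end)
        reads.append((current, chunk_end - current))
        current = chunk_end
    return reads


reads = split_into_cacheline_reads(0x1000, 512)
print(len(reads))  # 8
print(reads[0])    # (4096, 64)
```

For the 512-Byte example above, an aligned request yields eight 64-Byte reads; in this sketch a request that starts mid-cacheline produces a partial first read and, if needed, a partial last read.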
Due to innovations in computer technology, such as high-speed microprocessors running at 10 GHz, the existing parallel input/output interconnect, Peripheral Component Interconnect (“PCI”), developed over ten years ago, can no longer meet the demands for high speed and bandwidth. To meet these demands, serial input/output interconnects have been developed. The latest serial input/output interconnect is PCI Express™ (“PCI Express” is a trademark of the PCI-Special Interest Group), which is the third generation of input/output interconnect. PCI Express™ is a high-speed serial interconnect capable of sending multiple read completions for one read request. On PCI Express™, a large request to retrieve data from the memory may be completed in several transactions, each of which returns data that partially satisfies the request. The data returned by one transaction may contain a cacheline of data.
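From the requester's side, a request completed by several transactions can be modeled as a record that accumulates returned data until the requested byte count is reached. The sketch below is hypothetical: `PendingRead` and `on_completion` are illustrative names, not part of the PCI Express™ specification, and the sketch ignores transaction tags, ordering rules, and error handling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingRead:
    """Hypothetical requester-side record of one outstanding read request."""
    requested_bytes: int
    received: List[bytes] = field(default_factory=list)

    @property
    def received_bytes(self) -> int:
        # Total data returned so far by all completions for this request.
        return sum(len(chunk) for chunk in self.received)

    @property
    def remaining_bytes(self) -> int:
        return self.requested_bytes - self.received_bytes

    def on_completion(self, payload: bytes) -> bool:
        """Record one completion; return True once the request is fully satisfied."""
        if len(payload) > self.remaining_bytes:
            raise ValueError("completion returns more data than was requested")
        self.received.append(payload)
        return self.remaining_bytes == 0

    def data(self) -> bytes:
        # Reassemble the partial results in arrival order.
        return b"".join(self.received)


# A 512-Byte request satisfied by eight 64-Byte completions.
req = PendingRead(requested_bytes=512)
for _ in range(8):
    done = req.on_completion(bytes(64))
print(done, len(req.data()))  # True 512
```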
As discussed above, a read request on PCI Express™ may result in several read completions. The prior approach handles one read completion at a time: when the interface receives a read completion, it waits until the PCI Express™ port is not busy and then sends that read completion via the port to the requester. Read completions are thus sent via the PCI Express™ port one at a time at a fixed size, even though multiple read completions could be combined into one larger completion. This approach is adopted because it is simple and fair among multiple requesters. However, it is very inefficient because the bandwidth of the PCI Express™ port is not fully utilized: an interface implementing a 64-Byte cacheline system achieves only 72% efficiency.
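Why combining completions helps can be shown with simple arithmetic. Every completion sent on the link carries some fixed overhead in addition to its data, so returning 512 Bytes as eight 64-Byte completions pays that overhead eight times, while a single 512-Byte completion pays it once. The sketch below assumes a hypothetical fixed overhead of 24 Bytes per completion (`OVERHEAD_BYTES`) and ignores other link traffic; the source does not state how its 72% figure was derived.

```python
# Hypothetical fixed per-completion overhead in bytes (packet header,
# sequence number, CRCs and framing). The real value depends on the link
# configuration; 24 is an assumption for illustration only.
OVERHEAD_BYTES = 24


def link_bytes(request_bytes: int, completion_size: int,
               overhead_bytes: int = OVERHEAD_BYTES) -> int:
    """Total bytes sent on the link to return request_bytes of data
    in completions carrying at most completion_size bytes each."""
    # Integer ceiling division: number of completions needed.
    completions = (request_bytes + completion_size - 1) // completion_size
    return request_bytes + completions * overhead_bytes


def efficiency(request_bytes: int, completion_size: int,
               overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Fraction of link bytes that carry requested data."""
    return request_bytes / link_bytes(request_bytes, completion_size, overhead_bytes)


# One 64-Byte completion at a time versus combining cachelines.
for size in (64, 128, 256, 512):
    print(f"{size:>3}-Byte completions: {efficiency(512, size):.1%}")
```

Under that assumption the loop prints 72.7%, 84.2%, 91.4% and 95.5%. In practice the size of a combined completion is also bounded by the link's configured maximum payload size, which this sketch does not model.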