1. Technical Field
One or more embodiments of the present invention generally relate to the processing of computer system transactions. In particular, certain embodiments relate to protocols for combining write transactions.
2. Discussion
As consumer demand for faster processing and enhanced functionality continues to increase, the importance of computing efficiency and performance also increases. Modern-day processors use cache memory as one technique to make the processing of data more efficient, where data in the cache memory is allocated by the operating system (OS) one page at a time and each page contains a number of cache entries. Each cache entry usually holds a certain number of words, known as a “cache line” or “cache block”, and an entire line is typically read and cached at once in order to achieve optimum “burst” performance. Unfortunately, processors running certain applications such as graphics applications are frequently required to implement pixel writes, which tend to be 8-bit, 16-bit or 32-bit quantities rather than the full cache lines (e.g., 64 bytes) necessary to provide optimum burst performance.
As a result, a conventional processor may not be able to achieve the desired level of performance in some cases. To address this problem, more recent computer architectures have been designed to automatically combine smaller, or partial, writes into larger cache line writes. This approach is referred to as processor “write-combining”. Processor write-combining is implemented by tagging each page in memory with a write-combining (WC) attribute, which indicates whether partial writes from the page can be combined, and buffering the writes on the processor until a full cache line is obtained. The combined writes are typically then sent to their intended destination by way of a chipset input/output (I/O) hub, where the intended destination might be a memory-mapped input/output (MMIO) space of an I/O device. The I/O hub serves as a bridge between the processor/processor bus and an I/O interface (e.g., bus) that connects to the I/O device containing the MMIO space.
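The buffering behavior described above can be illustrated with a minimal sketch. It is a toy software model rather than a description of any particular processor: the class name `WriteCombiningBuffer`, the `sink` callback standing in for the I/O hub, the single-line buffer, and the policy of evicting the buffer when a write targets a different line are all assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # example line size mentioned in the text


class WriteCombiningBuffer:
    """Toy model: collects partial writes to one cache line and
    forwards the line once every byte of it has been written."""

    def __init__(self, sink):
        # sink(address, data) is a hypothetical stand-in for the I/O hub.
        self.sink = sink
        self.base = None                          # line currently being combined
        self.data = bytearray(CACHE_LINE_BYTES)
        self.valid = [False] * CACHE_LINE_BYTES   # per-byte "written" flags

    def write(self, address, payload):
        base = address - (address % CACHE_LINE_BYTES)
        offset = address - base
        if offset + len(payload) > CACHE_LINE_BYTES:
            raise ValueError("partial write must not cross a line boundary")
        if self.base is not None and base != self.base:
            self.flush()                          # assumed policy: evict on new line
        self.base = base
        self.data[offset:offset + len(payload)] = payload
        for i in range(offset, offset + len(payload)):
            self.valid[i] = True
        if all(self.valid):
            self.flush()                          # full line obtained: one burst write

    def flush(self):
        """Send each contiguous run of written bytes, then empty the buffer."""
        if self.base is None:
            return
        i = 0
        while i < CACHE_LINE_BYTES:
            if not self.valid[i]:
                i += 1
                continue
            j = i
            while j < CACHE_LINE_BYTES and self.valid[j]:
                j += 1
            self.sink(self.base + i, bytes(self.data[i:j]))
            i = j
        self.base = None
        self.valid = [False] * CACHE_LINE_BYTES
```

Under these assumptions, sixteen consecutive 32-bit pixel writes to an aligned 64-byte region reach the sink as a single 64-byte write, whereas a buffer that is evicted early is forwarded as one or more smaller writes.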
It has been determined, however, that a cache line is not typically an optimal data length for certain I/O interfaces. For example, one 64-byte cache line is roughly 69% efficient for write transactions (or “writes”) on Peripheral Component Interconnect Express (PCI Express) buses. While recent approaches have been developed to provide for chipset write combining in order to make writes from the chipset more efficient from the perspective of the I/O interface, a number of difficulties remain.
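The efficiency figure can be explored with a short calculation. The sketch below assumes a simple model in which efficiency equals payload divided by payload plus a fixed per-transaction overhead. That model and the function names are assumptions for illustration. The overhead is not stated in the text, so it is back-computed from the 69% figure rather than taken from any specification.

```python
def efficiency(payload_bytes, overhead_bytes):
    """Fraction of transferred bytes that are payload (assumed model)."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def implied_overhead(payload_bytes, observed_efficiency):
    """Per-transaction overhead implied by an observed efficiency."""
    return payload_bytes / observed_efficiency - payload_bytes


# Overhead implied by "64 bytes is roughly 69% efficient" under this model.
overhead = implied_overhead(64, 0.69)

# With the same fixed overhead, larger writes amortize it better.
for payload in (64, 128, 256):
    print(payload, round(efficiency(payload, overhead), 3))
```

Under this assumed model the implied overhead is a little under 29 bytes per write, and combining two or four cache lines into one write would raise efficiency to roughly 82% or 90% respectively, which is the motivation for combining writes beyond a single cache line.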
One difficulty results from the fact that posting memory writes to an intermediate agent such as a chipset I/O hub can cause problems with regard to unordered interfaces. The use of unordered interfaces essentially leads to multiple paths for data traveling from a source to a destination. Since some routes may be shorter than others, the “multipath” effect can lead to the execution of instructions out of their intended order. For example, a posted memory write transaction typically completes at the processor before it actually completes at the MMIO space. Posting enables the processor to proceed with the next operation while the posted write transaction is still making its way through the system to its ultimate destination. Because the processor proceeds before the write actually reaches its destination, other events that the operating system (OS) expects to occur after the write (e.g., a read from the same destination) may pass the write. The result can be unexpected behavior of the computer system.
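The hazard described above can be sketched as a small event simulation. The function name `simulate`, the single register `reg`, and the latency values are hypothetical; the model assumes only that the write and the later read travel on paths with independent delays.

```python
import heapq


def simulate(write_latency, read_latency):
    """Return the value observed by a read issued after a posted write.

    The write is issued at t=0 and the processor proceeds immediately;
    the read is issued at t=1. Each request arrives at the device after
    its own path latency, so arrival order may differ from issue order.
    """
    device = {"reg": 0}   # hypothetical MMIO register, initially 0
    events = []           # (arrival_time, issue_order, kind, value)
    heapq.heappush(events, (0 + write_latency, 0, "write", 1))
    heapq.heappush(events, (1 + read_latency, 1, "read", None))

    observed = None
    while events:
        _, _, kind, value = heapq.heappop(events)   # process in arrival order
        if kind == "write":
            device["reg"] = value
        else:
            observed = device["reg"]
    return observed


stale = simulate(write_latency=5, read_latency=1)   # read passes the write
fresh = simulate(write_latency=1, read_latency=5)   # write arrives first
```

In this model `stale` holds the old register value because the read overtakes the write on the shorter path, while `fresh` holds the newly written value.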
To address this concern, various consumer/producer ordering rules have been developed so that hardware can maintain the ability to use posting to optimize performance without negatively affecting software operation. Indeed, many ordering rules specifically focus on the flushing of posting buffers. One particular chipset write-combining technique relies upon an I/O software driver to enforce ordering rules by notifying the chipset when it is necessary to flush the buffer contents to the I/O device. Unfortunately, the software driver is a proprietary solution that cannot be used by off-the-shelf OS software and application software.
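A flush-based ordering rule of the kind mentioned above can be sketched as follows. The class `PostingBuffer` and the `flush_first` flag are assumptions for illustration. The flag stands in for whatever mechanism triggers the flush, whether a hardware ordering rule or a notification from a software driver.

```python
class PostingBuffer:
    """Toy model of an intermediate agent that posts writes."""

    def __init__(self):
        self.pending = []   # posted writes not yet delivered: (address, value)
        self.device = {}    # hypothetical device memory (MMIO space)

    def post_write(self, address, value):
        # The write completes from the sender's point of view right away.
        self.pending.append((address, value))

    def flush(self):
        # Deliver posted writes to the device in the order they were issued.
        for address, value in self.pending:
            self.device[address] = value
        self.pending.clear()

    def read(self, address, flush_first=True):
        # Ordering rule: a read may not pass earlier posted writes.
        if flush_first:
            self.flush()
        return self.device.get(address, 0)
```

With `flush_first=False`, a read issued immediately after `post_write` returns the old contents of the device, whereas the default behavior delivers the pending writes first so that the read observes them.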