The development of ever more advanced microprocessors and associated bus architectures continues at a rapid pace. Current computer systems employ advanced architectures and processors such as Pentium Pro®, Pentium II®, Pentium III®, and Pentium IV® processors, as manufactured by the Intel Corporation of Santa Clara, Calif. In such computer systems, the bus architecture is optimized for burst performance. Generally, the bus architecture may include dedicated buses for one-to-one coupling of devices, or non-dedicated buses that are multiplexed by a number of units and devices (e.g., bus agents). By optimizing the bus architecture for burst performance, the system processor is able to achieve very high memory and I/O bandwidths.
One technique for achieving burst performance is to cache data within the level one (L1) or level two (L2) caches available to the processor. For example, when the processor recognizes that an operand being read from memory is cacheable, the processor reads an entire cache line into the appropriate cache. This operation is generally referred to as a “cache line fill.” Likewise, writes to memory are cached and written to memory in cache line burst write cycles. Unfortunately, within certain applications, such as graphics applications, writes from the processor are most often pixel writes. As a result, the writes tend to be 8-bit, 16-bit or 32-bit quantities, rather than the full cache lines required to provide burst performance.
Consequently, a processor is normally unable to run burst cycles for graphics operations. To address this problem, advanced computer architectures are designed to use a new caching method, or memory type, that allows internal buffers of the processor to be used to automatically combine smaller or partial writes into larger burstable cache line writes, which is referred to herein as “write-combining.” In order to provide write-combining within a memory region, the memory region is defined as having a write-combining (WC) memory type.
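The combining behavior described above can be illustrated with a small software model. The sketch below is a toy simulation only, not a description of any actual processor: the class name `WriteCombiningBuffer`, the 32-byte `LINE_SIZE`, the single-buffer design, and the flush policy (flush on eviction or on explicit request) are all assumptions made for illustration.

```python
LINE_SIZE = 32  # bytes per cache line; an assumed value for illustration


class WriteCombiningBuffer:
    """Toy model of a single write-combining buffer (hypothetical, not real hardware)."""

    def __init__(self):
        self.base = None                   # line-aligned address currently buffered
        self.data = bytearray(LINE_SIZE)   # buffered bytes
        self.valid = [False] * LINE_SIZE   # which bytes have been written
        self.bus_log = []                  # (kind, address, length) per bus transaction

    def write(self, addr, payload):
        """Accept a partial write; assumes it does not cross a line boundary."""
        base = addr - (addr % LINE_SIZE)
        if self.base is not None and base != self.base:
            self.flush()                   # a write to a different line evicts the buffer
        self.base = base
        offset = addr - base
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.valid[offset + i] = True

    def flush(self):
        """Emit one burst if the line is full, else one partial write per valid run."""
        if self.base is None:
            return
        if all(self.valid):
            self.bus_log.append(("burst", self.base, LINE_SIZE))
        else:
            start = None
            for i in range(LINE_SIZE + 1):
                is_valid = i < LINE_SIZE and self.valid[i]
                if is_valid and start is None:
                    start = i
                elif not is_valid and start is not None:
                    self.bus_log.append(("partial", self.base + start, i - start))
                    start = None
        self.base = None
        self.data = bytearray(LINE_SIZE)
        self.valid = [False] * LINE_SIZE


if __name__ == "__main__":
    wc = WriteCombiningBuffer()
    for pixel in range(8):                 # eight 32-bit pixel writes fill one line
        wc.write(0x1000 + 4 * pixel, bytes([pixel] * 4))
    wc.flush()
    print(wc.bus_log)
```

In this model, eight consecutive 32-bit pixel writes to one line leave the buffer as a single burst transaction, whereas scattered writes that do not fill the line leave it as separate partial transactions.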
However, the WC memory type is a weakly ordered memory type. System memory locations designated as WC are not cached, and coherency is not enforced by the processor's coherency protocol. In addition, writes may be delayed and combined in the write-combining buffers to reduce partial memory writes. Unfortunately, processor write-combining makes no guarantees with respect to the order in which bits are flushed from the write-combining buffers. As a result, the burst performance capability provided by write-combining may not be useful to applications which have strict requirements as to the order in which bits are flushed from the write-combining buffers. Furthermore, the available write-combining buffer sizes may be insufficient for certain applications which require high efficiency.
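The ordering concern can likewise be illustrated with a toy model. In the sketch below, the function name `arrival_order`, the 32-byte `LINE_SIZE`, and the flush policy (lowest line first) are assumptions chosen purely for illustration; actual processors make no such commitment, which is precisely the difficulty described above.

```python
LINE_SIZE = 32  # assumed line size in bytes, for illustration only


def arrival_order(writes):
    """Return the order in which written addresses reach memory in this toy model.

    `writes` lists addresses in program order. Each write is held in a
    per-line buffer until a final flush. Buffers are flushed lowest line
    first, an arbitrary policy assumed for this sketch.
    """
    buffers = {}
    for addr in writes:
        line = addr // LINE_SIZE
        buffers.setdefault(line, []).append(addr)

    order = []
    for line in sorted(buffers):
        # All buffered writes to a line leave the buffer together.
        order.extend(sorted(buffers[line]))
    return order


if __name__ == "__main__":
    program_order = [0x40, 0x00, 0x44]
    print([hex(a) for a in arrival_order(program_order)])
```

Here the program writes addresses 0x40, 0x00 and 0x44 in that order, yet under the assumed policy memory observes 0x00 first and then 0x40 and 0x44 together. An application that depends on program order being preserved therefore cannot rely on write-combining alone.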