Data is transferred between a central processing unit, memory, and various other components in a computer system through a collection of wires, or internal bus. Data transfer may also occur between internal components and external devices through a bus often referred to as an expansion bus. Various standards have been created that specify the manner of data transfer over a bus. For instance, the peripheral component interconnect (PCI) standard is a local bus standard developed by Intel. A local bus comprises a data bus that may be connected directly to a microprocessor. Another standard, referred to as PCI-Express (PCIE), is an input/output (I/O) interconnect bus standard that comprises a defined protocol and architecture. The PCIE standard expands upon the PCI standard, for instance, by doubling the data transfer rates. PCIE specifies a two-way serial connection that carries data in packets along two pairs of point-to-point data lanes (as opposed to the single parallel data bus of PCI). PCIE was developed to keep pace with the high data transfer speeds of such high-speed interconnects as 1394b, USB 2.0, InfiniBand, and Gigabit Ethernet.
One challenge presented by PCIE is that it does not allow bytes to be skipped during write operations to various components (e.g., writes to memory). Some applications, such as stencil (s-data) and depth (z-data) operations in graphics processing or color/alpha processing, may not require the entire packet body to be utilized. With regard to stencil and depth operations, for example, the z-data occupies three of every four bytes, the s-data occupies the remaining byte, and computation of z-values may be the operation of interest (to the exclusion of the s-data). Conventional approaches to this problem fall into two broad categories. One approach is to perform a read operation before the write, enabling a combined (merged) write in which whatever was not intended to be written (e.g., the stencil byte) is simply re-written as it was. However, the extra read that precedes every write tends to be inefficient, which hampers performance.
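The read-before-write (merged write) approach can be illustrated with a minimal sketch. It assumes a 32-bit word in which the stencil byte sits in the lowest 8 bits and the depth value in the upper 24 bits; that byte placement, the list used as a stand-in for memory, and the name `merged_write` are illustrative assumptions, not details taken from any standard.

```python
# Assumed layout (hypothetical): bits 31..8 hold z-data, bits 7..0 hold s-data.
STENCIL_MASK = 0x000000FF
DEPTH_MASK = 0xFFFFFF00


def merged_write(memory, index, new_depth):
    """Update the 24-bit depth value of one word while preserving its stencil byte.

    The target cannot skip bytes on a write, so the old word is read first
    and the untouched stencil byte is written back unchanged.
    """
    old_word = memory[index]                  # the extra read this approach requires
    stencil = old_word & STENCIL_MASK         # byte that must survive the write
    depth = (new_depth << 8) & DEPTH_MASK     # place the new z-value in the upper 3 bytes
    memory[index] = depth | stencil           # full-word write, stencil re-written as it was
    return memory[index]
```

Every depth update in this sketch costs one read plus one write, which is the inefficiency attributed to this approach.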
Another approach is to segment the packet into smaller units to obtain the byte-enable features of the conventional PCI standard. That is, the conventional PCI standard includes provisions for byte-masks at the head and tail portions of the packet (i.e., covering only a portion of the entire packet body). For example, a 512-bit (64-byte) packet may be segmented into eight transactions of 8 bytes each (e.g., with a four-bit mask at the head and a four-bit mask at the tail of each transaction). Because every byte of each segment then falls under either the head or the tail byte-mask, selective write operations can be transacted. One downside of such an approach is that a header needs to be appended to each segment, which may result in poor performance due to the additional packet headers that must be passed.
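The segmentation approach and its header cost can be sketched as follows. The sketch splits 64 per-byte enable flags into eight 8-byte transactions, each carrying a four-bit head mask and a four-bit tail mask. The 16-byte header size, the bit ordering within each mask, and the function names are assumptions made for illustration only.

```python
PACKET_BYTES = 64    # 512-bit packet
SEGMENT_BYTES = 8    # eight transactions of 8 bytes each
HEADER_BYTES = 16    # assumed per-transaction header size (hypothetical)


def segment_packet(byte_enables):
    """Split per-byte enable flags (0 or 1) into (offset, head_mask, tail_mask) tuples.

    Bit i of each 4-bit mask corresponds to byte i of that half-segment
    (an assumed ordering).
    """
    if len(byte_enables) != PACKET_BYTES:
        raise ValueError("expected one enable flag per packet byte")
    segments = []
    for offset in range(0, PACKET_BYTES, SEGMENT_BYTES):
        chunk = byte_enables[offset:offset + SEGMENT_BYTES]
        head_mask = sum(flag << i for i, flag in enumerate(chunk[:4]))
        tail_mask = sum(flag << i for i, flag in enumerate(chunk[4:]))
        segments.append((offset, head_mask, tail_mask))
    return segments


def extra_header_bytes(num_segments):
    """Header bytes added relative to sending the data as a single packet."""
    return (num_segments - 1) * HEADER_BYTES


# Example: write the three depth bytes of every 4-byte word, skip the stencil byte
# (assumed here to be byte 0 of each word).
enables = [0, 1, 1, 1] * 16
segments = segment_packet(enables)
```

Under these assumptions the 64-byte payload is sent as eight transactions rather than one, so seven additional headers (112 bytes) cross the link, which exceeds the payload itself.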