1. Field of the Invention
The present invention generally relates to data transfer within a computing environment and, more particularly, to providing byte enables for peer-to-peer data transfer within such a computing environment.
2. Description of the Related Art
In modern computing environments, a multitude of devices are generally interconnected to provide processing speed and flexibility within the computing environment. To create such a computing environment, various devices are connected to one another via an interconnectivity fabric such as a network or bus structure. The devices connected to the interconnectivity fabric generally include local memory that is used by a device during a computation.
One example of such a computing environment is used for graphics processing, where a plurality of graphics processing units (GPUs) are connected to one another by an interconnectivity fabric and each GPU is coupled to a frame buffer (i.e., local memory). The frame buffer stores graphics data being processed by the individual GPUs. Generally, large amounts of data need to be processed by the GPUs to render textures and create other graphics information for display. To achieve rapid processing, the processing task is divided amongst GPUs such that components of the task are performed in parallel.
At times, in such a computing environment, a GPU may need to use information that is stored in the frame buffer of a peer GPU, or may need to write information to the frame buffer of a peer GPU so that the peer GPU can use that information locally. In some cases, such data are stored in a non-contiguous or tiled fashion, where particular data of interest are dispersed throughout a region of the frame buffer. Consequently, the GPU may need to read or write certain memory locations within a range of the frame buffer of a peer GPU, while leaving other memory locations in the same range untouched. Presently, implementations of many interconnectivity fabric standards, such as AGP, PCI, PCI-Express™, Advanced Switching and the like, enable peers to exchange information stored in another peer's address space, but have limited capability to read or write non-contiguous or tiled data. For example, PCI-Express allows a data transfer to specify that only certain bytes within the first four-byte group and the last four-byte group of a data packet are written or read. In contrast, all other four-byte groups in the data packet are transferred in full, without the ability to identify specific bytes to write or read.
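The limitation described above can be illustrated with a minimal sketch. It models only the rule stated in this section (partial byte selection is available in the first and last four-byte groups, and every group in between is transferred in full) and is not a complete model of any fabric standard. The function name `fits_first_last_enables` and the per-byte boolean mask are hypothetical names introduced for illustration.

```python
DWORD = 4  # bytes per four-byte group


def fits_first_last_enables(byte_mask):
    """Return True if a per-byte write mask can be carried by one packet
    that allows partial selection only in its first and last four-byte groups.

    byte_mask: list of bools, one entry per payload byte (hypothetical
    representation); its length must be a non-zero multiple of four.
    """
    if len(byte_mask) == 0 or len(byte_mask) % DWORD != 0:
        raise ValueError("mask length must be a non-zero multiple of 4")

    # Split the mask into four-byte groups.
    groups = [byte_mask[i:i + DWORD] for i in range(0, len(byte_mask), DWORD)]

    # Every group strictly between the first and last must be fully enabled.
    middle = groups[1:-1]
    return all(all(group) for group in middle)


# Partial first and last groups, full middle group: expressible in one packet.
contiguous_like = [True, False, False, True] + [True] * 4 + [False, True, True, False]

# A hole in a middle group (as with tiled data): not expressible in one packet.
tiled_like = [True] * 4 + [True, False, True, True] + [True] * 4

print(fits_first_last_enables(contiguous_like))
print(fits_first_last_enables(tiled_like))
```

Under this simplified model, a packet with only two four-byte groups has no middle groups, so any mask fits; a longer packet fits only when its interior bytes are all selected.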
Consequently, where data transfers are directed to non-contiguous or tiled data, the graphics processing units limit data packets to two four-byte groups. One problem with this approach is that many interconnectivity fabric standards define a substantial amount of overhead data (header data) that is transferred along with the data of interest (payload data) in order to complete the data transfer. Where payload data is limited to two four-byte groups per transfer, the header data may make up a substantial portion of the total data transferred, reducing the percentage of each data packet devoted to payload data. As a result, payload data is transferred across the interconnectivity fabric with reduced efficiency. Data transfers that include non-contiguous or tiled data therefore take longer to complete than data transfers that include contiguous data.
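The efficiency loss can be made concrete with a short calculation. The sketch below assumes a fixed per-packet overhead of 16 bytes purely for illustration; this figure is not taken from the present description, and actual overhead depends on the fabric standard and packet type. The names `payload_efficiency` and `ASSUMED_OVERHEAD_BYTES` are hypothetical.

```python
def payload_efficiency(payload_bytes, overhead_bytes):
    """Fraction of the bytes on the fabric that are payload."""
    if payload_bytes < 0 or overhead_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = payload_bytes + overhead_bytes
    if total == 0:
        raise ValueError("packet must contain at least one byte")
    return payload_bytes / total


# Illustrative assumption only: per-packet overhead (header data) in bytes.
ASSUMED_OVERHEAD_BYTES = 16

# Payload limited to two four-byte groups, as for non-contiguous data.
small = payload_efficiency(2 * 4, ASSUMED_OVERHEAD_BYTES)

# A larger contiguous payload of thirty-two four-byte groups.
large = payload_efficiency(32 * 4, ASSUMED_OVERHEAD_BYTES)

print(f"two-group payload:   {small:.2f}")
print(f"32-group payload:    {large:.2f}")
```

With the assumed overhead, the two-group packet devotes about one third of its bytes to payload, while the larger packet devotes most of its bytes to payload; the exact ratios shift with the real overhead, but the trend is the one described above.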
As the foregoing illustrates, what is needed in the art is an improved technique to provide byte enables for peer-to-peer data transfer within a computing environment.