The present invention relates in general to data transfer within a computing environment, and in particular to peer-to-peer data transfer within a computing environment.
In modern computing environments, various devices are connected to one another via an interconnectivity fabric such as a network or bus structure. The devices generally contain local memory that is used by a device during a computation, and multiple devices are operated in parallel to provide processing speed and flexibility within the computing environment.
One example of such a computing environment is used for graphics processing. Multiple graphics processing units (GPUs) are connected to one another by an interconnectivity fabric, and each GPU is coupled to a frame buffer (i.e., local memory). The frame buffer stores graphics data being processed by the individual GPUs. Generally, large amounts of data need to be processed by the GPUs to render textures and create other graphics information for display. To achieve rapid processing, the processing task is divided among the GPUs such that different components of the task are performed in parallel.
At times, in such a computing environment, one of the GPUs may be required to use information that is stored in the frame buffer of a peer GPU, or may be required to write information to the frame buffer of a peer GPU so that the peer GPU may use that information. Presently, implementations of many interconnectivity fabric standards such as AGP, PCI, PCI-Express™, Advanced Switching and the like enable a peer to write information to another peer's address space but do not enable a peer to read information stored in another peer's address space. Consequently, the GPUs will duplicate effort to create data that their peers have already created, because they cannot read a peer's frame buffer in peer address space, where that information is stored. Alternatively, when a GPU completes processing of certain information that will be needed by a peer, the GPU may write that information to a commonly available system memory. The system memory is accessible by any of the GPUs connected to the interconnectivity fabric. However, using common system memory for such data transfers is time-consuming and increases processing overhead. Invariably, the use of common system memory slows the graphics processing.
Therefore, there is a need in the art for an improved method and apparatus for transferring information from peer to peer within a computing environment.