1. Field of the Invention
This application is related to commonly owned U.S. Patent applications entitled “Enhanced Bus Transactions for Efficient Support of a Remote Cache Directory Copy” (U.S. Ser. No. 10/961,742), “Low Latency Coherency Protocol for a Multi-Chip Multiprocessor System” (U.S. Ser. No. 10/961,751), “Graphics Processor With Snoop Filter” (U.S. Ser. No. 10/961,750), “Snoop Filter Directory Mechanism in Coherency Shared Memory System” (U.S. Ser. No. 10/961,749), which are herein incorporated by reference.
2. Description of the Related Art
Computer systems have been used for over 50 years to process digital information. Over that time, computers have developed into high-speed devices that can process tremendous amounts of information at low cost in a remarkable number of applications. However, new applications that demand even higher performance at lower cost continue to emerge.
One approach to achieving higher performance is to utilize multiple processors in a system, such as one or more central processing units (CPUs) and one or more graphics processing units (GPUs). The CPUs typically utilize one or more high-speed caches to provide fast local access to data currently being manipulated, thus avoiding relatively slow accesses to external main memory. Many highly computationally intensive applications involve transferring data that is locally cached by one processor to another processor.
For example, real-time rendering of graphical images is highly computationally intensive. Input data for the graphics processors is commonly produced by one or more of the CPUs. For example, the CPUs may produce or modify graphics primitives (utilized by the GPU), which therefore reside in the CPU caches. As a result, in such multiprocessor systems, this cached data is often transferred from the CPU to the GPU. Conventionally, this data transfer has been relatively slow because, in an effort to maintain coherency, the data is first written to main memory (for backing) rather than being transferred directly between the processors.
Accordingly, there is a need for an improved method and system for speeding the transfer of data between processors, for example, without any actual backing of the data in external memory.