In many known computer systems, a host CPU may execute an application which calls for graphics operations to be performed. To implement such graphics operations, the application will typically fetch initial graphics data and primitives (including, but not limited to, textures, geometry, models, etc.) from offline storage (including, but not limited to, network, CD, or hard-disk storage) and create a copy of the graphics data and primitives in online system memory. The application may operate on the graphics pixels, data models, and primitives in the online system memory and then, at some point, the application may call for a graphics processor of the computer system to operate on the graphics data and primitives, typically in order to offload low-level rendering tasks from the host CPU.
According to known implementations, when invoking operations by the graphics processor, the application will create a second copy of the graphics data and primitives, separate from the copy initially loaded into online system memory from offline storage, for the graphics processor to operate on. This second, separate copy (which may be referred to herein as an “aliased” copy) is typically placed in a region of online system memory which may be referred to as “graphics memory” because it is set aside for use by the graphics processor. Various implementations of graphics memory are known in the art. For example, discrete add-in graphics adapter cards may contain graphics memory which is locally connected by a private memory bus on the card; this is typically referred to as “local video memory”. In another example, in chipsets with the known Intel® Hub Architecture, a region of system memory designated Advanced Graphics Port (AGP) memory is used as graphics memory. AGP memory may also be referred to as “non-local video memory”.
The graphics processor would typically operate on the aliased copy of the graphics data in the graphics memory for a period of time. Typically, the graphics memory containing the aliased copy of the graphics data would be assigned an uncached attribute in the host CPU's memory page attribute tables, meaning that application accesses to the graphics data would not take advantage of the host CPU's cache while that data resided in the uncached graphics memory region for processing by the graphics processor. After the uncached, aliased copy had been processed by the graphics processor for a period of time, it would typically be necessary to return to the application for further processing of the graphics data. According to the aforementioned implementation, however, the application operates on the copy of the graphics data in the system memory. This system memory would typically have been assigned a cached attribute, so that the CPU could perform the application's operations in a cached mode. As is well known, cached operations by a CPU allow the CPU to be more efficient than uncached operations.
Of course, in order for the application to continue operating on the graphics data after the graphics processor has finished with it, changes to the aliased copy made by the graphics processor need to be reflected in the copy in the system memory used by the application.
The application may continue to process the copy in the system memory for a period of time in cached mode, and then again turn processing over to the graphics processor. Naturally, changes to the copy in the system memory must be reflected in the aliased copy in the graphics memory when the graphics processor takes over again. The foregoing exchange between the application and the graphics processor may be repeated many times.
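The repeated exchange described above can be illustrated with a short simulation. This is a minimal sketch of the data flow only, not an actual driver interface; all names (`system_copy`, `graphics_copy`, `sync`, etc.) are illustrative.

```python
# Simulation of the known dual-copy arrangement: the application keeps a
# working copy in cached system memory, while the graphics processor
# operates on a separate "aliased" copy in graphics memory. Every
# hand-off requires propagating updates between the two copies.

# Initial load from offline storage into online system memory.
system_copy = {"texture_0": 0, "vertex_0": 0}

# Second, aliased copy placed in the graphics-memory region.
graphics_copy = dict(system_copy)

def app_process(data):
    # Application (host CPU, cached mode) updates the system copy.
    data["vertex_0"] += 1

def gpu_process(data):
    # Graphics processor updates the aliased copy in graphics memory.
    data["texture_0"] += 10

def sync(src, dst):
    # Updates must be propagated between the two copies across buses,
    # consuming CPU bandwidth each time control is handed over.
    dst.update(src)

# One round trip of the exchange, which may be repeated many times:
app_process(system_copy)
sync(system_copy, graphics_copy)   # hand off to the graphics processor
gpu_process(graphics_copy)
sync(graphics_copy, system_copy)   # hand back to the application
```

The two `sync` calls per round trip model the bandwidth cost that the subsequent paragraphs identify as a disadvantage of this arrangement.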
It may be appreciated that the foregoing arrangement entails disadvantages. One disadvantage is that two copies of the same graphics data must be maintained, consuming valuable system memory resources. Additionally, valuable CPU bandwidth is consumed in creating and maintaining the two separate copies, particularly in propagating respective updates between the two copies across buses between multiple interfaces.
Implementations which do not involve maintaining two copies of graphics data as described in the foregoing are known. According to one such implementation, cacheable system memory is made available to a graphics processor for use as graphics memory, and the graphics processor as well as the host CPU perform operations on graphics data in the graphics memory. As described previously, the graphics processor and the host CPU take turns operating on the graphics data. Because the memory is cacheable, the CPU is able to operate in cached mode for improved efficiency.
However, this approach introduces the possibility of data “incoherency”. That is, because the CPU uses the graphics memory in cached mode, the data that the graphics processor has been asked to perform operations on may not yet have been flushed (i.e., evicted from the cache and written out to the graphics memory). Rather, the data may reside somewhere between the internals of the CPU and the L1 and L2 caches, and not have actually reached the graphics memory yet. Thus, when the graphics processor accesses the graphics memory to attempt to perform operations on the required data, it may not be able to find the most recent version of the required data; instead, the data in the graphics memory may be “stale”. Or worse, the data may be evicted from the cache just after the graphics processor has finished accessing the data location, overwriting the graphics processor's result and thereby invalidating the operation.
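The staleness hazard can be modeled with a toy write-back cache. This is an illustrative simulation only (hypothetical names throughout); it shows why a cached CPU write is invisible to the graphics processor until it is flushed to graphics memory.

```python
# Toy model of the coherency hazard: CPU writes land in a write-back
# cache, so a write may still be sitting in the cache when the graphics
# processor reads the corresponding graphics-memory location.

graphics_memory = {"addr_100": "old"}
cpu_cache = {}  # dirty lines not yet written back to memory

def cpu_write(addr, value):
    # Cached mode: the write lands in the cache, not in graphics memory.
    cpu_cache[addr] = value

def cpu_flush():
    # Eviction/write-back: dirty lines finally reach graphics memory.
    graphics_memory.update(cpu_cache)
    cpu_cache.clear()

def gpu_read(addr):
    # The graphics processor sees only graphics memory, not the cache.
    return graphics_memory[addr]

cpu_write("addr_100", "new")
stale = gpu_read("addr_100")   # still "old": not yet flushed
cpu_flush()
fresh = gpu_read("addr_100")   # "new": coherent only after write-back
```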
To handle the problem of incoherency, chipset “snoop cycles” have been utilized. Snoop cycles involve the graphics processor causing the chipset to force coherency in the CPU cache with respect to the graphics memory before the graphics processor is allowed to access the graphics memory. Snoop cycles, however, entail the disadvantage of requiring a considerable amount of overhead, which detracts from system performance. Snoop cycles inspect memory data on a location-by-location basis: if a required location's data is still in the CPU's cache, it is extracted and made coherent for that location alone. Such operations require a great deal of “handshaking” between interfaces and are inefficient because they must be performed location by location, or line by line.
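The per-line character of this overhead can be sketched as follows. Real snoop cycles are performed in chipset hardware; this is only a simplified model, with an explicit handshake counter to make the line-by-line cost visible.

```python
# Simplified model of chipset snoop cycles: before the graphics
# processor touches a range of graphics memory, each line is checked
# against the CPU cache and, if dirty, written back individually.

graphics_memory = {i: "stale" for i in range(8)}
cpu_cache = {2: "fresh", 5: "fresh"}  # dirty lines held by the CPU
handshakes = 0

def snoop_line(addr):
    # One snoop cycle per line: check the cache, write back if dirty.
    global handshakes
    handshakes += 1
    if addr in cpu_cache:
        graphics_memory[addr] = cpu_cache.pop(addr)

def gpu_access(addrs):
    # The chipset forces coherency line by line before GPU access.
    for addr in addrs:
        snoop_line(addr)
    return [graphics_memory[a] for a in addrs]

data = gpu_access(range(8))  # eight handshakes for eight lines
```

Even though only two lines were actually dirty, every line in the accessed range incurs a snoop handshake, which is the inefficiency described above.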
According to yet another implementation, graphics memory is used strictly in uncached mode. In this method, the data in the graphics memory is always kept coherent, since whenever the CPU wishes to read or write data to the graphics memory, the writes always go directly and immediately to the graphics memory and are never cached. A disadvantage associated with this method, however, is that the improved CPU performance afforded by cached operations is not available.
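The strictly uncached alternative can be contrasted with the cached model in a similarly small sketch (illustrative names; a simulation of the access pattern, not a real memory interface).

```python
# Model of strictly uncached graphics memory: every CPU write goes
# directly and immediately to graphics memory, so the graphics
# processor always sees current data, but no access ever benefits
# from the CPU's cache.

graphics_memory = {}
uncached_accesses = 0  # every access pays the full memory latency

def cpu_write_uncached(addr, value):
    global uncached_accesses
    uncached_accesses += 1
    graphics_memory[addr] = value  # immediate; nothing is ever cached

def gpu_read(addr):
    return graphics_memory[addr]  # always coherent by construction

cpu_write_uncached("addr_100", "new")
value = gpu_read("addr_100")  # current data, no flush or snoop needed
```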
In view of the foregoing considerations, a method and system are called for which overcome the deficiencies of existing implementations.