1. Field of the Invention
The present invention generally relates to memory management and, more particularly, to reducing memory traffic in DRAM ECC mode.
2. Description of the Related Art
Computer generated images that include 2D and 3D graphics objects are typically rendered using a graphics processing unit (GPU) with one or more multistage graphics processing pipelines. Such graphics pipelines include various programmable and fixed function stages. Programmable stages include various processing units that execute shader programs to render graphics objects and to generate various visual effects associated with graphics objects.
One element of a memory subsystem within certain processing units is a Level 2 cache memory (“L2 cache”). The L2 cache is a large on-chip cache memory that serves as an intermediate point between an external memory (e.g., frame buffer memory) and internal clients of the memory subsystem. The L2 cache temporarily stores data that the clients are reading from and writing to the external memory, which is often a dynamic random access memory (DRAM). In such systems, coherency has to be maintained between the data present in the L2 cache and the data stored in the external memory. “Dirty data,” that is, data transferred from a client to the L2 cache during a write operation, needs to remain in the on-chip cache until the data has been “cleaned” by replicating the data in the external memory.
Dirty data that is transferred to an L2 cache can be checked and verified by utilizing error correcting code (ECC). When DRAM ECC is enabled, an ECC checksum can be computed. In some implementations, computing the ECC checksum requires 32 bytes of data. In such instances, the L2 cache ensures that all dirty data transmitted to DRAM (such as a frame buffer) is fully covered (i.e., the data comprises a full 32 bytes). The frame buffer can then compute the checksum when it receives the data. Because partial writes of less than 32 bytes of data can occur in the L2 cache, the L2 cache system is configured to issue a fill request to the frame buffer for the remaining unwritten bytes so that a full 32 bytes of data is always sent to the frame buffer, which allows the ECC checksum to be computed.
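By way of illustration only, the fill-then-write behavior described above can be sketched as follows. This is a simplified software model, not the hardware implementation: the `FrameBuffer` class, the `evict_dirty_sector` function, and the per-byte `byte_enable` mask are hypothetical names introduced for this example, and only the 32-byte granularity is taken from the description above.

```python
SECTOR_BYTES = 32  # ECC checksum granularity stated in the text


class FrameBuffer:
    """Hypothetical stand-in for the external DRAM; counts bytes moved."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.bytes_read = 0
        self.bytes_written = 0

    def read_sector(self, addr):
        # Models servicing a fill request from the L2 cache.
        self.bytes_read += SECTOR_BYTES
        return bytes(self.mem[addr:addr + SECTOR_BYTES])

    def write_sector(self, addr, data):
        # The checksum can only be computed over a full sector.
        assert len(data) == SECTOR_BYTES
        self.mem[addr:addr + SECTOR_BYTES] = data
        self.bytes_written += SECTOR_BYTES


def evict_dirty_sector(fb, addr, data, byte_enable):
    """Write one dirty sector back, issuing a fill first if it is partial.

    data        -- 32 bytes held in the cache for this sector
    byte_enable -- 32 booleans, True where a client actually wrote a byte
    Returns True if a fill request was required.
    """
    needed_fill = not all(byte_enable)
    merged = bytearray(data)
    if needed_fill:
        old = fb.read_sector(addr)  # fill request for the unwritten bytes
        for i, written in enumerate(byte_enable):
            if not written:
                merged[i] = old[i]
    fb.write_sector(addr, bytes(merged))
    return needed_fill


# Example: a client wrote only the first 8 bytes of a sector.
fb = FrameBuffer(64)
fb.mem[0:SECTOR_BYTES] = bytes(range(SECTOR_BYTES))
data = bytes([0xFF] * 8) + bytes(24)
enable = [True] * 8 + [False] * 24
filled = evict_dirty_sector(fb, 0, data, enable)
```

In this model, an 8-byte partial write causes 32 bytes to be read from and 32 bytes to be written to the frame buffer, whereas a fully written sector would require the 32-byte write alone.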
One drawback to the above approach to managing writes from the L2 cache to the frame buffer is that issuing and servicing fill requests consume a relatively large amount of time and data bandwidth, since each partial write requires an additional read from the frame buffer before the data can be written. Consequently, overall system performance can be negatively impacted by the above approach.
As the foregoing illustrates, what is needed in the art is an improved technique for handling partial writes in an L2 cache with ECC enabled.