Field of the Invention
The present invention generally relates to error detection and correction in memory systems and more specifically to error detection and correction for external dynamic random access memory (DRAM).
Description of the Related Art
Certain processing units include a plurality of multi-threaded processing cores that can be configured to perform high-throughput, highly parallel computations. One example of a processing unit comprising multi-threaded processing cores is a graphics processing unit (GPU). The GPU can be configured to execute graphics programs, which typically require very high computational throughput, on the multi-threaded processing cores to generate real-time graphics images. Because graphics programs and their data sets typically require a significant amount of memory, external memories, such as discrete DRAM chips, are conventionally attached to the GPU to provide additional storage. Each DRAM chip includes input/output (I/O) data pins and I/O control pins that connect to one or more sets of I/O pins on the GPU.
Many conventional applications for GPU processing, such as real-time three-dimensional graphics entertainment applications, do not require a high degree of computational integrity and therefore do not require hardware-assisted error detection and correction for data stored in external DRAM chips. For example, if one of the eighty-five frames generated in one second includes a single erroneous pixel value due to a soft error in external DRAM, a user may not notice or care that the error occurred. However, certain applications that could benefit significantly from the high-throughput, multi-threaded processing capabilities of GPU devices do require error detection and correction, because their results must be correct.
A conventional approach to implementing error detection and correction in a processing unit, such as a central processing unit (CPU), protects each data transaction between the processing unit and DRAM memory with an error correction code (ECC). The ECC adds a set of protection bits to the data bits of each transaction, and the values of the protection bits are computed from the values of the data bits. To transmit and store the protection bits, a conventional ECC implementation requires additional I/O pins on the processing unit and at least one additional DRAM chip. When multiple, independent DRAM channels require ECC protection, at least one additional DRAM chip per channel is conventionally required to store the corresponding protection bits. However, adding I/O pins and DRAM chips to a processing unit system in order to support ECC imposes an unnecessary and potentially significant cost burden on the many processing unit applications that do not require ECC support.
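To illustrate how protection bits can be computed from data bits, the following is a minimal sketch of one common ECC, an extended Hamming code that provides single-error correction and double-error detection (SECDED) over a 64-bit data word using 8 protection bits. The choice of this particular code, and the names `encode` and `decode`, are illustrative assumptions rather than the ECC of any specific processor. The 8-per-64 ratio also shows where the extra hardware comes from: a 64-bit channel built from eight x8 DRAM devices would need a ninth x8 device, and eight more data pins, just for the protection bits.

```python
# Illustrative sketch (assumption): an extended Hamming (72,64) SECDED code.
# 64 data bits are protected by 7 Hamming check bits plus 1 overall parity bit.

CODE_LEN = 72
# Hamming check bits sit at the power-of-two positions 1, 2, 4, ..., 64.
PARITY_POSITIONS = [1 << k for k in range(7)]
# Data bits fill the remaining positions 3, 5, 6, 7, 9, ... up to 71 (64 slots).
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p & (p - 1)]


def encode(data: int) -> list[int]:
    """Map a 64-bit integer to a 72-bit codeword (list of 0/1, index = position)."""
    code = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Check bit p makes the parity of every position q with (q & p) != 0 even.
        code[p] = sum(code[q] for q in range(1, CODE_LEN) if q & p) % 2
    # Position 0 holds overall parity, which separates 1-bit from 2-bit errors.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)  # work on a copy so the caller's codeword is untouched
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(code[q] for q in range(1, CODE_LEN) if q & p) % 2:
            syndrome |= p
    overall_odd = sum(code) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome < CODE_LEN:
        # Single-bit error: the syndrome is its position (0 = overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity (a double-bit error), or a
        # syndrome outside the codeword: detected, but not correctable.
        status = "uncorrectable"

    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << i
    return data, status


if __name__ == "__main__":
    word = 0xDEADBEEFCAFEF00D
    codeword = encode(word)
    print(len(codeword) - 64, "protection bits per 64 data bits")  # 8
    codeword[10] ^= 1  # inject a single-bit soft error
    print(decode(codeword) == (word, "corrected"))  # True
    codeword[37] ^= 1  # inject a second error into the same word
    print(decode(codeword)[1])  # uncorrectable
```

Because every 64-bit transfer grows to 72 bits under a scheme like this, protecting it needs both a wider bus (extra pins) and extra storage (extra DRAM devices), which is the cost burden described for conventional ECC implementations.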
Accordingly, what is needed in the art is a system and method for enabling a GPU to support ECC protection of data in DRAM without conventional cost burdens associated with ECC protection.