The present invention generally relates to graphics processing units and their use for general purpose computing. More specifically, the invention relates to methods for reconfiguring different memory operation modes, notably, an error checking and correction (ECC) mode and a non-ECC mode on an application-dependent basis.
Graphics processors have evolved from relatively simple video processors to extremely powerful, fully programmable graphics processing units (GPUs) or visual processing units (VPUs). GPUs and VPUs (hereinafter, GPUs) were developed for three-dimensional (3D) “gaming” applications, that is, applications that rely on a representation of a highly complex 3D environment for entertainment purposes on the computer screen, within which a user (“gamer”) can navigate in real time. Early versions of so-called “graphics adapters” only processed the geometry setup and the painting of surfaces with textures. Contemporary games, however, use much more complex modes of representing a virtual reality by means of mini-programs called shaders that, for example, simulate swaying grass or leaves, shattering glass, and pyrotechnic effects such as flames and smoke. All of the above-mentioned effects are processed almost exclusively on modern GPUs. In view of the above, 3D computer games can be generally described as software for entertainment purposes whose operation relies on a GPU to accelerate the process of drawing complex scenes on a display in real time, such as applications that run on the Microsoft DirectX® application programming interface (API). A major factor in handling the massive computational load of 3D gaming applications is the extremely high level of parallelism, meaning that contemporary GPUs typically contain several hundred execution units. In addition, averaging of values and assumption-based creation of placeholders can create smoother images at the expense of precise rendering of the data.
GPUs are also increasingly being used for “general purpose computing” applications, in other words, applications requiring computations traditionally handled by a central processing unit (CPU), including database mining, semantic analysis of text documents, and filter executions in applications such as Adobe Photoshop®, a graphics editing program available from Adobe Systems Incorporated. In this case, the software is optimized to take advantage of the parallel processing power of GPUs. Three-dimensional (3D) computer-aided design (CAD) applications can also be said to broadly fall under this definition of general purpose computing since, in most cases, they are limited by the geometry processing, which is done on the CPU. Moreover, CAD applications are used to perform simulations of, for example, thermal conductance, fluid flow and their regulation, and to analyze the structural integrity of complex designs, all of which require extreme accuracy in every computational step. Because of the massive number of parallel processing units, general purpose computing applications that are capable of taking advantage of GPUs often run orders of magnitude faster on these highly specialized GPUs than on general purpose mainstream CPUs, such as those commercially offered by Advanced Micro Devices, Inc. (AMD) and Intel Corporation.
A prerequisite for extreme performance in suitable applications is the availability of large memory bandwidth. Typical desktop CPUs have one or two 64-bit memory channels available supporting up to 1600 Mbps per pin, resulting in a theoretical peak memory bandwidth of 12800 MB/sec per channel. Because of arbitration between physical banks (ranks) and other latency issues, the maximum achievable bandwidth in synthetic benchmarks is at best some 60% of the theoretical bandwidth, topping out at approximately 16 GB/sec for dual channel configurations. In contrast, GPUs typically use a point-to-point memory bus as wide as 512 bits and run specialized graphics memory (GDRAM, GDDRx) at ultra high data rates of up to 5 Gbps, achieving as much as 320 GB/sec memory bandwidth.
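The bandwidth figures above follow from multiplying the bus width in bytes by the per-pin data rate. The following is a minimal arithmetic sketch of that relationship; the function name and the use of integer MB/sec units are illustrative assumptions and are not taken from the text.

```python
def peak_bandwidth_mb_per_s(bus_width_bits, data_rate_mbps, channels=1):
    """Theoretical peak bandwidth in MB/sec.

    bus_width_bits : width of one memory channel in bits
    data_rate_mbps : per-pin data rate in Mbps
    channels       : number of independent channels
    """
    return bus_width_bits * data_rate_mbps * channels // 8


# Desktop CPU: 64-bit channel at 1600 Mbps per pin
cpu_single = peak_bandwidth_mb_per_s(64, 1600)       # 12800 MB/sec
cpu_dual = peak_bandwidth_mb_per_s(64, 1600, 2)      # 25600 MB/sec

# About 60% of the theoretical figure is reachable in synthetic benchmarks
cpu_dual_achievable = cpu_dual * 60 // 100           # 15360 MB/sec, roughly 15 to 16 GB/sec

# GPU: 512-bit bus at 5 Gbps (5000 Mbps) per pin
gpu_peak = peak_bandwidth_mb_per_s(512, 5000)        # 320000 MB/sec = 320 GB/sec

print(cpu_single, cpu_dual, cpu_dual_achievable, gpu_peak)
```

On these numbers, the GPU memory subsystem offers roughly 12 times the theoretical bandwidth of a dual channel desktop configuration.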
In the case of 3D gaming applications, data are typically non-recurrent, that is, as scenes are rendered, data are being used to, for example, texture a surface, which is then discarded with the next frame. Consequently, a single bit error in, for example, a texture may result in a single pixel having a wrong color, and that particular pixel is only displayed for the duration of one frame, which is typically less than 1/60 of a second. Only in the rarest of cases will such a pixel deviation even become obvious to a gamer. However, in the context of general-purpose computing on GPUs (general purpose GPUs or GPGPUs; also referred to as GPGP or GP2), it is paramount that critical applications be aware of any memory errors, whether they are hard errors or soft errors, for example, the flipping of a single bit caused by a cosmic ray. If a soft error occurs in a database application and causes a shift in a floating-point or decimal value, the consequences can be catastrophic since the system will not be aware of the error and the corrupted data entry can proliferate throughout the entire database. In desktop computing (including servers) this problem has been addressed by using error checking and correction (ECC) algorithms (also known as error correction codes), such as Reed-Solomon (R-S) or Bose-Chaudhuri-Hocquenghem (BCH), which rely on generating a checksum of the data during writes and then comparing that stored checksum on subsequent reads with a recalculated checksum of the data.
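The write-time checksum and read-time comparison described above can be illustrated with a much simpler code than R-S or BCH. The following sketch uses a Hamming(7,4) code, which is a substitution chosen for brevity and is not the code named in the text; the function names are hypothetical. Check bits are generated when a 4-bit value is written, and on read the recomputed check bits are compared with the stored ones to form a syndrome that locates a single flipped bit.

```python
def hamming74_encode(nibble):
    """Write path: return a 7-bit codeword (list of bits, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]      # d[0] is the least significant bit
    code = [0] * 8                                  # index 0 unused
    code[3], code[5], code[6], code[7] = d          # data bits
    code[1] = code[3] ^ code[5] ^ code[7]           # check bit for positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]           # check bit for positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]           # check bit for positions 4,5,6,7
    return code[1:]


def hamming74_decode(codeword):
    """Read path: return (nibble, syndrome); syndrome 0 means no error found."""
    code = [0] + list(codeword)
    # Each syndrome bit is the stored check bit XOR the recomputed check bit.
    s1 = code[1] ^ code[3] ^ code[5] ^ code[7]
    s2 = code[2] ^ code[3] ^ code[6] ^ code[7]
    s4 = code[4] ^ code[5] ^ code[6] ^ code[7]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        code[syndrome] ^= 1                         # correct the single flipped bit
    data = (code[3], code[5], code[6], code[7])
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome


stored = hamming74_encode(0b1011)
stored[4] ^= 1                                      # simulate a soft error at position 5
value, syndrome = hamming74_decode(stored)
print(value == 0b1011, syndrome)
```

A code of this kind corrects any single bit error per codeword; detecting double errors as well would require an additional overall parity bit, which is omitted here.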
ECC memory implementations have been used for decades and are a mandatory feature in the server space, where soft errors must be avoided at all cost since they can cause database corruption or, in the extreme case, be exploited to launch viruses. However, ECC implementations come with a performance hit because every transaction requires the calculation of the checksum and a comparison with the previously stored checksum of the data set. In the case of system memory, this performance hit is relatively minor, typically on the order of about 3 to about 5% bandwidth reduction. However, with increasing memory data rate and bandwidth, the load on the ECC unit increases, and some projections for memory subsystems, such as the onboard local memory of a high-end graphics card, estimate a performance degradation of as much as about 40 to about 70%.
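To put these percentages in absolute terms, the sketch below applies them to the peak bandwidth figures quoted earlier (12800 MB/sec per system memory channel and 320 GB/sec for graphics memory). Pairing the projected 40 to 70% degradation with the 320 GB/sec figure is an illustrative assumption, and the function name is hypothetical.

```python
def effective_bandwidth(peak_mb_per_s, degradation_percent):
    """Bandwidth in MB/sec remaining after a given percentage reduction."""
    return peak_mb_per_s * (100 - degradation_percent) // 100


# System memory, one 64-bit channel: about 3 to 5% reduction
sys_best = effective_bandwidth(12800, 3)      # 12416 MB/sec
sys_worst = effective_bandwidth(12800, 5)     # 12160 MB/sec

# Graphics memory: projected 40 to 70% reduction
gpu_best = effective_bandwidth(320000, 40)    # 192000 MB/sec
gpu_worst = effective_bandwidth(320000, 70)   # 96000 MB/sec

print(sys_best, sys_worst, gpu_best, gpu_worst)
```

Under these assumptions, enabling ECC would cost system memory a few hundred MB/sec, whereas a graphics memory subsystem could lose between 128 and 224 GB/sec.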
In high-end graphics cards used primarily for 3D computer gaming applications, memory bandwidth is crucial for performance, especially if image enhancing routines like anti-aliasing are carried out. Anti-aliasing uses averaging of different sets of pixels for the purpose of reducing the jaggedness of diagonal lines and “crawling” and “shimmering” effects. In such scenarios, ECC is largely irrelevant since a soft error would primarily have the effect of changing the color value of a single pixel in one frame, an error that is typically not noticeable at all. However, if the same graphics card is used for general purpose computing, ECC becomes absolutely mandatory.
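The averaging step and the limited visual effect of a soft error can be illustrated with a small sketch. It assumes a simple 2x2 box filter over a supersampled grayscale image, which is only one of several anti-aliasing approaches; the function name and sample data are hypothetical.

```python
def downsample_2x2(samples):
    """Average each 2x2 block of a supersampled grayscale image (list of rows)."""
    height, width = len(samples), len(samples[0])
    return [
        [
            (samples[y][x] + samples[y][x + 1]
             + samples[y + 1][x] + samples[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# 4x4 supersampled diagonal edge between white (255) and black (0)
supersampled = [
    [255, 255, 255, 0],
    [255, 255, 0, 0],
    [255, 0, 0, 0],
    [0, 0, 0, 0],
]
clean = downsample_2x2(supersampled)          # [[255, 63], [63, 0]]

# Simulate a soft error: flip bit 4 (value 16) of one sample
corrupted_samples = [row[:] for row in supersampled]
corrupted_samples[3][3] ^= 1 << 4
corrupted = downsample_2x2(corrupted_samples)  # [[255, 63], [63, 4]]

changed = sum(
    a != b
    for row_a, row_b in zip(clean, corrupted)
    for a, b in zip(row_a, row_b)
)
print(clean, corrupted, changed)
```

The averaging produces intermediate gray values along the edge, and the flipped bit alters only one output pixel, by 4 out of 255 levels in this case.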