1. Field of the Invention
The present invention relates generally to computer arithmetic and more specifically to techniques for performing integer division.
2. Description of the Related Art
Current computer systems are typically configured with the hardware capability to support graphics processing units (GPUs) that offload graphics rendering tasks from the general purpose processor of the computer system to the specialized processors of the GPU. GPUs are designed with a highly parallelized structure to efficiently perform the complex and expensive computations required for rendering graphics. Many current GPUs are multi-core units configured to execute applications in a multi-threaded manner. For example, Nvidia's GeForce® 8 GPU has 128 processing cores, each having its own floating point unit (FPU) and a set of 1024 registers. Each cluster of 8 processing cores also has 16 KB of shared memory supporting parallel data access. Such an architecture is able to support up to 12,288 concurrent threads, with each thread having its own stack, registers (i.e., a subset of the set of 1024 registers in a processing core), program counter and local memory.
GPUs are increasingly utilized in mission-critical or fault-tolerant computer systems, where error correction is important. For example, computer systems used at high altitudes by the military or for space research may encounter levels of radiation that have the potential to corrupt bits in a computer system's memory. To date, however, memory systems of GPUs have not been implemented with error correction capabilities. In order to support error correction capabilities, GPU memory systems need to allocate a portion of memory as storage for checksums that correspond to actual stored data. Calculating memory addresses for both the location of requested data and the corresponding location of the checksum for the requested data requires the use of integer division.
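By way of illustration only, the following sketch shows why such an address calculation may involve integer division. The memory layout is an assumption made for this example and is not taken from the description above: memory is treated as groups of eight 32-byte blocks, where seven blocks hold data and the eighth holds one 4-byte checksum per data block. The names `translate`, `BLOCK_BYTES`, `DATA_BLOCKS_PER_GROUP` and `CHECKSUM_BYTES` are hypothetical.

```python
# Hypothetical layout (assumed for illustration, not specified by the source):
# each group is 7 data blocks followed by 1 checksum block.
BLOCK_BYTES = 32            # assumed block size in bytes
DATA_BLOCKS_PER_GROUP = 7   # assumed data blocks per group
CHECKSUM_BYTES = 4          # assumed checksum size per data block


def translate(logical_addr):
    """Map a logical data address to (physical data address, checksum address)."""
    # Splitting by the block size is cheap in hardware (power of two).
    block, offset = divmod(logical_addr, BLOCK_BYTES)
    # Dividing by 7 is not a power-of-two operation; this is the step
    # that calls for true integer division.
    group, index = divmod(block, DATA_BLOCKS_PER_GROUP)

    group_base = group * (DATA_BLOCKS_PER_GROUP + 1) * BLOCK_BYTES
    data_addr = group_base + index * BLOCK_BYTES + offset
    checksum_block = group_base + DATA_BLOCKS_PER_GROUP * BLOCK_BYTES
    checksum_addr = checksum_block + index * CHECKSUM_BYTES
    return data_addr, checksum_addr
```

Under these assumptions, the quotient selects the group and the remainder selects the block within the group, so every memory access needs both results of a division by a divisor that is not a power of two.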
However, many GPUs do not incorporate integer division logic into the digital circuit design of their arithmetic logic units (ALUs) because integer division operations tend to be too infrequent to justify the hardware expense of incorporating such logic. Furthermore, the standard logic that is used to implement integer division is too slow and consumes too much circuit space for purposes of the memory address calculations discussed above.
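The cost of conventional division can be seen in a textbook restoring (shift-and-subtract) divider, sketched below. This is offered only as an assumed example of standard division logic; the description above does not identify a particular circuit, and the name `restoring_divide` and its `width` parameter are hypothetical.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division: one compare/subtract step per dividend bit."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Process the dividend from its most significant bit to its least.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Subtract the divisor when it fits and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

Each quotient bit depends on the remainder produced by the previous step, so a 32-bit operand takes 32 sequential iterations in this form, which suggests why such logic is a poor fit for the latency of a memory address calculation.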
As the foregoing illustrates, what is needed in the art is a technique enabling a GPU in a computer system to efficiently compute memory addresses using integer division for address translation purposes.