1. Field of the Invention
The present invention relates generally to the field of parallel processing and, more specifically, to processing extended-precision integer arithmetic and logical instructions.
2. Description of the Related Art
In typical SIMT or SIMD processor architectures, where groups of parallel threads execute a common instruction stream, the machine instructions for integer arithmetic, logical operations, and comparisons do not efficiently support extended-precision (multi-word) computations while also setting condition codes correctly. For example, a conventional integer ADD instruction supports an optional carry-in and can optionally write a condition code register. This allows two extended-precision values, each k words long, to be added in k instructions, but the resulting condition code register does not represent the overall status of the extended-precision result. In particular, the zero flag of the condition code indicates only whether the most-significant word of the result is zero, not whether the entire k-word result is zero. A separate k-step instruction sequence is therefore needed to compare the multi-word result with zero, e.g., to control a subsequent branch instruction.
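The following C sketch models the behavior described above. It assumes 32-bit words stored least-significant first and a hypothetical two-flag condition code (`cond_code`, with `zero` and `carry` fields); the helper names `add_word`, `add_multiword`, and `multiword_is_zero` are illustrative only and do not correspond to any particular instruction set. It shows that the zero flag left by the final per-word ADD reflects only the most-significant word, so a separate k-step pass is needed to test the full result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition code register (illustrative, not a real ISA). */
typedef struct {
    bool zero;   /* set when the word just written is zero */
    bool carry;  /* carry-out of the word just added */
} cond_code;

/* Models one per-word ADD with optional carry-in that writes the condition code. */
static uint32_t add_word(uint32_t a, uint32_t b, bool use_carry_in, cond_code *cc)
{
    uint64_t sum = (uint64_t)a + b + ((use_carry_in && cc->carry) ? 1u : 0u);
    uint32_t r = (uint32_t)sum;
    cc->carry = (sum >> 32) != 0;
    cc->zero = (r == 0);  /* reflects only this word, not the whole result */
    return r;
}

/* Adds two k-word values (word 0 least significant) in k per-word ADDs.
   The returned condition code is whatever the last (most-significant) ADD wrote. */
static cond_code add_multiword(const uint32_t *a, const uint32_t *b, uint32_t *r, int k)
{
    assert(k > 0);
    cond_code cc = { false, false };
    for (int i = 0; i < k; i++)
        r[i] = add_word(a[i], b[i], i > 0, &cc);
    return cc;
}

/* The separate k-step sequence needed to test the whole result for zero. */
static bool multiword_is_zero(const uint32_t *r, int k)
{
    bool z = true;
    for (int i = 0; i < k; i++)
        z = z && (r[i] == 0);
    return z;
}
```

For example, adding {5, 0} and {7, 0} (least-significant word first) yields {12, 0}. The last ADD writes a zero word, so the final zero flag is set even though the two-word result is 12; only the extra pass in `multiword_is_zero` reports the correct status.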
Other extended-precision operations are even less efficient. To perform a multi-word MIN (or MAX) using prior machine instructions, one must process the multi-word inputs from the most-significant to the least-significant word, comparing the words at each step to determine which input is smaller (or larger). The comparisons continue as long as the input words are equal; the first unequal pair of words determines the order of the multi-word inputs, and the remaining words must then be selected from the smaller (or larger) input. This processing requires data-dependent branching or looping constructs, which are particularly costly in parallel SIMT and SIMD architectures because they force all parallel threads in a group to step through additional instructions that only some of the threads require.
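The word-by-word procedure described above can be sketched in C as follows. This is a minimal illustration assuming unsigned 32-bit words stored least-significant first; the function names `multiword_min` and `multiword_min_copy` are hypothetical. The early exit inside the loop is the data-dependent branch: different threads leave the loop at different iterations, which is what causes divergence in a SIMT or SIMD group. A MAX variant would simply reverse the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the smaller of two k-word unsigned values
   (word 0 least significant), scanning from the most-significant word. */
static const uint32_t *multiword_min(const uint32_t *a, const uint32_t *b, size_t k)
{
    assert(k > 0);
    for (size_t i = k; i-- > 0; ) {
        /* Data-dependent exit: the iteration at which this fires varies per thread. */
        if (a[i] != b[i])
            return (a[i] < b[i]) ? a : b;
    }
    return a;  /* all words equal: either input is the minimum */
}

/* Copies every word of the selected (smaller) input into r. */
static void multiword_min_copy(const uint32_t *a, const uint32_t *b, uint32_t *r, size_t k)
{
    const uint32_t *m = multiword_min(a, b, k);
    for (size_t i = 0; i < k; i++)
        r[i] = m[i];
}
```

With a = {9, 1, 4} and b = {2, 3, 4}, the most-significant words tie, word 1 decides the order (1 < 3), and all three result words are taken from a, even though a's least-significant word is the larger of the two.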
As the foregoing illustrates, what is needed in the art is a mechanism for efficiently performing extended-precision operations in a SIMT or SIMD processing environment.