Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for dynamic polygon or primitive sorting for improved culling.
Description of the Related Art
The Prefix Sum, or Cumulative Sum, of a sequence of numbers x0, x1, x2, etc., is a second sequence of numbers y0, y1, y2, etc., formed from the running totals of the first. An example of this is defined in the table illustrated in FIG. 13. Prefix Sums have many applications, including parallel quicksort and line drawing. One current use case arises in atomic instruction support. While some current graphics processing architectures provide instructions to perform atomic operations, they do so inefficiently (e.g., 64 atomics/cycle). The problem is aggravated when atomic operations are directed to the same memory location: the single instruction multiple data (SIMD) lanes then contend to write to that location, essentially serializing the atomic access.
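By way of illustration only, the prefix sum described above may be sketched as follows. The table of FIG. 13 is not reproduced here, so this sketch assumes the two conventional variants: an inclusive form, in which y[i] includes x[i], and an exclusive form, in which it does not. The function names are hypothetical and chosen for this example.

```python
from itertools import accumulate


def inclusive_prefix_sum(xs):
    """Return y where y[i] = xs[0] + xs[1] + ... + xs[i]."""
    return list(accumulate(xs))


def exclusive_prefix_sum(xs):
    """Return y where y[i] = xs[0] + ... + xs[i-1], with y[0] = 0."""
    out = []
    running = 0
    for x in xs:
        out.append(running)  # sum of all elements before this one
        running += x
    return out


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(inclusive_prefix_sum(data))
    print(exclusive_prefix_sum(data))
```

The sequential loops here are for clarity only; a parallel implementation would distribute the same computation across SIMD lanes.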
A compiler optimization may be applied when the destination address is uniform (i.e., all the SIMD lanes write to the same address). This optimization, referred to as scalar atomics, attempts to perform two basic operations: (1) accumulate and (2) broadcast the result of the accumulation to the active lanes. Emulating this behavior efficiently in software is extremely important for the performance of certain applications. Naive solutions have proved detrimental: performance drops of up to 25× have been observed compared with performing the atomic operation itself, due to inefficiencies in the algorithm when all SIMD lanes are active. These inefficiencies are addressed by the embodiments of the invention described below.
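The accumulate and broadcast operations may be illustrated with a minimal software simulation, offered as a sketch under stated assumptions rather than as the method of any embodiment. It assumes an atomic add to a uniform address, models the SIMD lanes as list positions with a boolean active mask, and assumes that each lane expects back the value it would have observed had the lanes executed one after another. Using an exclusive prefix sum to derive those per-lane values is an assumption of this example; all function and parameter names are hypothetical.

```python
def naive_atomic_add(memory, addr, values, active):
    """Serialized form: every active lane issues its own atomic add."""
    returned = [None] * len(values)
    for lane, (v, on) in enumerate(zip(values, active)):
        if on:
            returned[lane] = memory[addr]  # old value seen by this lane
            memory[addr] += v
    return returned


def scalar_atomic_add(memory, addr, values, active):
    """Scalar form: accumulate across lanes, update memory once, broadcast."""
    # (1) Accumulate: exclusive prefix sum over the active lanes only.
    offsets = [None] * len(values)
    total = 0
    for lane, (v, on) in enumerate(zip(values, active)):
        if on:
            offsets[lane] = total
            total += v

    # A single memory update replaces one update per active lane.
    base = memory[addr]
    memory[addr] += total

    # (2) Broadcast: each active lane receives the base plus its own offset.
    return [None if off is None else base + off for off in offsets]


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    active = [True, False, True, True]
    mem_a, mem_b = {0: 10}, {0: 10}
    print(naive_atomic_add(mem_a, 0, values, active), mem_a)
    print(scalar_atomic_add(mem_b, 0, values, active), mem_b)
```

Both functions leave memory in the same state and return the same per-lane values; the scalar form touches the contended location once instead of once per active lane.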