Parallel processor architectures are commonly used to perform a wide array of different computational algorithms. An example of an algorithm that is commonly performed using such architectures is a scan operation (e.g. “all-prefix-sums” operation, etc.). One such scan operation is defined in Table 1.
TABLE 1
[I, a0, (a0 ⊕ a1), . . . , (a0 ⊕ a1 ⊕ . . . ⊕ an−1)]
Specifically, given an array [a0, a1, . . . , an−1] and an identity element “I” for the operator “⊕”, the array of Table 1 is returned. For example, if the operator “⊕” is an addition operator, for which the identity element is 0, performing the scan operation on the array [3 1 7 0 4 1 6 3] would return [0 3 4 11 11 15 16 22], and so forth. While an addition operator is set forth in the above example, the operator may be any binary associative operator that operates upon two operands. Because of the wide applicability of scan operations, there is a continued need to perform computational algorithms such as scan operations more efficiently using parallel processor architectures.
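By way of illustration only, the scan operation described above may be sketched sequentially as follows. This is a minimal reference sketch and not a parallel implementation. The function name exclusive_scan and its parameters are hypothetical and chosen for this example. The sketch assumes that the output has the same number of elements as the input, as in the addition example, so that the last output element combines every input element except the final one; the input is assumed to be the eight-element array [3 1 7 0 4 1 6 3].

```python
from operator import add, mul
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def exclusive_scan(values: Sequence[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Sequential reference scan (hypothetical helper, not a parallel algorithm).

    Output element i holds identity (+) values[0] (+) ... (+) values[i-1],
    so the first output element is the identity itself.
    `op` is assumed to be a binary associative operator.
    """
    result: List[T] = []
    running = identity
    for value in values:
        result.append(running)        # emit the combination of all earlier elements
        running = op(running, value)  # fold the current element into the running value
    return result


# Addition operator with identity 0, matching the worked example.
sums = exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3], add, 0)
print(sums)  # [0, 3, 4, 11, 11, 15, 16, 22]

# Any other binary associative operator may be substituted,
# e.g. multiplication with identity 1.
products = exclusive_scan([3, 1, 7, 0, 4], mul, 1)
print(products)  # [1, 3, 3, 21, 0]
```

Because the operator is only required to be associative, the element-by-element order used in this sketch is merely one valid grouping; a parallel processor architecture is free to combine partial results in a different grouping and obtain the same output.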