Computer applications having concurrent threads executed on multiple processors present great promise for increased performance but also present great challenges to developers. The growth of raw sequential processing power has flattened as processor manufacturers have reached roadblocks in providing significant increases to processor clock frequency. Processors continue to evolve, but the current focus for improving processor power is to provide multiple processor cores on a single die to increase processor throughput. Sequential applications, which have previously benefited from increased clock speed, obtain significantly less scaling as the number of processor cores increases. In order to take advantage of multiple core systems, concurrent (or parallel) applications are written to include concurrent threads distributed over the cores. Parallelizing applications, however, is challenging in that many common tools, techniques, programming languages, frameworks, and even the developers themselves are adapted to create sequential programs.
Grouping operations represent one area of applications where parallel improvements are available but largely unexploited. A grouping operation receives a sequence of elements and places those elements into predetermined groups, where each element in the sequence is inspected as it is grouped. At times, the sequence can contain millions of elements or more. Because a sequential grouping operation inspects the elements one at a time, the time used to perform the operation grows with the number of elements in the sequence.
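By way of illustration only, a sequential grouping operation of the kind described above can be sketched as follows. The sketch is not taken from any particular implementation; the names group_sequentially, group_keys, key_of, and parity are hypothetical and are introduced here solely for the example. It assumes the groups are identified in advance by a list of keys and that a caller-supplied function maps each element to one of those keys.

```python
def group_sequentially(elements, group_keys, key_of):
    """Place each element into one of the predetermined groups.

    All names here are hypothetical and chosen for illustration.
    group_keys lists the predetermined groups; key_of maps an
    element to the key of the group it belongs to.
    """
    # Create every predetermined group up front, initially empty.
    groups = {key: [] for key in group_keys}
    # A single pass over the sequence: each element is inspected once,
    # so the running time grows with the number of elements.
    for element in elements:
        # Raises KeyError if key_of returns a key that is not predetermined.
        groups[key_of(element)].append(element)
    return groups


def parity(n):
    """Hypothetical key function: classify an integer as even or odd."""
    return "even" if n % 2 == 0 else "odd"
```

For example, calling group_sequentially(range(10), ["even", "odd"], parity) places the ten integers into two groups in a single pass. The loop body executes once per element, which is the reason the sequential form does not benefit from additional processor cores.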