Arrays held in the memory of supercomputers often contain very large numbers of elements. When an array is packed by selecting and moving a subset of its elements based on certain criteria, a single processor must determine which elements to include in the packed array. Packing an array typically includes loading the large array (the source array), individually selecting a number of elements from the large array, and individually storing the selected elements to the packed array (the destination array). Each step in this process typically involves performing multiple operations on each element of the large array, all with a single processor.
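By way of illustration only, the single-processor packing process described above might be sketched in C as follows. The function name pack_serial, the helper keep, and the criterion it applies (retain strictly positive values) are hypothetical and chosen purely for the example; the description above does not specify any particular criterion or element type.

```c
#include <stddef.h>

/* Hypothetical selection criterion (an assumption for this sketch):
   keep strictly positive values. */
static int keep(double x) { return x > 0.0; }

/* Serial pack: copies every element of src[0..n) that satisfies keep()
   into dst, preserving order, and returns the number of elements stored.
   dst must have room for up to n elements. */
size_t pack_serial(const double *src, size_t n, double *dst)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        double x = src[i];      /* load one element of the source array */
        if (keep(x)) {          /* select it if it meets the criterion */
            dst[count++] = x;   /* store it to the destination array */
        }
    }
    return count;
}
```

In this sketch every element of the source array passes through the same loop on one processor, and each store position depends on how many earlier elements were selected, so the running time grows with the full length of the source array.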
What is needed is an efficient and robust method and apparatus to perform such a packing operation using multiple processors.