Conventional SIMD arrays having a plurality of processing cells or elements (PEs) use a number of different approaches to move data among the elements in order to carry out signal processing algorithms that must make a global decision based on decisions made locally in each of the processing elements. Examples include motion estimation, i.e., finding the best matching block (the minimum under the Sum of Absolute Differences (SAD) criterion) among all local candidates in a given search area; RAKE receivers, i.e., finding the finger with the maximum correlation (each RAKE finger computes the correlation between the despread received code samples and the reference scrambling code samples); and maximum, minimum, and global thresholding operations. In one approach each element is connected to each of its neighboring elements in all four directions. This allows data to be moved in any direction and allows the processing to culminate at any element, but it requires significant power, bus structure, area, and cycle time to complete operations. One attempt to reduce the bus structures uses a unidirectional interconnection, e.g., in any given cycle all PEs send left and receive from the right, or send from up to down, so that only half of the PE-to-PE interfaces are utilized at a time. In operation, for example, the data can be moved along each row all the way to the right-most elements, then moved down that column of elements to a single element in the lower right corner. Each row and column, however, has an end-around connection so that the bank of data can be moved so as to culminate at any particular focus, e.g., lower right or upper left. This approach also supports a folded array in which the array of rows and columns of elements is folded over on a diagonal of the array, permitting all of the elements, except those along the diagonal, to be combined in dual clusters, thereby reducing by half the bus structures and overall area.
All of these approaches require a large number of cycles, approximately the sum of the number of columns and rows, to complete the data movement. See the Elixent reconfigurable ALU array (RAA) at www.elixent.com. See also the XPP architecture at www.PACTCORP.com.
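The cycle count of the unidirectional approach can be illustrated with a minimal software sketch. It assumes one neighbor-to-neighbor transfer per cycle, with all PEs acting in lockstep, and it does not model any particular product. The function name `global_min_unidirectional` and the sample SAD values are hypothetical and chosen only for illustration.

```python
def global_min_unidirectional(local):
    """Reduce per-PE local values to one global minimum at the lower-right PE.

    Assumption: each cycle, every PE passes its running minimum to exactly one
    neighbor (first to the right, later downward), as in a unidirectional array.
    Returns (global_minimum, cycles_used).
    """
    rows, cols = len(local), len(local[0])
    acc = [row[:] for row in local]  # running minimum held in each PE
    cycles = 0

    # Row phase: all PEs send right simultaneously; each PE keeps the smaller
    # of its own value and the value received from its left neighbor.
    for _ in range(cols - 1):
        acc = [
            [row[0]] + [min(row[c], row[c - 1]) for c in range(1, cols)]
            for row in acc
        ]
        cycles += 1

    # Column phase: all PEs send down simultaneously; the right-most column
    # now carries the row minima toward the lower-right corner.
    for _ in range(rows - 1):
        acc = [acc[0][:]] + [
            [min(acc[r][c], acc[r - 1][c]) for c in range(cols)]
            for r in range(1, rows)
        ]
        cycles += 1

    return acc[rows - 1][cols - 1], cycles


# Hypothetical local SAD results for a 3-row by 4-column array of PEs.
sad = [
    [9, 4, 7, 8],
    [6, 5, 3, 9],
    [8, 2, 6, 7],
]
best, cycles = global_min_unidirectional(sad)
print(best, cycles)  # prints: 2 5
```

Under these assumptions the reduction takes (columns - 1) + (rows - 1) cycles, which grows approximately as the sum of the number of columns and rows.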