Single instruction, multiple data (SIMD) is a class of parallel computers. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data-level parallelism (DLP). That is, there are simultaneous (parallel) computations, but only a single control process (instruction) at a given moment. SIMD instructions are used in SIMD and vector architectures (see Flynn, “Some Computer Organizations and Their Effectiveness,” IEEE Transactions on Computers, Vol. C-21, No. 9, September 1972). SIMD instruction sets offer an efficient way to accelerate DLP. A specific way of providing support for SIMD instructions is through vector processing systems, i.e. computer systems using a vector architecture. This patent uses the terms “vector” and “SIMD” interchangeably.
A vector processing system is a system configured to process a plurality of values with a single instruction. The vector processing system may comprise a number of vectors, or vector registers, each having a number of elements with a unique index assigned to each element. The indexes may be assigned in ascending order, corresponding to the position of the elements in the vectors. Implementing an algorithm using SIMD instructions may be referred to as vectorizing the algorithm.
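By way of illustration only, the following minimal Python sketch models a single vector instruction applied to two vector registers. The names `vector_add`, `v0`, `v1` and `v2`, and the vector length of four elements, are hypothetical and chosen only for this example; they do not correspond to any particular instruction set.

```python
def vector_add(va, vb):
    """Model of one SIMD add: the same operation is applied to every
    pair of elements that share an element index."""
    if len(va) != len(vb):
        raise ValueError("vector registers must have the same length")
    return [a + b for a, b in zip(va, vb)]

# Two modeled vector registers with four elements each (assumed length).
# The list position plays the role of the element index, 0 to 3, ascending.
v0 = [10, 20, 30, 40]
v1 = [1, 2, 3, 4]

# One "instruction" produces all four results.
v2 = vector_add(v0, v1)
```

In this model, a scalar implementation would need one add per element, whereas the vectorized form issues a single operation over all elements.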
Sorting is a widely studied problem in computer science and an elementary building block in many of its subfields including scientific computing and database management systems.
Radix sort is a non-comparative numerical sorting algorithm. Zagha and Blelloch (see M. Zagha and G. E. Blelloch, “Radix Sort for Vector Multiprocessors,” Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, ser. Supercomputing '91, 1991, pp. 712-721) proposed a way to vectorize radix sort. The vectorized radix sort algorithm requires storing data to arrays using indexed accesses. In an indexed access, the elements may be located at arbitrary locations in memory, with the addresses of the elements indicated by the contents of a second vector. In its load form, such an access is known as a gather; in its store form, it is known as a scatter. During a scatter operation, multiple elements within the same vector may index the same memory location, thus causing a conflict. To prevent this conflict, vectorized radix sort replicates the involved arrays, which in itself is a drawback. The other main drawback of this technique is that the array being sorted needs to be accessed with a non-contiguous (strided) pattern.
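The scatter conflict described above can be illustrated with a minimal Python sketch of the counting step of a radix sort. This is a model under stated assumptions, not the implementation of the cited reference: the vector length `VLEN`, the helper names `gather`, `scatter`, `histogram_naive`, `histogram_replicated` and `histogram_scalar`, and the rule that the highest-indexed element wins on a conflicting store are all assumptions made for this example.

```python
VLEN = 4  # assumed number of elements per vector register

def gather(memory, indices):
    """Indexed load: element i is read from memory[indices[i]]."""
    return [memory[i] for i in indices]

def scatter(memory, indices, values):
    """Indexed store: element i is written to memory[indices[i]].
    Assumption: stores happen in ascending element order, so when two
    elements index the same location the later one overwrites the earlier."""
    for i, v in zip(indices, values):
        memory[i] = v

def histogram_scalar(keys, buckets):
    """Reference result: one increment per key."""
    counts = [0] * buckets
    for k in keys:
        counts[k] += 1
    return counts

def histogram_naive(keys, buckets):
    """Gather, increment, scatter on a single counts array.
    Duplicate keys inside one vector conflict and increments are lost."""
    counts = [0] * buckets
    for start in range(0, len(keys), VLEN):
        idx = keys[start:start + VLEN]
        scatter(counts, idx, [c + 1 for c in gather(counts, idx)])
    return counts

def histogram_replicated(keys, buckets):
    """One private copy of the counts per element position, so no two
    elements of a vector ever index the same location. The copies are
    summed at the end, at the cost of VLEN times the storage."""
    counts = [0] * (VLEN * buckets)
    for start in range(0, len(keys), VLEN):
        chunk = keys[start:start + VLEN]
        idx = [lane * buckets + k for lane, k in enumerate(chunk)]
        scatter(counts, idx, [c + 1 for c in gather(counts, idx)])
    return [sum(counts[lane * buckets + b] for lane in range(VLEN))
            for b in range(buckets)]

keys = [2, 0, 2, 2, 1, 1, 3, 0]
exact = histogram_scalar(keys, 4)           # [2, 2, 3, 1]
naive = histogram_naive(keys, 4)            # [2, 1, 1, 1], increments lost
replicated = histogram_replicated(keys, 4)  # [2, 2, 3, 1]
```

In the first vector of keys, three elements index bucket 2, yet the single-array version records only one increment. The replicated version gives the correct counts, but its storage grows with the vector length, which is the drawback noted above.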
The existing SIMD instruction sets (see e.g. Cray Assembly Language (CAL) for Cray X1™ Systems Reference Manual, S-2314-51, October 2003, 7.7. Vector Register Instructions) used by microprocessor architectures, such as the Cray X1™ systems, do not offer a direct solution for handling such conflicts. One skilled in the art may appreciate that vectorized radix sort is only one example of an algorithm with a need to avoid conflicts when scattering to an array. In order to vectorize other algorithms, conflicts may likewise need to be avoided when scattering to an array.
It is desirable to provide new SIMD instructions, and vectorized sorting algorithms that use the new SIMD instructions to avoid such conflicts.