The present disclosure generally relates to information processing systems, and more particularly relates to a system and method capable of accelerated sorting of data elements in an array data structure.
Sorting is one of the most fundamental kernels in information management systems, such as databases and Hadoop (i.e., a Java-based programming framework that supports the processing of large data sets in a distributed computing environment), where data volume has been doubling nearly every 40 months since the 1980s. For example, sorting is an essential kernel in database indexing, redundancy removal, data clustering, non-equi joins, and so on, all of which suffer heavily from the exploding data volume. Accelerating such sorting can therefore expedite many big data analytics and offer high value to customers.
Many sorting algorithms can be mapped into a hardware (HW) accelerator. Among them, radix sort can be ideal for HW mapping due to its distribution-based nature. Unlike quicksort and mergesort, radix sort does not require expensive comparators, which allows radix sorting to run with linear complexity. By simply using the key value itself as an index, radix sort can recursively distribute and further sort the input data elements. However, mapping a radix sort algorithm into an extremely high-performance HW implementation has been very challenging.
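As an illustration of the recursive, comparator-free distribution described above, the following is a minimal software sketch and not the disclosed hardware implementation. The function name `radix_sort_msd`, the 32-bit key width, and the 8-bit digit width are assumptions chosen only for this example; keys are assumed to be non-negative integers below 2^32.

```python
def radix_sort_msd(keys, key_bits=32, digit_bits=8):
    """Sort non-negative integers by recursive distribution, most significant digit first.

    Hypothetical helper for illustration; key_bits and digit_bits are assumed values.
    """
    mask = (1 << digit_bits) - 1

    def sort_level(items, shift):
        # Nothing left to distribute: zero or one element, or all digits consumed.
        if len(items) <= 1 or shift < 0:
            return items
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in items:
            # The digit of the key is used directly as the bucket index;
            # no key-to-key comparison is performed.
            buckets[(k >> shift) & mask].append(k)
        out = []
        for bucket in buckets:
            # Each bucket is sorted further on the next, less significant digit.
            out.extend(sort_level(bucket, shift - digit_bits))
        return out

    return sort_level(list(keys), key_bits - digit_bits)
```

This sketch allocates separate bucket storage at every level, so it is not an in-place sort; it only shows how the key value itself drives the distribution.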
In-place radix sort is a popular distribution-based sorting algorithm for short numeric or string keys. It has linear run-time complexity and constant memory complexity. However, efficient use of in-place radix sort is very challenging for at least the following two reasons. First, the initial phase of permuting elements into buckets suffers from a read-write dependency inherent in its in-place nature. Second, load-balancing the recursive application of the algorithm to the resulting buckets is difficult when the buckets are of very different sizes, which happens for skewed distributions of the input data.
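The two difficulties can be seen in a minimal software sketch of a conventional in-place radix sort, written here in the style commonly known as American flag sort. It is offered only as an illustration under stated assumptions and is not the disclosed method: the name `inplace_radix_sort`, the 32-bit key width implied by the default `shift=24`, and the 8-bit digit width are all hypothetical choices for this example.

```python
def inplace_radix_sort(a, lo=0, hi=None, shift=24, digit_bits=8):
    """Sort a[lo:hi] in place, most significant digit first.

    Illustrative sketch; assumes non-negative integer keys below 2**32.
    """
    if hi is None:
        hi = len(a)
    if hi - lo <= 1 or shift < 0:
        return
    radix = 1 << digit_bits
    mask = radix - 1

    # Histogram pass: count how many keys fall into each bucket.
    counts = [0] * radix
    for i in range(lo, hi):
        counts[(a[i] >> shift) & mask] += 1

    # Bucket boundaries: heads[d] is the next unplaced slot of bucket d,
    # tails[d] is one past the end of bucket d.
    heads = [0] * radix
    tails = [0] * radix
    pos = lo
    for d in range(radix):
        heads[d] = pos
        pos += counts[d]
        tails[d] = pos

    # Permutation pass: each step reads a key, then writes it to a slot whose
    # position depends on earlier writes (the read-write dependency).
    for d in range(radix):
        while heads[d] < tails[d]:
            v = a[heads[d]]
            dv = (v >> shift) & mask
            if dv == d:
                heads[d] += 1  # key is already in its own bucket
            else:
                # Swap the key into the next free slot of its target bucket.
                a[heads[d]], a[heads[dv]] = a[heads[dv]], v
                heads[dv] += 1

    # Recursion pass: bucket sizes may differ widely for skewed input,
    # which is what makes balancing this work across workers difficult.
    start = lo
    for d in range(radix):
        inplace_radix_sort(a, start, tails[d], shift - digit_bits, digit_bits)
        start = tails[d]
```

In this sketch the only auxiliary storage is the fixed-size histogram and boundary arrays per recursion level, which is independent of the number of input elements.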
Radix sort can be one of the best-suited sorting kernels for many in-memory data analytics due to its simplicity and efficiency. In particular, in-place radix sorting, which performs sorting without extra memory overhead, is highly desirable for in-memory operations for two reasons: a) the large memory footprint of in-memory databases calls for memory-efficient supporting algorithms; and b) in-place radix sort can deliver higher performance with significantly fewer cache misses and page faults than approaches requiring extra memory. However, reducing to practice a mapping of a radix sort algorithm into an extremely high-performance HW implementation has been very challenging.