Sorting data is a classic computing problem with practical applications in a wide variety of academic and industrial fields. Computer applications may require high-performance sorting methods to conduct business intelligence analytics, to provide presentation rendering, to respond to external requests from users and applications, and for other tasks. For example, a database may be queried for a list of records that is sorted according to user- or application-defined criteria. Since the overall processing time to answer these queries is directly impacted by the sort execution time, a high-performance sort is needed to provide results in a timely manner. Sort performance is especially important for applications working with big data sets, such as database management systems (DBMSs) for large enterprises or high-performance computing (HPC), as the large number of data records may magnify the execution time of any sorting operation.
Multi-threaded processing may be utilized to provide suitable response times for these data-intensive applications, wherein resources such as processor cores and/or processing nodes are added according to the data processing workload. With a highly parallelizable workload, multi-threaded processing has the potential to provide optimized performance scaling in a cost-efficient and practical manner. Since sorting may contribute a large proportion of the data processing workload, sorting becomes a prime target for parallelization to reduce query latency and to improve data processing throughput in multi-threaded environments.
Serial sorting techniques such as quicksort are readily available, providing sufficient performance for applications with low to moderate data processing needs. However, these serial sorting methods are less applicable for multi-threaded applications with high data processing needs. While various approaches for parallelizing serial sorting methods have been suggested, these approaches may break down when the number of elements to be sorted in a data-intensive application reaches the billions or more, or when the workload must be distributed across hundreds or more parallel processing threads in a highly multi-threaded environment.
Furthermore, a given data set to be analyzed may include any kind of data distribution, and thus a sort must be able to process a data set regardless of its particular data distribution. Any parallelization approach that requires a lengthy pre- or post-processing step to cope with non-uniform data distributions may impose an unacceptable performance penalty by reducing or negating performance gains obtained from parallelization. For example, while radix-sort may be amenable to parallelization because each partition can be sorted independently, partitioning data according to its most significant bits provides poor workload balancing for non-uniform or skewed data distributions: most records may fall into a few partitions while the remaining partitions stay nearly empty. Thus, a computationally expensive pre-processing step is required for radix-sort to cope with non-uniform data distributions, for example by conducting a serial data scan to determine balanced workload partitions. While a parallel data scan is also possible, this would impose significant processing overhead due to the inter-process communication required to resolve write contention, which only grows worse as the number of threads increases. In either case, the performance penalty from the pre-processing step may outweigh any performance gains from parallelizing the radix-sort.
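The workload imbalance described above can be illustrated with a minimal sketch. The function name msb_partition_sizes, the 16-bit key width, and the choice of four partitions are assumptions made only for this illustration and are not drawn from any particular radix-sort implementation. The sketch counts how many keys would be assigned to each partition when keys are split on their most significant bits, first for evenly spread keys and then for keys clustered at the low end of the range.

```python
from collections import Counter


def msb_partition_sizes(values, key_bits=16, partition_bits=2):
    """Return the number of keys per partition when partitioning on the
    top `partition_bits` bits of each `key_bits`-wide key.

    Hypothetical helper for illustration only.
    """
    shift = key_bits - partition_bits
    counts = Counter(v >> shift for v in values)
    # Report every partition, including empty ones.
    return [counts.get(p, 0) for p in range(1 << partition_bits)]


# Uniform keys: 1024 keys spread evenly over the full 16-bit range.
uniform = list(range(0, 1 << 16, 64))

# Skewed keys: 1024 keys that all fall below 1000, so their top bits are zero.
skewed = [i % 1000 for i in range(1024)]

uniform_sizes = msb_partition_sizes(uniform)
skewed_sizes = msb_partition_sizes(skewed)

print(uniform_sizes)  # [256, 256, 256, 256]
print(skewed_sizes)   # [1024, 0, 0, 0]
```

In this sketch, if each partition were handed to a separate thread, the skewed input would leave one thread with all of the work and three threads idle, which is the kind of imbalance a pre-processing scan would be needed to correct.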
Based on the foregoing, there is a need for a method to provide high-performance parallel data sorting suited to multi-threaded and multi-node environments.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.