Technical Field
The present invention relates to sorting data and, more particularly, to parallelizing a quicksort.
Description of the Related Art
Sorting a dataset is an important function of computing systems that is performed in nearly every field of computing. Sorting is of particular importance in the field of database management, where large datasets are sorted according to the needs of a user or of an application.
As parallel computing solutions become more prevalent, and in particular as distributed computing solutions such as clouds take a prominent role, these solutions are in many ways limited by their ability to sort. Cloud computing efforts can perform large computations in relatively little time, but sorting remains an important part of their processing; for example, aggregation steps in parallel computing assemble the data from multiple parallel processes and make use of sorting. However, sorting is a difficult problem for parallel computing. Quicksort is known to be particularly fast for general-purpose sorting, but it is very difficult to parallelize.
Quicksort employs a “pivot,” which is one element from the dataset to be sorted. The pivot is treated as though it were the median of the set, and elements are swapped, starting from the ends of the dataset and working inward, whenever they lie on the wrong side of the pivot. In general, the pivot will not be the actual median of the set. However, after one pass of the quicksort, every element lower than the pivot lies to the left of the pivot and every element higher than the pivot lies to the right of it. The quicksort process is then repeated recursively on each of those two subsets (lower and higher), with a new pivot selected for each. In this manner, the set is eventually sorted.
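The single pass and recursion described above can be sketched as follows. This is a minimal, sequential illustration rather than the claimed method. It assumes the first element of each range serves as the pivot (the description above only requires that the pivot be some element of the set), and the helper names `partition` and `quicksort` are hypothetical. Two indices move inward from the ends and swap elements that sit on the wrong side. The pivot is then moved between the two groups.

```python
def partition(a, lo, hi):
    """One quicksort pass over a[lo..hi] (inclusive); returns the pivot's final index.

    Assumption for illustration: the pivot is the first element of the range.
    """
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        # Advance from the left while elements belong on the lower side.
        while i <= j and a[i] < pivot:
            i += 1
        # Advance from the right while elements belong on the higher side.
        while i <= j and a[j] > pivot:
            j -= 1
        if i >= j:
            break
        # Both elements are on the wrong side of the pivot: swap them.
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    # Place the pivot between the lower and higher groups.
    a[lo], a[j] = a[j], a[lo]
    return j


def quicksort(a, lo=0, hi=None):
    """Sorts list a in place by repeatedly partitioning the lower and higher subsets."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        # Recurse on the smaller subset and loop on the larger one, which keeps
        # the recursion depth logarithmic even when pivots are poorly chosen.
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1


data = [5, 3, 9, 1, 5, 7, 2, 8]
quicksort(data)
print(data)
```

Recursing only into the smaller subset is a common implementation choice, not something the description above requires. Plain recursion on both subsets sorts the data equally well but can recurse deeply when the pivot is far from the median.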
Quicksort is challenging to parallelize. Because the true median is not known at the outset, it is difficult to divide the sort among threads without resorting to locks to share information between them.
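The sketch below illustrates one part of this difficulty, namely how unevenly a single pivot can split the data. It does not address locking. It assumes, for illustration only, that each side of the first partition would be handed to a separate thread. The helper `split_sizes` and the sample data are hypothetical. A pivot far from the median leaves one side with nearly all of the elements. The true median gives an even split, but finding it (here with `statistics.median`) requires examining all of the data first.

```python
import statistics


def split_sizes(data, pivot):
    """Counts how many elements would land below and above the pivot after one pass."""
    lower = sum(1 for x in data if x < pivot)
    higher = sum(1 for x in data if x > pivot)
    return lower, higher


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]

# Using the first element as the pivot: one side receives almost all the work.
print(split_sizes(data, data[0]))

# Using the true median: the two sides are balanced, but the median
# had to be computed from the whole dataset beforehand.
print(split_sizes(data, statistics.median(data)))
```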