Embodiments presented herein generally relate to database management systems, and more specifically, to efficiently sorting a large set of database records.
Database management systems (DBMS) provide functions for defining, creating, querying, updating, and administering databases. Sorting database records is a well-known problem in database management. A DBMS may receive a large number of unsorted database records. To make the records easier to index and search, the DBMS sorts them. Sorting is a common but computationally expensive operation in data processing.
A sorting technique commonly used in a DBMS is sample sort, which is often used in parallel processing systems. Under the sample sort approach, a DBMS receives an input data set of database records to be sorted by the value of a given record column (e.g., a character value or an integer value). The DBMS defines partitions and partition ranges for the data set based on a sampling of the data set. Then, the DBMS inserts each record into the partition whose range covers the record's value. The DBMS sorts each partition individually and merges the sorted partitions to create a sorted data set.
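The sample sort steps described above can be illustrated with a minimal single-process sketch in Python. It is not taken from any particular DBMS. The function name `sample_sort` and the parameters `key`, `num_partitions`, `oversample`, and `seed` are hypothetical names chosen for illustration, and the way the splitters are picked from the sorted sample is one simple choice among several.

```python
import bisect
import random


def sample_sort(records, key=lambda r: r, num_partitions=4, oversample=8, seed=0):
    """Sort records by sampling, range partitioning, and per-partition sorting.

    All parameter names are illustrative assumptions, not part of any DBMS API.
    """
    records = list(records)
    # Small inputs (or a single partition) gain nothing from partitioning.
    if num_partitions < 2 or len(records) <= num_partitions:
        return sorted(records, key=key)

    # 1. Sample the data set and derive splitters that bound the partition ranges.
    rng = random.Random(seed)
    sample_size = min(len(records), num_partitions * max(1, oversample))
    sample = sorted(key(r) for r in rng.sample(records, sample_size))
    step = sample_size / num_partitions
    splitters = [sample[int(i * step)] for i in range(1, num_partitions)]

    # 2. Insert each record into the partition whose range covers its key.
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = bisect.bisect_right(splitters, key(record))
        partitions[index].append(record)

    # 3. Sort each partition on its own; a parallel system could run these
    #    sorts concurrently, one worker per partition.
    # 4. Combine the partitions. Their ranges are disjoint and ordered, so in
    #    this sketch the merge reduces to concatenation in partition order.
    result = []
    for partition in partitions:
        result.extend(sorted(partition, key=key))
    return result
```

For example, `sample_sort(rows, key=lambda row: row[1])` would sort a list of tuples by their second column. Because the partition ranges do not overlap, the quality of the sample mainly affects how evenly the records are spread across partitions, not the correctness of the result.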