1. Field of the Invention
This invention relates to the field of database sorting and particularly to an apparatus and method for efficiently performing a query sort on a data set with duplicate key values.
2. Description of the Related Art
A database management system (DBMS) is a program that enables multiple computer users to access and create data in a database. The DBMS manages database requests and ensures the integrity and security of the data. The most common type of DBMS is a relational database management system (RDBMS), which enables the creation and maintenance of a relational database. A relational database includes a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. Another kind of DBMS is the object-oriented database management system (ODBMS), which supports the modeling and creation of data as objects. It should be noted that the present invention is not limited to a specific type of DBMS.
Sorting is a frequent and resource-intensive operation performed by the DBMS. The DBMS may employ a general purpose sorting algorithm that sorts a wide variety of data efficiently. However, a general purpose sorting algorithm may not be as efficient in certain special, but commonly occurring, scenarios. One such scenario involves sorting a data set in which many of the key values are duplicates. While the DBMS normally maintains statistics from which it can recognize that the values of a particular key are not unique and include duplicates, the general purpose sorting algorithm typically does not leverage this information. As a result, during sorting operations the general purpose sorting algorithm typically sorts all of the duplicate values in the key. For a data set with N elements, even a good sorting algorithm with O(N log N) time complexity will perform on the order of N log N operations over all N elements, duplicates included.
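The cost described above can be illustrated with a minimal sketch in Python. The sketch is an illustration only and is not the method of the present invention: the function names (count_comparisons, sort_all, sort_distinct_then_expand) and the sample country-code keys are hypothetical. It counts the key comparisons made when every element is sorted, and compares that count with the comparisons needed when only the distinct key values are sorted and each value is then repeated according to its number of occurrences.

```python
import functools
import random
from collections import Counter


def count_comparisons(keys):
    """Sort keys with the built-in sort; return (sorted list, comparison count)."""
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        return (a > b) - (a < b)

    ordered = sorted(keys, key=functools.cmp_to_key(compare))
    return ordered, comparisons


def sort_all(keys):
    """General purpose approach: every element, duplicates included, is sorted."""
    return count_comparisons(keys)


def sort_distinct_then_expand(keys):
    """Illustrative alternative: sort only the distinct values, then expand."""
    occurrences = Counter(keys)  # one hashing pass, no ordering comparisons
    distinct_sorted, comparisons = count_comparisons(list(occurrences))
    expanded = [k for k in distinct_sorted for _ in range(occurrences[k])]
    return expanded, comparisons


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data set: 10,000 rows but only five distinct key values.
    keys = [random.choice(["BR", "DE", "FR", "JP", "US"]) for _ in range(10_000)]
    full, full_cost = sort_all(keys)
    grouped, grouped_cost = sort_distinct_then_expand(keys)
    print("same result:", full == grouped)
    print("comparisons, all elements:", full_cost)
    print("comparisons, distinct values only:", grouped_cost)
```

Because any comparison-based sort of N elements needs at least N-1 comparisons, the first count grows with the size of the data set, whereas the second depends only on the number of distinct key values.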
Consequently, these general purpose sorting algorithms spend needless CPU processing time sorting duplicate key values. These duplicate values also occupy significant memory space during the sorting operation. For a long key, storing duplicate values during the sorting operation may exhaust the available space in main memory. For example, if an intermediate result of a sorting operation cannot fit in main memory, one or more partial results of the sort operation must be written to an external device and then read back for merging. This causes unnecessary input/output (I/O) activity and delays.
There is therefore a need to reduce the amount of system resources utilized during sorting operations, and in particular a need for a sorting algorithm that uses CPU and memory resources efficiently, thereby reducing I/O activity and delays.