1. Field of the Invention
The present invention relates to improving the efficiency of external sorting when using a set of independent memory buffers. In particular, the problem of generating sorted runs from an initially unsorted data set is considered.
2. Related Art
Sorting is one of the most common operations in data management. When all the data fits into the available memory, efficient sorting means reducing the number of comparison operations performed. Well-known techniques such as quicksort (The Art of Computer Programming: Sorting and Searching by D. Knuth, published by Addison Wesley (1973)) are typically used for this purpose. However, when the data no longer fits into the available memory, reducing the I/O between main memory and disk becomes the primary concern.
Efficient external sorting of data, i.e., when the amount of data to be sorted is larger than the available memory, requires reducing the amount of I/O to and from disk. A common way of achieving this goal is by first generating sorted runs of data to the disk and then merging these runs into a single sorted run. Maximizing the length of these initial sorted runs is an important part of reducing the total I/O associated with external sorting.
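The two phases described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: in-memory lists stand in for runs on disk, the names `make_runs` and `merge_runs` are hypothetical, and phase 1 uses the simplest possible run generator (fill a buffer of m items, sort it, write it out), which yields runs of length at most m.

```python
import heapq
from typing import Iterable, Iterator, List


def make_runs(data: Iterable[int], m: int) -> List[List[int]]:
    """Phase 1: cut the input into chunks of at most m items and sort each one.

    Each returned list stands in for one sorted run written to disk.
    """
    runs: List[List[int]] = []
    chunk: List[int] = []
    for item in data:
        chunk.append(item)
        if len(chunk) == m:          # buffer full: sort it and "write" a run
            runs.append(sorted(chunk))
            chunk = []
    if chunk:                        # final, possibly shorter, run
        runs.append(sorted(chunk))
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Phase 2: merge all sorted runs into a single sorted stream."""
    return heapq.merge(*runs)


runs = make_runs([5, 1, 9, 3, 7, 2, 8, 6, 4], m=3)
merged = list(merge_runs(runs))
```

With this run generator, an input of n items always produces about n/m runs. Longer initial runs mean fewer runs to merge, which is why the run-generation phase matters for total I/O.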
In The Art of Computer Programming: Sorting and Searching, by D. Knuth, published by Addison Wesley (1973), an effective approach for creating these initial sorted runs, called replacement selection, is presented. The technique presented there operates on a single buffer. Assuming a random data distribution, the average length of a run created by this approach is 2m, where m is the size of the buffer. A typical way of implementing the replacement selection strategy is through the use of a tournament tree, whereby data is first written to disk in sorted runs using the tournament tree. These runs are then merged into a single sorted list. For the purpose of this disclosure, the tournament tree can be viewed as a black box with the following properties. When a tuple (an element or record to be sorted) is read from disk, it is inserted into the tournament tree. When the tournament tree is full, and before the next tuple can be read from disk, one of the tuples in the tournament tree must be written out to disk. This tuple is selected as follows. Tuples are written to disk in sorted runs. Let a be the last tuple written to the current sorted run. The smallest tuple b in the tournament tree such that b > a is selected and written to disk. If no such tuple b exists, a new run is started and the smallest tuple in the tournament tree is written to disk.
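The selection rule above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation from the cited reference: a binary heap from the standard `heapq` module stands in for the tournament tree, the function name `replacement_selection` is hypothetical, lists stand in for runs on disk, and the comparison is relaxed from b > a to b >= a so that equal keys can stay in the same run.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def replacement_selection(tuples: Iterable[int], m: int) -> Iterator[List[int]]:
    """Yield sorted runs produced with a buffer holding at most m tuples."""
    if m < 1:
        raise ValueError("m must be at least 1")
    source = iter(tuples)

    # Heap entries are (run_number, key). Tagging each tuple with the run it
    # belongs to makes tuples deferred to the next run sort after every tuple
    # that can still extend the current run.
    heap: List[Tuple[int, int]] = []
    for key in source:
        heap.append((0, key))
        if len(heap) == m:           # buffer is full
            break
    heapq.heapify(heap)

    current_run = 0
    run: List[int] = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != current_run:
            # No buffered tuple fits the current run: close it, start a new one.
            yield run
            run, current_run = [], run_no
        run.append(key)              # stands in for "write to disk"

        try:
            incoming = next(source)  # stands in for "read next tuple from disk"
        except StopIteration:
            continue                 # input exhausted: just drain the buffer
        # Assumption: >= instead of the strict > in the text, so duplicates
        # of the last written key remain in the current run.
        target = current_run if incoming >= key else current_run + 1
        heapq.heappush(heap, (target, incoming))

    if run:
        yield run


runs = list(replacement_selection([5, 1, 9, 3, 7, 2, 8, 6, 4], m=3))
```

In this small example the first run holds six tuples even though the buffer holds only three, consistent with the observation that runs tend to be longer than the buffer.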
In "An Efficient Percentile Partitioning Algorithm For Parallel Sorting" by B. Iyer, G. Ricard and P. Varman, published in the Proceedings of the 15th International Conference on Very Large Databases (1989) pages 135-144, a parallel sort is presented in which multiple independent processors first use replacement selection to sort the data stored on their individual disks. Final destinations are selected based on the initial sorted runs, and the data is then sent to its final destination and merged. However, there is no cooperation among the processors during this initial phase, so the average run length produced by the initial replacement selection algorithm remains 2m.
In "System Issues In Parallel Sorting For Database" by B. Iyer and D. Dias, published in Proceedings of the 6th International Conference On Data Engineering (1990) pages 246-255, the parallel sorting algorithm presented by Iyer, Ricard and Varman is studied further. Again, the resulting algorithms assume no cooperation among the processors during the initial sorting phase so the average run length remains unaffected.
In "Parallel Sorting On A Shared Nothing Architecture Using Probabilistic Splitting" by D. DeWitt, J. Naughton and D. Schneider, published in Proceedings of the First International Conference On Parallel and Distributed Information Systems (1991) pages 280-291, a parallel sort based on preliminary sampling is presented. In this approach each processor independently samples its disks and then uses this information to decide the destination processor for each of the records. The data is then sent to the final destination where it is sorted. Again, there is no cooperation among the processors during the sorting phase.
In "Parallel Sorting Methods For Large Data Volumes On A Hypercube Database Computer" by B. Baugsto and J. Greipsland, published in Proceedings of the Sixth International Workshop On Database Machines by Springer-Verlag (1989), pages 127-141, external algorithms for parallel sorting on a hypercube are presented. Their work also uses sampling to determine the initial partitions and does not address the issue of creating initial sorted runs.
In "Tuning A Parallel Database Algorithm On Shared Memory" by G. Graefe and S. Thakkar, in Software: Practice and Experience, vol. 22, no. 7 (1992), pages 495-517, a parallel external sorting algorithm is presented. However, their work focuses on a shared memory system and does not address the issue of independent buffers.
In "Sorting By Natural Selection" by W. Frazer and D. Wong, in Communications of the ACM (1972), pages 910-913, a technique is given for increasing the length of the sorted run by storing tuples that could not fit into the current run. However, their work accepts additional disk I/O during the creation of the sorted runs in order to increase their length, as opposed to considering cooperation with external buffers.