1. Field of the Invention
The present invention relates generally to two-phase external selection and merge record key sorting procedures, and more particularly, to a method for minimizing line-accessed cache misses during an external tournament tree replacement sorting procedure.
2. Discussion of the Related Art
The tournament sort is so called because it matches items against each other just as players are matched against each other in a tennis tournament. The original list is divided into pairs. The winner of each pair is determined, and the set of winners becomes the first auxiliary list. First-stage winners are paired and compared to identify second-stage winners. The list of second-stage winners is paired in turn, and so forth. The final pair of winners is matched to determine a final winner. The winner in a tournament sort is the lesser-valued key. Thus, the least key in the tournament selection tree is the first key on the sorted output list.
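The pairing rounds just described can be sketched in a few lines of Python. This is an illustrative sketch only. The function name `tournament_rounds` is hypothetical, keys are assumed to be mutually comparable, and an unpaired entrant in an odd-sized round is assumed to advance on a bye, a detail the description above does not address.

```python
def tournament_rounds(keys):
    """Play a knockout tournament in which the lesser key wins each match.

    Returns every round: round 0 is the original list, each later round
    holds the winners of the previous one, and the last holds the final winner.
    """
    rounds = [list(keys)]
    while len(rounds[-1]) > 1:
        prev = rounds[-1]
        # Pair adjacent entrants; min() of a one-element slice is a bye.
        rounds.append([min(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return rounds

for r in tournament_rounds([503, 87, 512, 61, 908, 170, 897, 275]):
    print(r)
# [503, 87, 512, 61, 908, 170, 897, 275]
# [87, 61, 170, 275]
# [61, 170]
# [61]
```

The final round holds 61, the least key, which becomes the first key on the sorted output list.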
A systematic procedure for such a tournament sort is well known in the art; see, for instance, Ivan Flores, "Computer Sorting", pp. 70-84, Prentice-Hall, Inc., Englewood Cliffs, N.J. (1969). The classical selection tree or tournament sort removes each "winner" in turn to an output list and repeats the selection procedure for the keys remaining in the selection tree. The sort ends when all keys contained in the original tree nodes are exhausted. Thus, a selection tree having P nodes occupies P memory locations and generates an output sort string, or "run", of length P. For data sets too large to fit into P memory locations, a series of tournament sorts is employed to create a series of output runs of length P, which are then merged in some manner.
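A minimal sketch of the classical (nonreplacement) procedure follows, assuming numeric keys and P memory locations. Each memory load of P keys is sorted by a selection tree whose exhausted leaves are set to +infinity, and the resulting runs of length P are then merged. The function names are hypothetical, the array layout (root at index 1, children of node i at 2i and 2i+1) is an assumption, and Python's `heapq.merge` merely stands in for whatever merge method is used in the second phase.

```python
import heapq
import math

def classical_tournament_run(chunk):
    """Sort one memory load of keys with a nonreplacement selection tree."""
    size = 1
    while size < len(chunk):
        size *= 2
    # Node 1 is the root; node i has children 2i and 2i+1; leaves start at `size`.
    tree = [math.inf] * (2 * size)
    tree[size:size + len(chunk)] = chunk
    for i in range(size - 1, 0, -1):
        tree[i] = min(tree[2 * i], tree[2 * i + 1])
    run = []
    for _ in range(len(chunk)):
        winner = tree[1]
        run.append(winner)
        # Descend along the winning path to the leaf that held the winner.
        i = 1
        while i < size:
            i = 2 * i if tree[2 * i] == winner else 2 * i + 1
        tree[i] = math.inf  # exhausted: nothing replaces it in the classical sort
        # Replay only the matches on the path from that leaf to the root.
        i //= 2
        while i >= 1:
            tree[i] = min(tree[2 * i], tree[2 * i + 1])
            i //= 2
    return run

def classical_tournament_sort(keys, p):
    """Phase 1: runs of length p (the last may be shorter). Phase 2: merge them."""
    runs = [classical_tournament_run(keys[j:j + p]) for j in range(0, len(keys), p)]
    return runs, list(heapq.merge(*runs))

runs, out = classical_tournament_sort([503, 87, 512, 61, 908, 170, 897, 275, 653, 426], 4)
print(runs)  # [[61, 87, 503, 512], [170, 275, 897, 908], [426, 653]]
```

Every run is at most P keys long, which is why this approach needs a merge phase for any input larger than the tree.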
The replacement tournament sort improves on the classical tournament sort by replacing the "winner" with a new key from the list of keys to be sorted. Thus, the contents of the P-node selection tree in memory are perpetually replenished. The fundamental problem with the replacement tournament sort is the management of new replacement keys that are less than the "winner" previously written to the output run.
One method for managing such new keys is to label them as ineligible for participation in the present run. Thus, over time, the selection tree in memory fills with ineligible keys. When all eligible keys are exhausted, the current output run is ended and a new run is begun merely by globally qualifying all previously ineligible keys. The new output run then builds as before, with the replacement tree again slowly filling with new ineligible keys. This process continues indefinitely (as explained in detail by Flores at pp. 121-128), creating a group of sorted output runs that are then merged. Although similar in effect to the nonreplacement tournament sort, this replacement tournament sort method creates longer output runs, with an average length of 2P sorted keys. Refer to Knuth, "The Art of Computer Programming", Volume 3/Sorting and Searching, pp. 247-263, Addison-Wesley Publ. Co., Menlo Park, Calif. (1973), for a discussion of this run length enhancement effect for replacement tournament sorts.
A suitable method for labeling ineligible keys is to add an output run number as the most significant element of the key. In such a scheme, every incoming key is immediately augmented by placing the current run number in the most significant key position. The run number remains in place until the key "wins", and it can be stripped when the key is written to the sorted output run tables outside of memory. When a replacement key entering memory is tested and found to be less than the last selected key written to the sorted output run, the replacement key is made "ineligible" simply by augmenting it with the current run number incremented by one. It is thus appreciated in the art that such run number manipulation is sufficient to ensure that an augmented replacement key is never less than the selected key being replaced. After completion of the initial phase of such a replacement tournament selection sort, several sorted runs having an average length of 2P keys are available in output storage for the second merging phase, which may be conducted in any suitable manner known in the art.
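The run-number scheme can be sketched as follows, under stated assumptions: keys are numbers, `replacement_selection` is a hypothetical name, and Python's `heapq` priority queue stands in for the P-node tournament tree, since only the ordering of the augmented keys matters here. Each key in memory is held as a `(run_number, key)` tuple, so Python's tuple comparison makes the run number the most significant element.

```python
import heapq

_END = object()  # sentinel marking an exhausted input

def replacement_selection(keys, p):
    """Return sorted runs produced by replacement selection with run numbers.

    heapq stands in for the P-node tournament tree (an assumption).
    """
    it = iter(keys)
    # Load the first p keys, each augmented with run number 0.
    memory = []
    for key in it:
        memory.append((0, key))
        if len(memory) == p:
            break
    heapq.heapify(memory)
    runs, current_run, current = [], [], 0
    while memory:
        run, key = memory[0]                # the tournament "winner"
        if run != current:                  # every eligible key is exhausted
            runs.append(current_run)
            current_run, current = [], run  # begin the next run
        current_run.append(key)             # strip the run number on output
        nxt = next(it, _END)
        if nxt is _END:
            heapq.heappop(memory)           # no replacement: the tree shrinks
        else:
            # A replacement smaller than the key just written is ineligible
            # for the current run, so it is tagged with the next run number.
            tag = current if nxt >= key else current + 1
            heapq.heapreplace(memory, (tag, nxt))
    if current_run:
        runs.append(current_run)
    return runs

keys = [503, 87, 512, 61, 908, 170, 897, 275,
        653, 426, 154, 509, 612, 677, 765, 703]
print(replacement_selection(keys, 4))
# [[61, 87, 170, 503, 512, 653, 897, 908], [154, 275, 426, 509, 612, 677, 703, 765]]
```

With P = 4, these sixteen keys yield two runs of eight keys each, twice the tree size. A classical tournament sort of the same size would have produced four runs of four.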
In U.S. Pat. No. 3,713,107, Harut Barsamian discloses a systematic architecture for the electronic implementation of a tournament sorting procedure that uses the main computer memory for storage of the selection tree nodes. Barsamian discusses the above-described replacement tournament tree sort but does not consider the efficiency problems arising from the addition of cache buffer memory to the processing system.
In U.S. Pat. No. 4,962,451, Douglas R. Case et al. disclose a new use of an LRU-managed CPU data cache for the generation of sorted key runs. Case et al. teach a method for improved caching efficiency that keeps the cached subtree small enough to avoid triggering the cache LRU discipline for moving new lines into the cache. The means and method disclosed by Case et al. replace the prior art taught by Barsamian and thereby obtain a processing speed advantage.
Other practitioners in the art have more recently attempted to improve caching efficiency to enhance replacement tournament sorting procedures. This is a keenly felt problem in the art, especially for external replacement selection tree sorts involving large numbers of records and limited cache space.
As stored in cache or main memory, the tournament tree nodes normally contain pointers to "losing" sort keys remaining in the tournament. When a "winner" is identified at the root node of the selection tree, the key referenced by the root node pointer is written to the sorted output run, and the corresponding leaf node is replaced with a pointer to a new key from the external list to be sorted. The pointers in the parent and ancestor nodes of the updated leaf node must then be changed to reflect the new winners. These changes propagate upward along the ancestor node path to the root node in a manner well known in the art. The only nodes subject to change in the selection tree are those on the updating path from the replaced leaf node to the root node of the tree.
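The following sketch illustrates a tree whose internal nodes hold pointers (here, list indices) to losing keys, with the overall winner kept in a separate slot. It is an assumption-laden illustration, not the method of any cited patent. The class name `LoserTree` is hypothetical, leaf j is assumed to sit at array position k + j, and `replace_winner` replays matches only along the path from the replaced leaf to the root. That shortcut is valid because the replaced leaf always holds the previous winner.

```python
import math

class LoserTree:
    """Tournament tree whose internal nodes point to the losers of matches.

    Hypothetical layout: for k >= 1 leaves, leaf j sits at tree position k + j,
    internal node i (1 <= i < k) has children 2i and 2i + 1, and node[0]
    holds the index of the overall winner instead of a loser.
    """

    def __init__(self, keys):
        self.k = len(keys)
        self.keys = list(keys)
        self.node = [0] * self.k
        winners = [0] * (2 * self.k)   # temporary winners, used only while building
        for leaf in range(self.k):
            winners[self.k + leaf] = leaf
        for i in range(self.k - 1, 0, -1):
            a, b = winners[2 * i], winners[2 * i + 1]
            if self.keys[a] <= self.keys[b]:
                winners[i], self.node[i] = a, b
            else:
                winners[i], self.node[i] = b, a
        self.node[0] = winners[1]

    def winner(self):
        """Index of the leaf holding the least key."""
        return self.node[0]

    def replace_winner(self, key):
        """Overwrite the winner's leaf and replay matches up to the root."""
        champ = self.node[0]
        self.keys[champ] = key
        i = (self.k + champ) // 2
        while i >= 1:
            # The loser stored at node i plays the key climbing the path.
            if self.keys[self.node[i]] < self.keys[champ]:
                self.node[i], champ = champ, self.node[i]
            i //= 2
        self.node[0] = champ

tree = LoserTree([503, 87, 512, 61])
out = []
for _ in range(4):
    out.append(tree.keys[tree.winner()])
    tree.replace_winner(math.inf)   # an exhausted leaf never wins again
print(out)  # [61, 87, 503, 512]
```

Each replacement touches exactly one node per tree level. Every such node access, and every key it dereferences, is a potential cache miss.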
Comparison at each node on the updating path requires access to the actual values of the sort keys referenced by the node pointers. Because size constraints usually make it undesirable to fit all such sort keys into the cache memory, this pointer technique entails a large number of CPU cache misses.
Such cache misses can be reduced either by adding the bulky sort keys to the nodes in cache or by using less bulky offset value codes. Neither technique, however, avoids the cache misses that result from the displacement of a parent node on the updating path from its child node at the lower levels of the tree. Examination of FIG. 2, showing a typical 63-node selection tree, demonstrates that the parent of any node(i) is stored at position floor(i/2), so the parent is displaced from its child by about i/2 positions. That is, for large trees, the parent node lies about halfway from the root node to the child node in code word storage sequence. Hence, except for the root and the root's immediate children and grandchildren, the parent node is located in a different cache line from the child node for all nodes in the selection tree. If the tree is too large for the cache, almost every parent access will be a cache miss.
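The parent-displacement argument can be checked numerically. The sketch below makes two hypothetical assumptions: tree node i occupies a fixed-size slot at array position i (position 0 unused), and eight node slots fit in one cache line, for example 8-byte nodes in a 64-byte line. Both sizes are illustrative and are not taken from the source. Under those assumptions, only nodes 2 through 7 share a cache line with their parents in a 63-node tree, and the update path from any bottom leaf touches four distinct lines.

```python
def parent(i):
    """Array position of the parent of node i (the root is node 1)."""
    return i // 2

def same_line_as_parent(num_nodes, nodes_per_line):
    """Non-root nodes whose parent lies in the same cache line.

    Assumes node i lies in cache line i // nodes_per_line (hypothetical layout).
    """
    return [i for i in range(2, num_nodes + 1)
            if i // nodes_per_line == parent(i) // nodes_per_line]

def lines_on_update_path(leaf, nodes_per_line):
    """Number of distinct cache lines touched walking from `leaf` to the root."""
    lines = set()
    i = leaf
    while i >= 1:
        lines.add(i // nodes_per_line)
        i = parent(i)
    return len(lines)

print(same_line_as_parent(63, 8))   # [2, 3, 4, 5, 6, 7]
print(lines_on_update_path(63, 8))  # 4
```

Under this layout, node i and its parent share a line only when i // 8 equals i // 16, which holds only for i < 8. This is the root, children, and grandchildren exception stated above.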
The prior solution discussed above is to perform the first (selection) phase with the largest tree of P nodes that fits into the CPU cache and then to invoke one or more merge passes in a second phase. This second merging operation doubles the CPU cycle cost of such sorting procedures. That deplorable overhead penalty is incurred only because a cache-resident tree of P nodes is too small for the data being sorted. This is a well-known problem in the art, as will be appreciated from the Knuth reference, which gives extensive attention to increasing output run lengths from trees of fixed size.
The related unresolved problems and deficiencies are clearly felt in the art and are solved by the present invention in the manner described below.