1. Field of the Invention
The present invention relates to the field of sorting and computer methods and apparatus for sorting.
2. Description of the Related Art
Sorting is important to the computer field in many applications. For example, enterprises and other users often wish to sort lists of employees, customers, files, inventory, data or other categories of information. The sort order can be by any criteria, including alphabetical and numerical, in ascending or descending order, for one or more "fields" within the records. Because sorting is important in the computer field, many methods and apparatus for sorting have been devised. Computer sorting is described in many works, including Sorting and Searching, The Art Of Computer Programming, Vol. 3 by D. E. Knuth, Sorting and Sort Systems by H. Lorin, and Algorithms + Data Structures = Programs by N. Wirth.
The efficiency with which sort algorithms perform in computers is, in part, a function of the architecture of computer systems. Computer systems generally have an architecture which includes processing units (central processing units, or CPUs) and a storage system which includes internal storage and external storage. The internal storage may include cache, primary and secondary random access memory. The external storage includes I/O units such as magnetic or optical disc or magnetic tape drives.
Information to be sorted is organized into records. Sorting requires every record being sorted to be accessed in the storage system. As part of the sort processing, the processing units fetch the records from the storage system and store the records back into the storage system with an ordering determined by the sort algorithm.
For purposes of sorting, sorting algorithms have been characterized as being internal or external. An internal sort is one in which the records to be sorted are resident within internal storage of the computer system. An external sort is one in which the records to be sorted are not fully resident within internal storage and hence are stored in external storage during the sort processing.
When the number of records to be sorted is small enough that the records can all be contained concurrently in the internal storage, an internal sort algorithm is generally preferred. It performs more quickly than an external sort algorithm because, in an internal sort, no time is spent accessing records from the slower external storage.
When the records to be sorted cannot be contained entirely within the internal storage system, sorting tends to be slower because of the additional time required to access records from the slower external storage. Sorting a large number of data records with an external sort usually requires a two-stage process. The first stage is the initial sorting of as many data records as can be accommodated by the faster internal storage to create strings of sorted records. The second stage involves merging the strings of sorted records, with accesses to the external storage, into a final sorted sequence of records.
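The two-stage process described above can be illustrated with a minimal Python sketch. It is not taken from the cited works: the names `make_sorted_runs`, `merge_runs`, `external_sort` and the `memory_capacity` parameter are hypothetical, and plain in-memory lists stand in for the strings that a real external sort would write to and read from external storage.

```python
import heapq
from typing import Iterable, List


def make_sorted_runs(records: Iterable[int], memory_capacity: int) -> List[List[int]]:
    """Stage 1: sort as many records as fit in 'internal storage' at a time.

    memory_capacity is a hypothetical stand-in for the number of records
    the faster internal storage can hold.
    """
    if memory_capacity < 1:
        raise ValueError("memory_capacity must be at least 1")
    runs: List[List[int]] = []
    buffer: List[int] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == memory_capacity:
            runs.append(sorted(buffer))  # one sorted string
            buffer = []
    if buffer:
        runs.append(sorted(buffer))  # final, possibly shorter, string
    return runs


def merge_runs(runs: List[List[int]]) -> List[int]:
    """Stage 2: merge the sorted strings into one sorted sequence."""
    # heapq.merge reads each run sequentially, much as a real merge
    # would stream each string from external storage.
    return list(heapq.merge(*runs))


def external_sort(records: Iterable[int], memory_capacity: int) -> List[int]:
    return merge_runs(make_sorted_runs(records, memory_capacity))
```

For example, `external_sort([5, 3, 8, 1, 9, 2, 7], 3)` first forms the strings `[3, 5, 8]`, `[1, 2, 9]` and `[7]`, and then merges them into a single sorted sequence.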
For external sorts, sorting algorithms have been designed to accommodate different I/O computer architectures. For example, when a computer system is limited by the access time required for accesses to disk storage, sorting algorithms which minimize the number of external storage accesses are the more efficient.
For internal sorts, the literature has assumed a flat random access machine (RAM) model for internal storage in which all memory accesses to internal storage are of identical computational cost. With this assumption, the literature has concluded that algorithms such as quicksort, heapsort and tournament sort provide the greatest efficiency for internal sorts. The latter two algorithms are frequently employed as replacement selection techniques for the internal sorts of external sort strings. Tournament sorts are described in the standard textbooks such as Knuth, D. E. "Sorting and Searching", The Art of Computer Programming, Vol. 3 and Lorin, H. Sorting and Sort Systems.
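As an illustration of replacement selection, the following Python sketch uses a binary heap to produce sorted strings from an input stream. It is a simplified sketch under stated assumptions, not the formulation given in the cited textbooks: the function name `replacement_selection` and the `memory_capacity` parameter are hypothetical, the standard-library `heapq` module stands in for a tournament tree, and lists stand in for strings written to external storage.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple


def replacement_selection(records: Iterable[int], memory_capacity: int) -> List[List[int]]:
    """Build sorted strings using a heap of at most memory_capacity records.

    Each heap entry is (run_number, record). A record smaller than the
    one just output cannot join the current string, so it is tagged for
    the next string instead.
    """
    if memory_capacity < 1:
        raise ValueError("memory_capacity must be at least 1")
    stream = iter(records)
    # Fill 'internal storage' with the first records, all in string 0.
    heap: List[Tuple[int, int]] = [(0, r) for r in islice(stream, memory_capacity)]
    heapq.heapify(heap)

    runs: List[List[int]] = []
    current: List[int] = []
    current_run = 0
    while heap:
        run, record = heapq.heappop(heap)
        if run != current_run:  # the current string is exhausted
            runs.append(current)
            current, current_run = [], run
        current.append(record)
        # Replace the record just output with the next input record, if any.
        for incoming in islice(stream, 1):
            next_run = run if incoming >= record else run + 1
            heapq.heappush(heap, (next_run, incoming))
    if current:
        runs.append(current)
    return runs
```

With the input `[5, 3, 8, 1, 9, 2, 7]` and a capacity of 3, this sketch yields the two strings `[3, 5, 8, 9]` and `[1, 2, 7]`. The first string is longer than the capacity, which is the usual motivation for replacement selection: fewer strings need to be merged in the second stage.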
Contrary to the assumption of the literature, however, the flat random access machine (RAM) model is not accurate for large computers (such as mainframes), nor is it accurate even for smaller computers (such as storage system servers). As architectures of computer systems have evolved, many improvements have been developed for enhancing the speed with which internal storage operates.
The internal storage system today is typically hierarchical, including a plurality of different internal storage units of different speeds and designs. Typically, the internal storage system includes cache units which operate at high speed (for example, the same speed as that of the processing units), includes primary units (main store units) which operate at slower speeds than the cache units, and may include secondary units which operate at even slower speeds than the primary units. In modern storage systems, the cache units are at times organized into a plurality of cache subunits where each cache subunit may be of differing capacity and speed. Similarly, primary storage units and secondary storage units within the internal storage can be formed of multiple units, with speeds varying for different patterns and volumes of access. In addition, virtual-to-real storage address translation can cause access delays in storage systems that employ virtual addressing.
In accordance with the above background, there is a need for improved sorting methods and apparatus in computers which are particularly adaptable and efficient for sorting in computers having hierarchical storage units in the storage system.