Sorting computer data records requires a substantial amount of a computer's resources. The sorting process, as known in the art, generally involves reading into memory all of the records from an input file on an external storage medium, sorting the records in a desired order, and writing the sorted records to an output file. The distinct steps are generally referred to as initialization, reading and sorting, merging, and writing.
The first step, initialization, involves parsing user-supplied sorting information and planning a sort strategy and resource allocation. Typically, users supply the names of the input files to be sorted, the name of the resulting output file, and the fields in the input data which are to be used as the basis for the sort, i.e., the key. Next, the initialization routine determines what resources are available, specifically the amount of memory and of temporary storage space available for use. Based on the resources and the input/output files, the initialization routine then determines an input/output (I/O) strategy and a sort strategy for use on the data. The I/O strategy is based on providing the input file to memory as fast as possible with the least amount of computational work and attendant delay. The sort strategy is based on effectively sorting the data with the given amount of memory and auxiliary storage.
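By way of illustration only, the planning decision made during initialization can be sketched as follows. The names SortPlan and plan_sort, the use of fixed-length records, and the rule of choosing an in-memory sort whenever every record fits in memory are assumptions made for this sketch and are not taken from any particular sort utility.

```python
import math
from dataclasses import dataclass


@dataclass
class SortPlan:
    strategy: str          # "in_memory" or "external_merge" (hypothetical labels)
    records_per_run: int   # records sorted in memory at one time
    run_count: int         # number of sorted strings (runs) expected


def plan_sort(record_count: int, record_size: int, memory_bytes: int) -> SortPlan:
    """Choose a sort strategy from the input size and the available memory.

    Assumes fixed-length records; a real initialization routine would also
    weigh temporary storage space and the I/O strategy.
    """
    if record_size <= 0 or memory_bytes < record_size:
        raise ValueError("memory must hold at least one record")
    capacity = memory_bytes // record_size  # records that fit in memory
    if record_count <= capacity:
        # Everything fits: a single in-memory sort, no merge step needed.
        return SortPlan("in_memory", record_count, 1 if record_count else 0)
    # Otherwise sort memory-sized runs and merge them afterwards.
    return SortPlan("external_merge", capacity, math.ceil(record_count / capacity))
```

For example, 1,000 records of 100 bytes each with 25,000 bytes of memory would be planned as an external merge of four runs of 250 records.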
The second step involves reading the data from storage into the computer's memory and sorting the data. Standard input or read commands access the data from the input files located on external storage. A number of different sort algorithms can be implemented depending upon the nature of the data and the resources. Quicksort, external, heap, radix, tag and bubble sorts are commonly used sort algorithms.
The third step, the merge step, only comes into play if the amount of data to be sorted is larger than the memory can hold. If the number of records exceeds the available memory space, the sort may be conducted in a series of steps whereby less than all of the records are read into the computer's memory at a time. Those records are sorted and stored as a string, or run. After all records have been sorted into strings, the strings are merged into larger strings until all of the data is in the correct order.
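The run-and-merge procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: records are text lines containing no newline characters, the function name external_sort and its parameters are hypothetical, and all strings are combined in a single multi-way merge pass rather than in the successive merges into larger strings described above.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(records, max_records_in_memory, key=None):
    """Yield records in sorted order using sorted runs on temporary storage.

    During run creation, at most max_records_in_memory records are held
    in memory at a time. Records must be strings without newlines.
    """
    run_paths = []
    source = iter(records)
    try:
        # Read and sort: build one sorted string (run) per memory-sized chunk.
        while True:
            chunk = list(islice(source, max_records_in_memory))
            if not chunk:
                break
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(rec + "\n" for rec in chunk)
            run_paths.append(path)

        # Merge: combine all runs in one pass (an assumption of this sketch).
        run_files = [open(path) for path in run_paths]
        try:
            streams = [(line.rstrip("\n") for line in f) for f in run_files]
            yield from heapq.merge(*streams, key=key)
        finally:
            for f in run_files:
                f.close()
    finally:
        # Cleanup: remove the temporary run files.
        for path in run_paths:
            os.remove(path)
```

The generator form lets the caller write each record to the output file as it is produced, so the merged result never needs to be held in memory as a whole.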
The fourth step consists of writing the final sorted data to the designated output file. Standard write commands will be used unless, as in the tag sort instance, the records themselves must be read from the external storage device prior to writing to an output file.
A final cleanup will also be implemented as described in detail below.
Given the resources needed to effect a sort, the prior art is replete with teachings for enhancing the sort process. Improvements to sorting have been directed, in a first instance, to the actual sorting algorithm, such as is found in U.S. Pat. No. 4,809,158 of McCauley, wherein the number of comparisons needed to effect a sort is minimized. A second approach to sorting enhancement is to optimize the input/output (I/O) processing time, given the number of I/O's involved in reading the data into memory, storing strings, reading strings from storage, etc. Examples of I/O improvement strategies can be found in U.S. Pat. No. 4,210,961 of Whirlow, et al., which is specifically drawn to sorting, and U.S. Pat. No. 4,930,065 of McLagan, et al., drawn to generic I/O improvement in a computer system. A third approach to optimized sorting is to minimize the amount of data to be read, sorted and output. A tag sort or key sort, such as is described in a copending application entitled Modified External Tagsort, Ser. No. 07/812,636, filed Dec. 23, 1991 and assigned to the present assignee, sorts tags, or short identifiers, which represent the presumably larger records to be sorted. Minimizing data in this way results in a reduction in the amount of memory required for sorting, a decrease in the I/O time and, ideally, fewer sorting and merging steps. The drawback to a tag sort process is that the entire record must be read into the computer's memory initially for the assignment of a tag. The tagged record can then be returned to the external storage device, with its tag remaining in the computer to be operated upon in the sort. Therefore, although the tagging method does facilitate the sorting process, the number of I/O's may be the same and the main computer is still involved.
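A conventional tag sort of the general kind described above can be sketched as follows. This is an illustrative sketch only and does not reproduce the method of the copending application: the function name tag_sort, the fixed-length record layout, and the use of a byte offset as the tag's pointer back to its record are all assumptions.

```python
import io


def tag_sort(src, dst, record_size, key_slice):
    """Copy fixed-length records from src to dst in key order using tags.

    src must be a seekable binary stream. Each tag is a (key, offset)
    pair, so only the keys and offsets are held in memory during the sort.
    """
    tags = []
    offset = 0
    src.seek(0)
    # First pass: every record is read in full just to extract its key.
    while True:
        record = src.read(record_size)
        if len(record) < record_size:
            break  # end of input (a trailing partial record is ignored)
        tags.append((record[key_slice], offset))
        offset += record_size

    tags.sort()  # sort the short tags, not the full records

    # Second pass: each record is read again, in key order, for output.
    for _key, record_offset in tags:
        src.seek(record_offset)
        dst.write(src.read(record_size))
```

For example, with 4-byte records whose first two bytes are the key, tag_sort(io.BytesIO(b"03cc01aa02bb"), out, 4, slice(0, 2)) leaves b"01aa02bb03cc" in the stream out. The two passes over the input make the drawback noted above visible: each record still travels through the main computer twice, once for tagging and once for output.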
In accordance with the invention, the storage devices themselves are utilized to facilitate the sorting process. Input files to be sorted are being stored on increasingly "intelligent" peripheral storage means. By exploiting the greater intelligence and processing capabilities of the peripherals, I/O can be improved upon and the amount of data to be sorted can be minimized while utilizing existing sorting algorithms.
It is therefore an objective of the present invention to provide an enhanced sorting strategy.
It is a further objective to provide a sorting method which does not require reading the entire file to be sorted into the main computer.
It is yet another objective of the invention to provide a means for sorting records using less than all of the data contained in the records.
It is a further objective of the invention to utilize the capabilities of smart peripherals to facilitate sorting.
Still another objective is to rely upon the intelligent peripherals to reorder and write the complete records without main processor involvement.