The present invention relates to sorting techniques and more particularly, but not by way of limitation, to a system and methods for sorting data having variable length sort key fields.
The process of sorting an object generally involves (1) reading the object from external storage, often referred to as “unloading” the object, (2) passing the object's data to a utility that reorders/sorts the data in accordance with a specified sort key and (3) writing the sorted data back to the object's file on external storage, often referred to as “reloading” the object. It will further be understood that all sort routines compare items of the same size. That is, the size of an object's sort key must be constant from record to record during the sort operation. As used herein, the term ‘object’ refers to any collection of data that may be sorted, for example, an array or list of elements, one or more tables in a relational database, or a collection of records within a conventional file structure. One of ordinary skill in the art will understand that an object typically includes one or more records, that records are composed of one or more fields, that one or more fields are designated as a sort key and that sorting reorders an object's records based on the value of the records' sort keys.
One prior art technique for sorting objects having variable length keys is shown in FIG. 1. Sort routine 100 reads and pads a record from the object being sorted (block 105). The act of padding converts variable length key fields to fixed length key fields of a size great enough to accommodate any value that the key may assume. Once padded, the record is written to an intermediate file (block 110). If there are additional records to pad (the ‘NO’ prong of block 115), processing continues at block 105. If the object has been completely unloaded (the ‘YES’ prong of block 115), a sort utility is invoked that reorders and then stores the padded records in a result file (block 120). Each sorted and padded record is then retrieved from the result file, unpadded and reloaded into the object (blocks 125, 130 and 135). The act of unpadding restores each record's sort key to its original size. Pad and unpad processes can place heavy demands on intermediate storage because a padded copy of the entire object (possibly an entire database) must be created. Given that many keys are a fraction of the size that must be supported, the padded copy of an object can be several times the size of the original. In addition, since a single object cannot generally be retained in working memory, the time required to write and read an intermediate file having expanded sort keys can consume a significant portion of the total time needed to sort the object (e.g., the total elapsed time from block 105 to block 135).
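By way of illustration only, the pad, sort and unpad sequence described above may be sketched as follows. The sketch is not taken from FIG. 1 itself; the names MAX_KEY_LEN, pad_record, unpad_record and pad_sort_unpad are hypothetical, the 16-byte maximum key size is an assumed value, in-memory lists stand in for the intermediate and result files, and keys are assumed to contain no zero bytes so that zero padding preserves their byte-wise order.

```python
import struct
from typing import List, Tuple

MAX_KEY_LEN = 16   # assumed largest key size the object must support
PAD = b"\x00"      # assumed pad byte; keys are assumed not to contain it

def pad_record(key: bytes, payload: bytes) -> bytes:
    """Expand a variable length key to MAX_KEY_LEN and keep its true length."""
    if len(key) > MAX_KEY_LEN:
        raise ValueError("key exceeds the maximum supported size")
    # Layout: fixed-size padded key | 2-byte original key length | payload
    return key.ljust(MAX_KEY_LEN, PAD) + struct.pack(">H", len(key)) + payload

def unpad_record(padded: bytes) -> Tuple[bytes, bytes]:
    """Restore the key to its original size."""
    (key_len,) = struct.unpack_from(">H", padded, MAX_KEY_LEN)
    return padded[:key_len], padded[MAX_KEY_LEN + 2:]

def pad_sort_unpad(records: List[Tuple[bytes, bytes]]) -> List[Tuple[bytes, bytes]]:
    # Unload and pad every record (stands in for the intermediate file).
    intermediate = [pad_record(key, payload) for key, payload in records]
    # Sort on the fixed-size key field (stands in for the result file).
    result = sorted(intermediate, key=lambda rec: rec[:MAX_KEY_LEN])
    # Unpad and reload.
    return [unpad_record(rec) for rec in result]

records = [(b"pear", b"1"), (b"fig", b"2"), (b"banana", b"3")]
sorted_records = pad_sort_unpad(records)
original_bytes = sum(len(k) + len(p) for k, p in records)
padded_bytes = sum(len(pad_record(k, p)) for k, p in records)
```

In this small example the padded copy is several times larger than the original data, which reflects the storage expansion noted above when most keys are much shorter than the maximum supported size.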
To mitigate some of the aforementioned drawbacks, certain commercially available sort applications support input and output routine customization. These user-created, application-specific programs are commonly referred to as E15 (input) and E35 (output) programs. As illustrated in FIG. 2, after obtaining a record from (unsorted) object 200, E15 program 205 pads the record's sort key and passes the padded record to E15 interface 210. Sort routine 215 accepts input from E15 interface 210, sorts the records in accordance with the specified key and manages the transfer of padded data to and from intermediate storage 220. Following completion of the sort operation, sort routine 215 passes the sorted and padded records through E35 interface 225 to E35 program 230, which then unpads the records and reloads object 200.
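The data flow described above may be loosely modeled as a sort routine that takes its input from one user-supplied routine and delivers its output to another. The following sketch is an analogy only and does not reproduce the exit interface of any commercial sort product; the names KEY_LEN, e15_program, make_e35_program and sort_routine are hypothetical, the 8-byte fixed key size is an assumed value, and an in-memory list stands in for intermediate storage.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

KEY_LEN = 8  # assumed fixed key size expected by the sort routine

Record = Tuple[bytes, bytes]  # (variable length key, payload)

def e15_program(obj: Iterable[Record]) -> Iterator[bytes]:
    """Input routine: pad each record's key before handing it to the sort."""
    for key, payload in obj:
        if len(key) > KEY_LEN:
            raise ValueError("key exceeds the fixed key size")
        # Layout: padded key | 1-byte original key length | payload
        yield key.ljust(KEY_LEN, b"\x00") + bytes([len(key)]) + payload

def make_e35_program(target: List[Record]) -> Callable[[bytes], None]:
    """Output routine: unpad each sorted record and reload it into target."""
    def e35_program(rec: bytes) -> None:
        key_len = rec[KEY_LEN]
        target.append((rec[:key_len], rec[KEY_LEN + 1:]))
    return e35_program

def sort_routine(input_exit: Iterable[bytes],
                 output_exit: Callable[[bytes], None]) -> None:
    """Sort padded records; the list below stands in for intermediate storage."""
    intermediate = sorted(input_exit, key=lambda rec: rec[:KEY_LEN])
    for rec in intermediate:
        output_exit(rec)

unsorted_object = [(b"zeta", b"x"), (b"al", b"y"), (b"mu", b"z")]
reloaded_object: List[Record] = []
sort_routine(e15_program(unsorted_object), make_e35_program(reloaded_object))
```

Note that in this arrangement the sort routine still handles only padded records, so the data held in intermediate storage remains expanded.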
One significant drawback to prior art sorting techniques is that for large objects (e.g., databases and/or large database objects) comprising tens of megabytes to tens of terabytes, the time required to transfer padded data to and from intermediate storage can comprise a significant portion of the total time required to unload, sort and reload the target object. In addition, the amount of intermediate storage needed to retain padded data can represent a significant use of resources. Thus, it would be beneficial to provide techniques (methods and devices) to sort data that are more time and resource efficient than current techniques.