The present invention relates to apparatus for use in a data processing system for sorting a plurality of arbitrary length data records into a desired order.
The layman typically thinks of modern day digital computers as devices for performing extensive numeric calculations, such as are involved in determination of the orbits of artificial satellites, engineering and geophysical calculations, etc. In point of fact, however, an extensive portion of the total amount of available computer time is used, not in computations per se, but rather in ordering and reordering large files of arbitrary data, such as customer lists, inventory data, etc. Current users themselves estimate that such sorting operations typically account for more than 25% of data processing and computation time in typical installations.
The data to be sorted is generally arranged in a file consisting of a large number of individual records. These records include portions, which will be referred to hereinafter as keys, used to sort the data. A sorting operation involves the arrangement of all of the records in the file so that the keys of the records in the file are in numerical order. The file may, for example, incorporate data corresponding to a group of individuals, with each record including data for a single individual. In this example, the key for each record may be the individual's name (in digital form, of course). The ordering of the records by key would then result in the file being arranged so that the individuals' names are in alphabetical order.
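By way of illustration only, the ordering of records by key may be sketched as follows. The record layout, the field names `name` and `account`, and the helper names `encode_key` and `sort_file` are hypothetical and are not taken from any particular system; the sketch assumes names limited to ASCII characters, so that the numerical order of the encoded keys coincides with alphabetical order.

```python
def encode_key(name):
    # Assumption: names are plain ASCII. Upper-casing makes the numeric
    # order of the character codes match alphabetical order.
    return name.upper().encode("ascii")


def sort_file(records):
    # Arrange the records so that their encoded keys are in numerical order.
    # Python compares byte strings code by code, i.e. numerically.
    return sorted(records, key=lambda record: encode_key(record["name"]))


# Hypothetical file: one record per individual, keyed by name.
file_of_records = [
    {"name": "Morgan", "account": 3},
    {"name": "Baker", "account": 1},
    {"name": "Young", "account": 2},
]

sorted_file = sort_file(file_of_records)
print([record["name"] for record in sorted_file])
# prints ['Baker', 'Morgan', 'Young']
```

Each record travels with its key, so the remaining data for an individual stays attached to that individual's name after sorting.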
Sorting techniques may be separated into two categories. In software sorting techniques, the central processor itself examines each record and inserts it at the appropriate place in a list of such records. Various programs embodying a number of well-known algorithms have been devised for performing such internal sorting operations (e.g., quick sort, bubble sort, bitonic sort, etc.). All such programs are quite costly and consume large blocks of central processor time, since the processor can generally perform only a single operation at any given time. The second category involves the use of a hardware module for performing the sorting function. In operation, the sorting module will first be loaded with a file of records from the computer mass storage, after which the module will be triggered to perform the sorting operation. The sorting module will then sort the file of records in accordance with the keys, after which the sorted file will be returned to mass storage for use by the central processor. A large number of patents have issued directed to various peripheral sorting modules, including the patents to O'Conner et al., U.S. Pat. Nos. 3,029,413 and 3,311,392, Armstrong, U.S. Pat. Nos. 3,273,127; 2,984,822; 2,984,824; 3,013,249; 3,015,089; 3,329,938; 3,329,939; and 3,336,580, as well as the patents to Chen et al., U.S. Pat. No. 4,078,260 and O'Connor, U.S. Pat. No. 3,685,024.
The most common method disclosed in these patents relates to the use of a two-line sorting module, wherein two records are presented serially, the most significant bit first, to the input of the module. The sorting module gates the two records onto output lines so that the record having the higher key appears on a designated "HIGH" output and the record having the lower key appears on a designated "LOW" output. N-line sorters are formed from networks of these two-line sorting modules. In general, each of these networks utilizes two or more cascaded tiers of two-line sorting modules, so that the total amount of time necessary to perform the sorting operation increases with the number of lines into the sorter. Thus, for example, if the delay time required for operation of a two-line sorter module is defined as being one unit delay, then an optimal network for a three-line sorter will require a three unit delay.
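The operation of a two-line sorting module, and of a three-line sorter built from three cascaded tiers of such modules, may be modeled in software roughly as follows. This is a simplified sketch and not the circuit of any of the cited patents: each record is reduced to a fixed-width key, and the function names `two_line_sorter`, `three_line_sorter`, `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(value, width):
    # Most significant bit first, as presented serially to the module.
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def two_line_sorter(a_bits, b_bits):
    # Bit-serial two-line module: returns (HIGH output, LOW output).
    # Until the inputs differ, both bits are equal and routing is immaterial.
    # The first differing bit latches which input carries the higher key.
    high, low = [], []
    higher_input = None  # None until decided, then "a" or "b"
    for a, b in zip(a_bits, b_bits):
        if higher_input is None and a != b:
            higher_input = "a" if a > b else "b"
        if higher_input == "b":
            high.append(b)
            low.append(a)
        else:
            high.append(a)
            low.append(b)
    return high, low


def three_line_sorter(x, y, z):
    # Three cascaded tiers, one two-line module each: three unit delays.
    x, y = two_line_sorter(x, y)  # tier 1
    y, z = two_line_sorter(y, z)  # tier 2: z now carries the lowest key
    x, y = two_line_sorter(x, y)  # tier 3: x now carries the highest key
    return x, y, z


keys = [5, 9, 2]
outputs = three_line_sorter(*(to_bits(key, 4) for key in keys))
print([from_bits(bits) for bits in outputs])
# prints [9, 5, 2]
```

Because no two of the three modules operate on disjoint pairs of lines, none of the tiers can run concurrently, which is why the three-line network in this sketch incurs three unit delays.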
Hence, the larger the number of input lines into the sorter network, the more complicated the network becomes, and the greater the amount of time required to complete the sorting function. Often, the size of the file to be sorted is much larger than the number of lines into the sorter network, requiring that the central processor sort groups of records from this file in iterative fashion. The actual amount of time required to sort a given file may therefore be many times the delays listed above. The number of sorter passes required in order to sort a given file may be reduced by increasing the number of lines into the sorter network; however, this increases the delay within the sorter network itself. Clearly, an optimum sorter network will have an arbitrary number of input lines, and a very short delay associated with it.
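The relationship between the number of input lines and the number of sorter passes may be illustrated by the following sketch. It is one possible arrangement and is not drawn from the cited patents: the function `n_line_sort` merely stands in for a single pass through an n-line sorter network, and the final merging of the sorted groups by the central processor is an assumption made for the sake of a complete example.

```python
import heapq


def n_line_sort(group):
    # Stand-in for one pass through an n-line hardware sorter network.
    return sorted(group)


def sort_large_file(keys, n_lines):
    # Sort a file larger than the sorter by passing it through in groups
    # of at most n_lines keys, counting the passes required.
    passes = 0
    sorted_groups = []
    for start in range(0, len(keys), n_lines):
        sorted_groups.append(n_line_sort(keys[start:start + n_lines]))
        passes += 1
    # Assumed step: the central processor merges the sorted groups.
    merged = list(heapq.merge(*sorted_groups))
    return merged, passes


file_keys = [7, 3, 9, 1, 8, 2, 6, 5, 4, 0]
for n_lines in (2, 4, 8):
    result, passes = sort_large_file(file_keys, n_lines)
    print(n_lines, passes, result == sorted(file_keys))
```

For the ten keys shown, widening the sorter from two lines to eight reduces the number of passes from five to two, at the cost (in hardware) of a deeper and slower network for each pass.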