Sorting is generally acknowledged to be one of the most extensive computations for which computers are used. The number of papers and books published on the subject in recent years is a clear attestation of that fact. For example, Knuth, The Art of Computer Programming, Volume III, page 3, states that computer manufacturers estimate that over twenty-five percent of the running time on their computers is spent in sorting. A reason for this is that rearrangement of data within conventional computers is an essentially cumbersome task, because conventional computers are generally limited to performing complicated permutations by sequences of transpositions of data.
Sorting systems are known in the prior art, however, which are capable of responding to a series of binary data records, and which are capable of sorting the records into an ascending or descending sequence as determined by an identifying key assigned to each record. Systems of this general type are described, for example, in U.S. Pat. Nos. 3,329,939 and 3,399,383 which issued in the name of the present inventor. The sorting systems described in these patents are capable of rearranging the data records into a desired sequence, and they are intended to be used in conjunction with general purpose computers.
A primary objective of the present invention is to provide a sorting system which is capable of rearranging the data records on a more rapid and more economical basis than the prior art sorting systems, and on a much more rapid and much more economical basis than the general purpose computers themselves. Specifically, the sorting system of the present invention is intended to relieve the general purpose computer of one of its most frequent and time-consuming operations.
It is well known that sorting n records by comparisons requires on the order of $n \log_2 n$ comparisons. It is an objective of the system of the present invention to make on the order of $\log_2 n$ of these comparisons concurrently in each unit of time, thereby reducing the time requirement to the order of n, as compared with the prior art systems. In addition, the system of the invention serves to incorporate this algorithm into a standard random-access memory system with a minimal increment in complexity and cost.
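The arithmetic behind this objective can be written out as a rough order-of-magnitude estimate. It assumes, as a simplification not stated in the text, that every unit of time accommodates on the order of $\log_2 n$ concurrent comparisons and that no other cost dominates; the symbols $T_{\text{serial}}$ and $T_{\text{concurrent}}$ are introduced here only for illustration.

```latex
T_{\text{serial}} \sim n \log_2 n,
\qquad
T_{\text{concurrent}} \sim \frac{n \log_2 n}{\log_2 n} = n .
```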
The data records which are sorted in the system of the invention are multi-bit binary records, each of which includes a first portion representing the key which identifies the record and which is read first in a serial machine, a data portion which is read next, and an address portion which is read last. A sort is accomplished in the system of the invention when the identifying keys of a series of data records read out of the system represent a monotonic sequence of numbers, which may be increasing or decreasing, and which is increasing in the embodiment to be described. Although the records must be of equal length for any particular sort operation, the system itself is capable of handling records of different lengths up to the capabilities of its individual memory cells.
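The record layout and the sort criterion just described might be modeled in software as follows. This is only an illustrative sketch: the names `Record` and `is_sorted_output` are hypothetical, the field types are assumptions, and "monotonic" is taken here to allow equal keys.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Record:
    key: int      # identifying key, read first in a serial machine
    data: bytes   # data portion, read next
    address: int  # address portion, read last


def is_sorted_output(records: Sequence[Record], increasing: bool = True) -> bool:
    """Check that the keys of the records read out form a monotonic sequence."""
    keys = [r.key for r in records]
    neighbours = zip(keys, keys[1:])
    if increasing:
        return all(a <= b for a, b in neighbours)
    return all(a >= b for a, b in neighbours)
```

Only the key takes part in the check; the data and address portions are carried along unchanged.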
The sorting system of the invention has but two modes of operation, an input mode and an output mode, as mentioned above. During the input mode the system accepts records serially in a random sequence insofar as their identifying keys are concerned, and it stores the records in its random access memory. During the output mode, the records are retrieved from the random access memory on an instantaneous basis, and are produced serially in a sorted sequence insofar as their identifying keys are concerned. As stated above, the actual sorting of the records in the system is achieved immediately, and the only delays encountered are the times required to feed the records into the system during the input mode, and to retrieve the records out of the system during the output mode.
In explaining the algorithm involved in the sorting system of the invention, it may be assumed that the data records are arranged into k levels, and that each record in each level, except those in the highest level, specifies a pair of records in the next higher level.
In operation, when a record is passed to the system, it first passes to a comparator at the lowest level. The largest of the input record and the records in the lowest level with which it is compared is then passed to a comparator at the next higher level. In general, when a record is passed to a level k, it is compared with records at that level, and the largest among the incoming record and the records with which it is compared is displaced to the next higher level k+1. Records which are subject to comparison at level k but are not displaced are retained in storage at that level.
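The comparison performed at a single level can be sketched as below. This is a simplified software model rather than the comparator hardware: records are reduced to their integer keys, and the function name `compare_at_level` is hypothetical.

```python
from typing import List, Tuple


def compare_at_level(incoming: int, stored: List[int]) -> Tuple[List[int], int]:
    """Compare an incoming key with the keys stored at one level.

    Returns (keys retained at this level, key displaced to the next higher level).
    The largest of the incoming key and the stored keys is the one displaced.
    """
    candidates = sorted(stored + [incoming])
    return candidates[:-1], candidates[-1]
```

For example, `compare_at_level(5, [3, 9])` retains 3 and 5 and displaces 9, while `compare_at_level(12, [3, 9])` leaves the stored keys in place and passes 12 onward.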
If a record is displaced from a level k to a level k+1, it will clearly be preceded by the records with which it was compared in level k. Thus a record in level k may have appended to it the address in level k+1 at which may be found a record which succeeds it in the sorted sequence.
When all of the records to be sorted have been placed in the store, they may be withdrawn in an ordered sequence. A blank record may be transmitted to the highest level, thus displacing a data record or a blank to the next lower level, which in turn will displace a record or blank, until finally a record is displaced from the lowest level. When a record is displaced from a level, the address of the first record of the collection of records which succeed it in the next higher level is transmitted to that higher level, and the address which contained the record is appended to it as it passes to the next lower level. During this part of the procedure, a record greater than a record with which it is compared will displace the smaller record from a module.
In practice, the sets of records to be compared at each level are kept to a total of three records, including the one shifted from the previous level. This is achieved by arranging the records in pairs so that the lowest level contains one pair, the second level from the lowest contains two pairs, and in general the nth level from the lowest contains $2^{n-1}$ pairs. Therefore, in k levels, $2^{k+1} - 2$ records may be stored. Each record in a given level serves to specify exactly one of the pairs in the next higher level, and at each stage of the operation at most one pair in each level lacks such a predecessor.
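The capacity figure follows from summing two records per pair over the k levels. A short check of that arithmetic is sketched below; the function names `pairs_at_level` and `capacity` are hypothetical and serve only to illustrate the count.

```python
def pairs_at_level(n: int) -> int:
    """Number of pairs in the nth level, counting from the lowest level (n = 1)."""
    return 2 ** (n - 1)


def capacity(k: int) -> int:
    """Total records stored in k levels, at two records per pair."""
    return sum(2 * pairs_at_level(n) for n in range(1, k + 1))


# The level-by-level sum agrees with the closed form 2^(k+1) - 2.
for k in range(1, 11):
    assert capacity(k) == 2 ** (k + 1) - 2
```

With three levels, for instance, the levels hold 2, 4, and 8 records, for a total of 14.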
It should be understood that in place of pairs of records, a machine may be constructed to accommodate clusters of many records. In such a machine, each record R contained in a random access store at level k would contain the address of the least record, or first record, of a cluster C in the random access store at level k+1 -- if level k is not the highest level. Each member of the cluster C would follow the record R in the final output sorted sequence.
As used here, a cluster would then be a collection of records, the first or least of which is at an identified address. Again, the least record in a collection will be the first record of that collection in the sorted sequence.
During the input mode, each record will specify the corresponding pair at the next higher level. That is, the jth record at one level will specify the jth pair at the next higher level. A selection of records from each level is said to be a chain if, for each record in the chain, the record from the next higher level that is in the chain is in its designated pair. Just prior to the input operation, the system is filled with unity blanks whose indices have the highest possible values (all 1's). The system is filled in such a way that each record precedes the members of the pair that it specifies.
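The initial fill and the definition of a chain can be illustrated with a small model. Several details are assumptions made for the sketch and are not taken from the text: an 8-bit key width (`KEY_BITS`), zero-based positions, and a layout in which the jth pair of a level occupies positions 2j and 2j+1. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

KEY_BITS = 8                  # assumed key width, for illustration only
BLANK = (1 << KEY_BITS) - 1   # unity blank: key of all 1's


def make_store(k: int) -> List[List[int]]:
    """Build k levels filled with unity blanks; the nth level holds 2^n records."""
    return [[BLANK] * (2 ** n) for n in range(1, k + 1)]


def specified_pair(j: int) -> Tuple[int, int]:
    """Positions, in the next higher level, of the pair specified by record j."""
    return (2 * j, 2 * j + 1)


def is_chain(positions: Sequence[int]) -> bool:
    """positions[i] is the position of the selected record in level i + 1.

    The selection is a chain if each selected record lies in the pair
    designated by the selected record one level below it.
    """
    return all(nxt in specified_pair(cur)
               for cur, nxt in zip(positions, positions[1:]))
```

Because every key starts as the same all-1's value, the condition that each record precedes the members of its specified pair holds trivially after the fill, with ties allowed.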
Whenever a new record of a chain is entered into the system during the input operation, it is compared with the members of the chain already in the system, beginning at the lowest level and working up until the one it follows is found. The succeeding members in the chain are moved up the chain one-by-one, and the last one is moved out of the system. During each input interval during which a new record is shifted into the system, a zero blank (all 0's) is also shifted into the system. During the input operation, the records so moved out of the system will be the zero blanks referred to above, together with the blanks (all 1's) which are displaced by data records.
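The effect of entering a record into a chain can be sketched as an insertion into an ordered list. This is a simplified model under stated assumptions: the chain is represented only by its keys, listed from the lowest level to the highest and already in non-decreasing order; the zero blank shifted in during each input interval is not modeled; and the name `insert_into_chain` is hypothetical.

```python
import bisect
from typing import List, Tuple


def insert_into_chain(chain: List[int], record: int) -> Tuple[List[int], int]:
    """Insert a key into a chain and return (new chain, key moved out).

    The new key is placed directly after the member it follows; the
    succeeding members each move up one level and the last one leaves.
    """
    pos = bisect.bisect_right(chain, record)       # first member that succeeds the record
    extended = chain[:pos] + [record] + chain[pos:]
    return extended[:-1], extended[-1]
```

With an all-1's blank of 255 at the top of the chain, `insert_into_chain([2, 7, 255], 5)` yields the chain `[2, 5, 7]` and moves the blank out, which matches the description of blanks being displaced by data records.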
During the output operation, there is no need to wait until the first record has been extracted to begin comparisons for the second record. In fact, as soon as a record is shifted from the third to the second level, it will be known that certain records in the third level, namely those specified by the shifted record, are without a specified predecessor, and comparisons can begin to be made at the third level again. In general, as soon as the comparisons at the nth level for the jth cycle are completed, the comparisons at the (n+1)th level for the (j+1)th cycle can begin. Since these comparisons are accomplished serially on a bit-by-bit basis, comparisons can begin again at the nth level as soon as the first character of the record shifted from level n+1 is received.
A mechanism is provided in the system of the invention for addressing the individual banks of random access memory. As noted previously, the records to be compared at a given level are the record being shifted from the next lower level and the pair that was just left without a specified predecessor because of the shifting of a record out of the level below. The last character of each record will represent a starting address of the pair it specifies. As a record is shifted from one level to the next, this address is transmitted to the address register of the module above, and it is replaced with the current starting address from the current level. Upon receiving this address, the memory module performs a read followed by a write beginning at this address for a number of locations determined by the record length.