1. Field of the Invention
The present invention relates to a sorting process to be executed by a computer system, and more particularly to a sorting process to be executed by a plurality of computers.
2. Description of the Related Art
A sorting process is one of fundamental types of information processing to be executed by a computer system. A most common sorting process is to rearrange a plurality of records each including one or a plurality of keys, in a certain order (e.g., ascending or descending order) in accordance with the keys.
In a sorting process for a relatively small amount of input data (collection of records), a computer writes given input data in a buffer of a computer memory, refers to and compares keys of respective records to rearrange the records and obtain a sorted result. Such a sorting process which uses only the buffer of a computer memory as data storage region is called internal sorting.
In another sorting process for a relatively large amount of input data as compared to the capacity of a buffer, since all the input data cannot be written in the buffer, the computer divides the input data and writes each division data into the buffer to perform internal sorting and obtain the sorted result (hereinafter called a sorted string) which is stored in an external storage. Such input data write, internal sorting and sorted string storage are repeated as many times as the number of input data divisions. As a result, sorted strings corresponding in number to the number of data divisions are stored in the external storage.
Next, the computer reads each portion of all the sorted strings from the external storage and writes the portions in the buffer. The computer compares the keys of the records stored in the buffer to obtain a sorted result of each portion of all the sorted strings. Such a process for each portion of all the sorted strings is executed from the start to end of each sorted string to thereby obtain a final sorted result of all the input data. Such a process of obtaining the final sorted result from each sorted result is called merging. Such a sorting process using the computer buffer and external storage as data storage region is called external sorting. The external storage may be a magnetic disk, a magnetic tape or the like.
Parallel processing is one type of high speed information processing to be executed by a computer system. Parallel processing shortens a process time by preparing a plurality of resources necessary for information processing and performing a plurality of tasks at the same time, thereby attempting to reduce the processing time. As an example of a parallelized external sorting process, JP-A-8-272545 discloses techniques of improving parallel processing by using a disk array constituted of a plurality of magnetic disks as an external storage and devising the storage locations of data in the disk array.
In another type of a parallelized external sorting process, a plurality of computers may be interconnected by a network to constitute one computer system and a large amount of input data may be sorted in parallel by a plurality of computers. More specifically, input data is distributed and assigned to some or all computers (hereinafter called input nodes) among a plurality of computers, and each input node externally sorts the assigned input data. The externally sorted result at each input node is transferred via the network to one computer (hereinafter called an output node) among a plurality of computers. The output node merges the sorted results transferred from the input nodes in a manner similar to dealing with externally sorted strings, to thereby obtain the final sorted result of all the input data.
However, while the sorting process is executed in parallel by a computer system constituted of a plurality of computers, the merging process is required to be executed both at the input nodes and output node. Therefore, a time required for the merging process at an input node is added to a time required for the sorting process at the input node. The merging process at an input node may delay or hinder another task at the input node.
If a network interconnecting computers cannot operate at high speed, a time to transfer, via such a network, the externally sorted result at an input node to the output node may pose some problem. For example, if a network is a LAN having a transfer speed of 100 M bits/sec such as 100 Base-T, the data transfer speed is about 12.5 M bytes/sec or lower. It takes therefore about a threefold transfer time, as compared to about 40 M bytes/sec or lower of an ultra wide SCSI which is commonly used I/O specifications. If a plurality of computers are interconnected by a network operating at not so high a speed, a ratio of the data transfer time to the sorting process time may become large.
It is an object of the present invention to provide a sorting system and method capable of executing parallel sorting processes at a plurality of computers in high speed by shortening the sorting process time.
A first sorting system of this invention comprises: a plurality of input nodes; one output node; and a shared external storage unit connected between each of the input units and the output unit. Each of the input units comprises: a first buffer; means for storing sorting target data in the first/buffer; means for internally sorting data in the first buffer in accordance with a predetermined sorting rule; and means for storing as a sorted string an internally sorted result in the shared external storage unit. The output node comprises: a second buffer; means for storing, in the second buffer, the sorted string stored in the shared external storage unit; means for merging a plurality of sorted strings stored in the second buffer in accordance with the predetermined sorting rule; and means for outputting a merged result as a sorted result.
A second sorting system of the invention comprises: a plurality of input nodes; a plurality of output nodes; and a shared external storage unit connected between each of the input units and each of the output units. Each of the input units comprises: a first buffer; means for storing sorting target data in the first buffer; means for internally sorting data in the first buffer in accordance with a predetermined sorting rule; means for classifying data in the first buffer in accordance with a predetermined classification rule; and means for storing as a sorted string an internally sorted result in the shared external storage unit in accordance with the predetermined classification rule. The output node comprises: a second buffer; means for storing, in the second buffer, the sorted string stored in the shared external storage unit; means for merging a plurality of sorted strings stored in the second buffer in accordance with the predetermined sorting rule; and means for outputting a merged result as a sorted result.
In the first and second sorting systems, the input node may further comprise means for notifying a merge instruction to the output node, and the output node may further comprise means for receiving the merge instruction.
A third sorting system of the invention comprises: a plurality of nodes; a dedicated external storage unit provided for each of the plurality of nodes; and a shared external storage unit provided between each pair of nodes of the plurality of nodes. At least one node among the plurality of nodes comprises: a first buffer; means for storing sorting target data in the first buffer; means for internally sorting data in the first buffer in accordance with a predetermined sorting rule; means for sorting as a sorted string an internally sorted result in the dedicated external storage unit; a second buffer; means for storing, in the second buffer, the sorted string stored in the shared external storage unit and the dedicated external storage unit; means for merging a plurality of sorted strings stored in the second buffer in accordance with the predetermined sorting rule; and means for outputting a merged result as a sorted result.
A fourth sorting system of the invention comprises: a plurality of nodes; a dedicated external storage unit provided for each of the plurality of nodes; and a shared external storage unit provided between each pair of nodes of the plurality of nodes. At least one node among the plurality of nodes comprises: a first buffer; means for storing sorting target data in the first buffer; means for internally sorting data in the first buffer in accordance with a predetermined sorting rule; means for classifying data in the first buffer in accordance with a predetermined classification rule; means for sorting as a sorted string an internally sorted result in the dedicated external storage unit in accordance with the predetermined classification rule; a second buffer, means for storing, in the second buffer, the sorted string stored in the shared external storage unit and the dedicated external storage unit; means for merging a plurality of sorted strings stored in the second buffer in accordance with the predetermined sorting rule; and means for outputting a merged result as a sorted result.
In the third and fourth sorting systems, the first and second buffers may be the same buffer. At least one of the plurality of nodes may further comprise means for notifying another output node of a merge instruction and means for receiving the merge instruction from the other node.
A first sorting method of the invention for a sorting system having a plurality of input nodes, one output node, and a shared external storage unit connected between each of the input units and the output unit, comprises the steps of: storing sorting target data in the input node and internally sorting the sorting target data in accordance with a predetermined sorting rule; storing as a sorted string an internally sorted result in the shared external storage unit; and reading the sorted string from the shared external storage unit, storing the sorted string in the output node, and merging the sorted string in accordance with the predetermined sorting rule.
A second sorting method of the invention for a sorting system having a plurality of input nodes, a plurality of output nodes, and a shared external storage unit connected between each of the input units and each of the output units, comprises the steps of: storing sorting target data in each of the input nodes; classifying and internally sorting the read sorting target data in accordance with a predetermined classification rule and a predetermined sorting rule; distributing and storing a classified and sorted result in the shared external storage units shared with the output nodes in accordance with the predetermined classification rule; performing a process from the storing step to the distributing and storing step for all sorting target data; and thereafter reading the sorted string from the shared external storage units shared with the input nodes and storing the sorted string in each output node, and merging the sorted string in accordance with the predetermined sorting rule.
A third sorting method of the invention for a sorting system having a plurality of nodes, a dedicated external storage unit provided for each of the plurality of nodes, and a shared external storage unit provided between each pair of nodes of the plurality of nodes, comprises the steps of: storing sorting target data in each of the input nodes and internally sorting the sorting target data in accordance with a predetermined sorting rule; storing, as a sorted string, an internally sorted result in the dedicated or shared external storage unit; performing the internal sorting step and the sorted string storing step for all sorting target data; and thereafter reading each sorted string from the dedicated and shared external storage units at one of the plurality of nodes and merging each sorted string in accordance with the predetermined sorting rule.
A fourth sorting method of the invention for a sorting system having a plurality of nodes, a dedicated external storage unit provided for each of the plurality of nodes, and a shared external storage unit provided between each pair of nodes of the plurality of nodes, comprises the steps of: storing sorting target data in at least one of the plurality of nodes; classifying and internally sorting the read sorting target data in accordance with a predetermined classification rule and a predetermined sorting rule; distributing and storing as a sorted string a classified and internally stored result in the dedicated or shared external storage unit in accordance with the classification rule; performing a process from the storing step to the distributing and storing step for all sorting target data; and thereafter reading each sorted string from the dedicated and shared external storage units at one of the plurality of nodes and merging each sorted string in accordance with the predetermined sorting rule.
In the third and fourth sorting methods, at least one node may have the same buffer to be used for storing the sorting storage data and storing the sorted string.
The input node, output node, and other nodes correspond, for example, to general computers. The dedicated and shared external storage units correspond, for example, to a magnetic disk drive. The first and second buffers and other buffers may use, for example, a portion of a main storage of each computer.