1. Field of the Invention
The invention relates to a distributed memory-type information processing apparatus, and particularly to an information processing apparatus that is able to perform the sorting, compiling and joining of data at extremely high speeds.
2. Description of the Prior Art
Now that computers have been introduced throughout society and the Internet and other networks have become pervasive, data is being accumulated on a large scale. Vast amounts of computing power are required in order to process data on such large scales, so attempts to introduce parallel processing are natural.
Now, parallel processing architectures are divided into “shared memory” types and “distributed memory” types. The former (“shared memory” types) are architectures wherein a plurality of processors shares a single enormous memory space. In this architecture, traffic between the group of processors and the shared memory becomes a bottleneck, so it is not easy to construct practical systems that use more than 100 processors. Accordingly, at the time of calculating the square roots of 1 billion floating-point numbers, for example, processing can be performed no faster than 100 times the speed of a single CPU. Empirically, the upper limit is found to be roughly 30 times.
In the latter (“distributed memory” types), each processor has its own local memory and these are linked to construct a system. With this architecture, it is possible to design a hardware system that incorporates even several hundred to tens of thousands of processors. Accordingly, at the time of calculating the aforementioned square roots of 1 billion floating-point numbers, processing can be performed at several hundred to tens of thousands of times the speed of a single CPU.
Latent demand for parallel processing implemented by several hundred or more processors is said to be large, but as described above, such systems are difficult to design using architectures other than the distributed memory type when they are to be implemented using current, realistic hardware technology.
In distributed memory architectures, the capacity of the memory attached to each individual processor is small, so in the storage and processing of large-scale data (typically in arrays), which is one of the main objects of parallel processing, it is necessary to divide this data among the plurality of processors and the memory attached to each.
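The division of an array among processors and their local memories can be illustrated with a minimal sketch. It is not the apparatus of the invention; it only assumes a simple contiguous block division, applied to the square-root example given above. The names `partition_blocks`, `local_sqrt` and `gather` are hypothetical and chosen for illustration, and the "memory modules" are simulated as ordinary Python lists.

```python
import math


def partition_blocks(data, num_modules):
    """Split data into num_modules contiguous blocks.

    Block sizes differ by at most one element, so the load on each
    (simulated) memory module is as even as possible.
    """
    base, extra = divmod(len(data), num_modules)
    blocks = []
    start = 0
    for module in range(num_modules):
        # The first `extra` modules each take one additional element.
        size = base + (1 if module < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def local_sqrt(block):
    """Work done independently by one processor on its own local block."""
    return [math.sqrt(x) for x in block]


def gather(blocks):
    """Concatenate per-module results back into a single array."""
    return [x for block in blocks for x in block]


# Usage: ten values divided among four simulated memory modules.
data = [float(i * i) for i in range(10)]
blocks = partition_blocks(data, 4)
result = gather([local_sqrt(block) for block in blocks])
```

In this sketch each block is processed without reference to any other block, which is the property that allows the processors to operate in parallel; in a real system each call to `local_sqrt` would run on a separate processor.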
However, when arrays are divided among a plurality of processors and the memory attached to each, bus mastering to prevent the collision of data upon the bus becomes difficult, so if the various processors cannot operate in parallel, then there is a problem in that the efficiency of processor usage cannot be increased and it is not possible to increase the speed of processing. To this end, the present invention achieves various objects as described below.
(1) Collisions of data on the bus cannot occur, by virtue of the algorithm, so bus mastering is unnecessary; thereby the processing speed can be increased by making full use of the bus bandwidth.
(2) Parallel processing is possible by combining a plurality of memory modules, each equipped with a processor (or preferably a plurality of processors) and memory. The respective memory modules can be used effectively and processing can be allocated independently to the processors within each memory module, and thereby processing speed can be further increased by the effective utilization of memory modules.
(3) If the size of data subject to sorting is N, then a data size of only O(N) is required. (With conventional sorting, a data size of O(N*N) or O(N*Log(N)) is necessary in the worst case.)
(4) The processing time is stable, and even in the worst case a predictable processing speed is guaranteed.
To wit, the present invention has as its object to provide an information processing apparatus that is able to perform the sorting of arrays at extremely high speed and with stable processing times.