As computers have spread throughout society and networks, beginning with the Internet, have permeated every field, large amounts of data have accumulated in many places. Processing such large-scale data requires massive computation, and it is natural to attempt to apply parallel processing to this task.
Parallel processing architectures can be broadly divided into the “shared memory type” and the “distributed memory type”. The former (“shared memory type”) is a method in which multiple processors share a single large memory space. With this method, the traffic between the group of processors and the shared memory becomes a bottleneck, so it is not easy to build a practical system using more than about one hundred processors. Accordingly, when calculating, for example, the square roots of one billion floating-point variables, the speedup over a single CPU is a factor of 100 at best; empirically, a factor of about 30 is the upper limit.
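The shared-memory programming model described above can be illustrated with a minimal sketch. The Python code below is only a schematic of the model, not of any system in this document: several workers operate in place on slices of one array held in common memory. (Real shared-memory systems would use mechanisms such as OpenMP threads; Python's global interpreter lock prevents an actual speedup here, so the sketch shows only the structure of the access pattern.)

```python
import math
from concurrent.futures import ThreadPoolExecutor

def sqrt_shared(data, num_workers=4):
    """Shared-memory model: every worker reads and writes slices of the
    same array in place; all traffic goes through one common memory."""
    n = len(data)
    chunk = (n + num_workers - 1) // num_workers  # slice size per worker

    def worker(start):
        # Each worker updates its own slice of the single shared array.
        for i in range(start, min(start + chunk, n)):
            data[i] = math.sqrt(data[i])

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Launch one task per slice; list() forces completion of all tasks.
        list(pool.map(worker, range(0, n, chunk)))
    return data
```

Because every worker contends for the same memory, scaling this pattern to hundreds of processors runs into exactly the memory-traffic bottleneck the paragraph above describes.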
In the latter (“distributed memory type”), each processor has its own local memory, and a system is built by linking these processors together. With this method, hardware systems incorporating several hundred to several tens of thousands of processors can be designed. Accordingly, in the above example of calculating the square roots of one billion floating-point variables, the speedup over a single CPU can reach several hundred to several tens of thousands. However, the distributed memory type also has several problems, which will be described later.
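By contrast, the distributed-memory model can be sketched as follows. Again this Python fragment is only an illustrative model, not an implementation from this document: the data is split into local chunks, each worker process computes on its own chunk independently, and the results are gathered back by message passing (here, `multiprocessing.Pool` stands in for the interconnect between processors).

```python
import math
from multiprocessing import Pool

def sqrt_local(chunk):
    # Runs in a separate process: this worker sees only its local chunk,
    # never the full array, and sends its results back as a message.
    return [math.sqrt(x) for x in chunk]

def sqrt_distributed(data, num_procs=4):
    """Distributed-memory model: partition the data into per-processor
    chunks, compute locally, then gather the partial results."""
    n = len(data)
    size = (n + num_procs - 1) // num_procs
    chunks = [data[i:i + size] for i in range(0, n, size)]
    with Pool(num_procs) as pool:
        results = pool.map(sqrt_local, chunks)  # scatter, compute, gather
    return [x for part in results for x in part]
```

Since there is no shared memory to contend for, an embarrassingly parallel task like this scales with the number of processors; the problems that do arise in the distributed memory type are those described later in this document.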
Patent Document 1: International Publication WO00/10103 (FIG. 3 and FIG. 4)
Patent Document 2: International Publication WO2004/092948