1. Field of the Invention
The present invention is directed to a method and apparatus for redistributing multiple distributed data sets in a multiprocessor ring so that every processor contains one unique, complete data set. Each complete data set is assembled from a group of data subsets, where each subset is a part of the corresponding data set and is originally distributed among the processors. More particularly, the invention is directed to a method and apparatus which consolidates distributed data subsets into a single data set by using bidirectional communication within the ring of processors to transfer each subset of the distributed data set the minimum distance necessary, and which requires a minimum of additional buffer space within each processor.
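The phrase "minimum distance necessary" can be illustrated with a short sketch. It assumes processors numbered 0 through num_procs - 1 around the ring, with each processor linked to its two neighbors. The function name and the numbering are illustrative assumptions and are not taken from the specification.

```python
def ring_distance(src, dst, num_procs):
    """Fewest link hops between two processors on a bidirectional ring.

    Assumes processors are numbered 0..num_procs-1 around the ring
    (an illustrative convention, not one stated in the specification).
    """
    clockwise = (dst - src) % num_procs
    counterclockwise = num_procs - clockwise
    # A bidirectional ring lets a subset travel in whichever direction is shorter.
    return min(clockwise, counterclockwise)
```

On an eight-processor ring under this numbering, a subset moving from processor 0 to processor 6 needs two hops in one direction rather than six in the other, so no subset ever has to travel more than half way around the ring.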
2. Description of the Related Art
Parallel computers are becoming increasingly popular as a means of achieving high-performance computing. A parallel computer is a computer which is composed of more than one independent processor unit. These computers are often divided into different classes which reflect the organization of the processors and memory in the computer. Those computers whose processors share a common memory to which every processor has direct access are called shared-memory parallel computers. Those computers whose processors each have a separate memory which cannot be accessed by any other processor are called distributed-memory parallel computers. While the processors in a distributed-memory computer cannot access each other's memory, they can exchange information by sending and receiving messages over their interprocessor communication links or over a shared communication channel.
These two types of computers each have advantages and disadvantages with respect to different kinds of computing tasks. A shared-memory parallel computer is most effective on tasks which have a relatively low number of memory accesses but whose memory accesses are very unpredictable and can be to any portion of the common memory. The performance of a shared-memory machine is limited by the shared memory access bandwidth. In contrast, a distributed-memory parallel computer is most effective when the memory accesses are very heavy but relatively well-organized and segmented. Distributed-memory parallel computer performance is limited by the link communication bandwidth. The present invention relates to a communication method for distributed-memory parallel computers with at least two bidirectional communication lines per node.
Because the performance of distributed-memory parallel computers is limited by the communication bandwidth between the processors, it is important to minimize the interprocessor communication and thus maximize the computer's performance. To minimize the interprocessor communication, one must organize the distribution of data carefully so that each processor has all the data that the processor needs. Unfortunately, this cannot be done in any general sense, because the pattern of data access is determined by the algorithm itself and different algorithms access data in different ways.
To mitigate this problem, programmers will generally choose a compromise data distribution which reduces the interprocessor communication for the most common operations on their data set. However, there will always be some operations for which an alternative data distribution is superior.
The most common data distribution scheme for distributed-memory parallel processors is one in which each processor has an equal amount of data and no data is stored in more than one processor. The data are grouped within the processors in such a way as to minimize any interprocessor communication. Nevertheless, it is also fairly common for computing tasks to be more efficiently executed if a complete data set is stored in each processor.
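As a minimal sketch of such a distribution, the following splits one data set into equal, non-overlapping subsets, one per processor. The function name is hypothetical, and the use of contiguous blocks is an assumption made for illustration; the text above requires only that the amounts be equal and that no data be duplicated.

```python
def distribute(data_set, num_procs):
    """Split one data set into num_procs equal, non-overlapping subsets.

    Contiguous blocks are an illustrative assumption; any grouping that
    gives each processor an equal share with no duplication would qualify.
    """
    if len(data_set) % num_procs != 0:
        raise ValueError("data set length must be divisible by num_procs")
    size = len(data_set) // num_procs
    # Subset p is the block of elements that processor p would store.
    return [data_set[p * size:(p + 1) * size] for p in range(num_procs)]
```

With eight elements and four processors, each processor would hold two elements, and the four subsets together cover the data set exactly once.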
For this kind of task there are two alternative communication methods which are commonly used. The first is to have each processor request only the needed data from whichever processor has the data by sending messages through the communication links. The second is to simply redistribute or consolidate the distributed data subsets so that each processor has a different, complete data set. Each processor can then process one complete data set by itself. The second method is most efficient when there are at least as many distributed data sets as there are processors in the ring.
The first alternative will communicate only the necessary data, but in a random way and with additional messaging required to support the request/reply cycle. The second alternative may move more data, but does so in a very organized and optimizable way which is much more likely to saturate the available link bandwidth. The preferred alternative will again depend on the application program requirements.
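The result of the second method can be sketched as follows. This models only the starting and ending placement of the data, not the messages that move it, and it assumes exactly as many data sets as processors. The name `consolidate` and the layout `held[p][s]` (the subset of data set s stored on processor p) are hypothetical and chosen for illustration.

```python
def consolidate(held):
    """Return the placement after consolidation.

    held[p][s] is the subset of data set s initially stored on processor p.
    The result gives, for each processor p, the complete data set p,
    assembled from the subsets in processor order. Assumes the number of
    data sets equals the number of processors (an illustrative assumption).
    """
    num_procs = len(held)
    result = []
    for p in range(num_procs):
        complete = []
        for q in range(num_procs):
            # Processor p collects its own data set's subset from every processor q.
            complete.extend(held[q][p])
        result.append(complete)
    return result
```

Viewed this way, consolidation behaves like a transpose of the processor-by-data-set layout: before it, each processor holds one piece of every data set; after it, each processor holds every piece of one data set.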
If a redistribution of all the data from every processor is chosen, it is imperative that the implementation of this task be as efficient as possible.