1. Field of the Invention
The present invention is directed to a method and apparatus for broadcasting a data set that is distributed over the processors of a multiprocessor ring so that every processor will contain the complete data set and, more particularly, to a method and apparatus using bidirectional communication within the ring of processors that communicates each subset of the distributed data set the minimum distance possible and does not require additional buffer space within each processor.
2. Description of the Related Art
Parallel computers are becoming increasingly popular as a means of achieving high-performance computing, especially in image processing applications. A parallel computer is a computer which is composed of more than one independent processor unit. These computers are often divided into different classes which reflect the organization of the processors and memory in the computer. Those computers whose processors share a common memory to which every processor has direct access are called shared-memory parallel computers. Those computers whose processors each have a separate memory which cannot be accessed by any other processor are called distributed-memory parallel computers. While the processors in a distributed-memory computer cannot access the separate memory of another processor, the processors can exchange information with each other by using interprocessor communication links.
These two types of computers each have advantages and disadvantages with respect to different kinds of computing tasks. A shared-memory parallel computer is most effective on tasks which have a relatively low number of memory accesses but whose memory accesses are very unpredictable and can be to any portion of the common memory. The performance of a shared-memory machine is limited by the shared memory access bandwidth. In contrast, a distributed-memory parallel computer is most effective when the memory accesses are very heavy but relatively well organized and segmented. The performance of this type of computer is limited by the link communication bandwidth. The present invention relates to a communication method for distributed-memory parallel computers with at least two bidirectional communication links per node or processor.
Because the performance of distributed-memory parallel computers is limited by the communication bandwidth between the processors, it is important to minimize the interprocessor communication and thus maximize the computer's performance. To minimize the interprocessor communication, one must organize the distribution of data carefully so that each processor has all the data that it needs. Unfortunately, this cannot be done in any general sense. Which data is accessed or needed depends on the algorithm itself, and different algorithms will access data in different ways.
To mitigate this problem, programmers for distributed-memory parallel computers will generally choose some compromise for the data distribution which reduces the interprocessor communication for the most common operations on the data set. However, there will always be some operations for which an alternative data distribution is superior.
The most common data distribution scheme for distributed-memory parallel processors is one in which each processor has an equal amount of data and no data is stored in more than one processor. The data is grouped in such a way as to minimize interprocessor communication. Nevertheless, it is also fairly common for a computing task to be more efficiently executed if the complete data set is stored in every processor.
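The equal, non-overlapping distribution described above can be illustrated with a minimal sketch. The function name `block_distribute` and the use of contiguous blocks are assumptions made for illustration only; the text does not prescribe how the data is grouped.

```python
def block_distribute(data, num_procs):
    """Split data into num_procs contiguous blocks of equal size.

    Hypothetical helper: every item lands in exactly one block, so no
    data is stored in more than one processor.
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    if len(data) % num_procs != 0:
        raise ValueError("data length must be divisible by num_procs")
    size = len(data) // num_procs
    # Block p holds items [p * size, (p + 1) * size).
    return [data[p * size:(p + 1) * size] for p in range(num_procs)]


# Twelve items over four processors: three items each, no duplication.
blocks = block_distribute(list(range(12)), 4)
```

Under this scheme, a task that needs the complete data set in every processor must still gather the other blocks over the communication links.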
For this kind of task there are two alternative communication methods which are commonly used. The first is to have each processor request only the needed data from whichever processor has the needed data by sending messages through the communication links. The second is to simply broadcast the data from every processor to every other processor regardless of whether all the data is needed. Once the data has arrived, each processor will contain a complete data set and can continue its task by accessing whatever data from the entire set the processor needs.
The first alternative will minimize the communication of information, but at the expense of a less organized program and with additional messaging overhead required to support the request/reply cycle. The second alternative may move more data but does so in a very organized and optimizable way. The preferred alternative will once again depend on the application program requirements.
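The trade-off between the two alternatives can be made concrete with a rough counting sketch. The cost model here is an assumption, not something taken from the text: each remote block fetched by request costs one request message plus one reply message, and the counts are end-to-end deliveries that ignore how many links each message crosses.

```python
def request_reply_cost(blocks_needed):
    """Return (messages, blocks_moved) for the request/reply alternative.

    blocks_needed[p] is the number of remote blocks processor p needs.
    Assumed model: one request and one reply per remote block.
    """
    messages = sum(2 * k for k in blocks_needed)
    blocks_moved = sum(blocks_needed)
    return messages, blocks_moved


def broadcast_cost(num_procs):
    """Return blocks delivered when every processor receives every other block."""
    return num_procs * (num_procs - 1)


# Four processors, each needing only one remote block.
rr_messages, rr_blocks = request_reply_cost([1, 1, 1, 1])
bc_blocks = broadcast_cost(4)
```

In this example the request/reply alternative moves fewer blocks but needs twice as many messages as blocks, while the broadcast moves more blocks with no request traffic at all.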
If a broadcast of all the data from every processor is chosen, it is imperative that the implementation of this task be as efficient as possible.
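As an illustration of such a broadcast on a ring with bidirectional links, the following is a minimal simulation sketch. It is a generic bidirectional ring exchange written under stated assumptions, not the specific method of the invention: the name `ring_allgather` is hypothetical, every processor forwards one block in each direction per step, and the simulation does not model buffer space or link timing.

```python
def ring_allgather(blocks):
    """Simulate an all-to-all broadcast on a bidirectional ring.

    blocks[i] is the block initially owned by processor i.
    Returns (complete, steps): complete[i] is processor i's full data
    set in owner order, steps is the number of exchange steps taken.
    """
    p = len(blocks)
    # have[i] maps an owner index to the block processor i holds.
    have = [{i: blocks[i]} for i in range(p)]
    # Owner index of the block each processor forwards next, per direction.
    send_cw = list(range(p))   # travels toward processor i + 1
    send_ccw = list(range(p))  # travels toward processor i - 1
    steps = 0
    while any(len(h) < p for h in have):
        recv_cw = [None] * p
        recv_ccw = [None] * p
        for i in range(p):
            recv_cw[(i + 1) % p] = send_cw[i]
            recv_ccw[(i - 1) % p] = send_ccw[i]
        for i in range(p):
            # For even p the last step delivers the same block twice;
            # the dictionary simply keeps one copy.
            have[i][recv_cw[i]] = blocks[recv_cw[i]]
            have[i][recv_ccw[i]] = blocks[recv_ccw[i]]
        # Each processor forwards what it just received, same direction.
        send_cw, send_ccw = recv_cw, recv_ccw
        steps += 1
    complete = [[have[i][owner] for owner in range(p)] for i in range(p)]
    return complete, steps


initial = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
complete, steps = ring_allgather(initial)
```

Because blocks travel in both directions at once, no block in this sketch moves more than half way around the ring, which is why five processors finish in two steps rather than four.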