1. Field of the Invention
The present invention is directed to a system for exchanging data between processors or computers that operate in parallel and are connected in an array. More particularly, it is directed to a system in which the local two-dimensional data set, or tile, in each processor is augmented by some portion of the two-dimensional tiles of the neighboring processors. In a first exchange, tile data is exchanged with the processors on the right and left and above and below; a second exchange then causes diagonally positioned tile data to be received and stored by each processor.
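By way of illustration only, the two exchanges may be simulated in a single process as sketched below. The sketch assumes a toroidally connected grid, a uniform tile size, and a border width `halo` no larger than the tile; the function name `exchange_halos` and all variable names are hypothetical. The mechanism used for the second exchange (each processor receiving its corner data from the processors above and below, which already hold that data after the first exchange) is an assumption made for the example and is not taken from the text above.

```python
def exchange_halos(tiles, grid_rows, grid_cols, halo=1):
    """Simulate a two-phase border exchange on a toroidal processor grid.

    tiles maps processor coordinates (pr, pc) to a 2-D list (the local tile).
    Returns a dict mapping (pr, pc) to the augmented 2-D list.
    """
    def nbr(pr, pc, dr, dc):
        # Toroidal wrap-around is an assumption of this sketch.
        return ((pr + dr) % grid_rows, (pc + dc) % grid_cols)

    th, tw = len(tiles[(0, 0)]), len(tiles[(0, 0)][0])
    aug = {}
    # Place each local tile in the centre of a larger array.
    for key, tile in tiles.items():
        a = [[None] * (tw + 2 * halo) for _ in range(th + 2 * halo)]
        for r in range(th):
            a[r + halo][halo:halo + tw] = tile[r]
        aug[key] = a

    # First exchange: strips from the left/right and above/below neighbours.
    for (pr, pc), a in aug.items():
        left, right = tiles[nbr(pr, pc, 0, -1)], tiles[nbr(pr, pc, 0, 1)]
        up, down = tiles[nbr(pr, pc, -1, 0)], tiles[nbr(pr, pc, 1, 0)]
        for r in range(th):
            a[r + halo][:halo] = left[r][tw - halo:]
            a[r + halo][halo + tw:] = right[r][:halo]
        for h in range(halo):
            a[h][halo:halo + tw] = up[th - halo + h]
            a[halo + th + h][halo:halo + tw] = down[h]

    # Second exchange (assumed mechanism): the neighbours above and below
    # forward the ends of the side strips they received in the first exchange,
    # which are exactly the diagonally positioned data.
    for (pr, pc), a in aug.items():
        up, down = aug[nbr(pr, pc, -1, 0)], aug[nbr(pr, pc, 1, 0)]
        for h in range(halo):
            src_up = up[th + h]        # one of the last `halo` interior rows above
            src_down = down[halo + h]  # one of the first `halo` interior rows below
            a[h][:halo] = src_up[:halo]
            a[h][halo + tw:] = src_up[halo + tw:]
            a[halo + th + h][:halo] = src_down[:halo]
            a[halo + th + h][halo + tw:] = src_down[halo + tw:]
    return aug


# A 4 x 4 data set split into four 2 x 2 tiles on a 2 x 2 processor grid.
tiles = {
    (0, 0): [[0, 1], [4, 5]],
    (0, 1): [[2, 3], [6, 7]],
    (1, 0): [[8, 9], [12, 13]],
    (1, 1): [[10, 11], [14, 15]],
}
augmented = exchange_halos(tiles, 2, 2, halo=1)
```

In this sketch no processor communicates directly with a diagonal neighbor; the corner data arrives through a processor that is directly connected.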
2. Description of the Related Art
Distributed-memory parallel computers are an efficient solution for many computing problems. These computers utilize multiple processors, each of which has a separate, local memory that is not accessible to any other processor. The processors can communicate with each other through interprocessor communication links. The number of processors in such a computer can often be extended without limit. However, the performance of such a computer is limited by the cost of communicating information between the processors, so that, at some point, the performance of a large computer with many processors will be constrained by the interprocessor communication bandwidth.
Such computers are most efficient when computing on data sets which are distributed over the various processors and which do not require much communication. Mesh or toroidally connected parallel computers are especially effective when processing two-dimensional data sets, such as images. A typical image processing operation is image convolution. Typically, each data set is divided into mutually exclusive, two-dimensional subsets, each of which is stored in one processor. The two-dimensional data subsets, or tiles, are allocated to the processors in such a way that contiguous tiles are placed in contiguous processors.
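The tiling described above may be sketched as follows. This is a minimal illustration, assuming that the data set dimensions divide evenly by the processor grid dimensions; the function name `split_into_tiles` and the coordinate convention `(pr, pc)` are hypothetical.

```python
def split_into_tiles(image, grid_rows, grid_cols):
    """Partition a 2-D list into grid_rows x grid_cols mutually exclusive tiles.

    Returns a dict mapping processor coordinates (pr, pc) to that processor's
    tile, so that contiguous tiles land on contiguous processors.
    """
    height, width = len(image), len(image[0])
    if height % grid_rows or width % grid_cols:
        raise ValueError("image size must be a multiple of the grid size")
    tile_h, tile_w = height // grid_rows, width // grid_cols
    tiles = {}
    for pr in range(grid_rows):
        for pc in range(grid_cols):
            tiles[(pr, pc)] = [
                row[pc * tile_w:(pc + 1) * tile_w]
                for row in image[pr * tile_h:(pr + 1) * tile_h]
            ]
    return tiles


# A 4 x 4 data set with elements 0..15, split over a 2 x 2 processor grid.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_tiles(image, 2, 2)
```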
Since a mesh or toroidally connected parallel computer is locally connected, this arrangement of data is very cost effective for operations which are local in nature. Point operations, of course, do not require any interprocessor communication since each data element is processed alone. Operations which utilize neighborhoods, such as convolutions, are more complex. Because each processor stores a mutually exclusive data set, part of the neighborhood for data elements on the edge of the local tile is stored in a neighboring processor. These neighboring data elements must be exchanged between processors to perform the neighborhood operation. Since these data exchanges represent overhead in the parallel computer, it is important to optimize the interprocessor communication necessary to implement the data exchange.
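The dependence of edge elements on neighboring processors can be made concrete with a short sketch. It assumes a square neighborhood of a given `radius` around each element (radius 1 corresponds to a 3 x 3 convolution kernel); the function name `offsets_outside_tile` is hypothetical.

```python
def offsets_outside_tile(tile_h, tile_w, row, col, radius=1):
    """List the neighborhood offsets (dr, dc) of element (row, col) that
    fall outside a tile_h x tile_w local tile."""
    outside = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = row + dr, col + dc
            if not (0 <= rr < tile_h and 0 <= cc < tile_w):
                outside.append((dr, dc))
    return outside


interior = offsets_outside_tile(4, 4, 1, 1)  # fully inside the local tile
corner = offsets_outside_tile(4, 4, 0, 0)    # needs data from three neighbors
```

For the corner element, the offset `(-1, -1)` refers to data held by the diagonally positioned processor, while the remaining missing offsets refer to the processors above and to the left.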
One convenient method for providing the necessary data to support neighborhood operations is to augment the local tiled data subsets with data normally stored in neighboring processor tiles. It is only necessary to exchange data between processors once to augment the data tiles. This is an especially effective approach because the data for all edge neighborhoods is then available in the augmented tile, which precludes the need for multiple communications. The augmented data must be stored in more than one processor, so the penalty for reduced communication is increased data storage. However, if the tiled data itself is changed as a result of some operation, and a successive neighborhood operation is to be performed with the same data, the augmentation must be repeated so that the augmented data is changed as well.
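The storage penalty mentioned above can be estimated with a short calculation. The sketch assumes a border of uniform width `halo` on all four sides of the tile; the function name `storage_overhead` is hypothetical.

```python
def storage_overhead(tile_h, tile_w, halo):
    """Ratio of augmented-tile storage to original-tile storage, assuming a
    border of `halo` elements is added on every side of the tile."""
    augmented = (tile_h + 2 * halo) * (tile_w + 2 * halo)
    return augmented / (tile_h * tile_w)


small = storage_overhead(2, 2, 1)    # tiny tile: the border dominates
large = storage_overhead(64, 64, 1)  # larger tile: the border is a small fraction
```

Under this assumption the relative penalty shrinks as the tile grows, since the border scales with the tile perimeter while the tile itself scales with its area.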
It is also important to maintain a consistent data storage structure for the augmented data. Two-dimensional data is typically stored in a two-dimensional data structure; the augmented data should be placed within this same data structure so as to reduce any processor overhead in accessing the data.
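One possible storage arrangement is sketched below, assuming the augmented tile is held in a single two-dimensional array in which local element (r, c) is stored at index `[r + halo][c + halo]`. The function name `neighborhood_sums` and the sample values are hypothetical; a plain sum stands in for a convolution with a uniform kernel.

```python
def neighborhood_sums(aug, halo=1):
    """Sum the (2*halo+1) x (2*halo+1) neighborhood of every local element.

    aug is the augmented tile: the local tile surrounded by a border of
    `halo` elements. No bounds checks or special cases are needed at the
    tile edges because the border data sits in the same 2-D array.
    """
    tile_h = len(aug) - 2 * halo
    tile_w = len(aug[0]) - 2 * halo
    out = []
    for r in range(tile_h):
        out_row = []
        for c in range(tile_w):
            total = 0
            for dr in range(-halo, halo + 1):
                for dc in range(-halo, halo + 1):
                    total += aug[r + halo + dr][c + halo + dc]
            out_row.append(total)
        out.append(out_row)
    return out


# A 2 x 2 local tile [[0, 1], [4, 5]] surrounded by a one-element border.
aug = [
    [15, 12, 13, 14],
    [3, 0, 1, 2],
    [7, 4, 5, 6],
    [11, 8, 9, 10],
]
sums = neighborhood_sums(aug)
```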