To increase processing speed, computer systems have been developed that include a plurality of processors operating generally concurrently on one or several programs. Various types of such multiple-processor computer systems have been developed. In one type of system, generally identified as "SIMD" (single instruction stream/multiple data stream), all of the processors are controlled in tandem, and generally in lockstep, by a host computer. A typical SIMD system is described in U.S. Pat. No. 4,598,400, issued Jul. 1, 1986, to W. Daniel Hillis, for Method and Apparatus For Routing Message Packets (hereinafter the "'400 patent") and assigned to the assignee of the present application. In the system described in the '400 patent, the host computer broadcasts instructions to the processors, which execute them in parallel on their separate streams of data. The various processors in the system described in the '400 patent are interconnected by a routing network, which permits them to transfer messages, including data, also in parallel under control of the host computer. In a typical SIMD system, the processors operate in synchronism, with the synchronization of the processors being maintained through the issuance of the instructions by the host computer.
Another type of multiple-processor system, generally identified as "MIMD" (multiple instruction stream/multiple data stream), may not have an explicit host computer or central point of control. Instead, the processors execute their own individual programs, comprising separate instruction streams, on their individual data streams. The processors in a MIMD system are typically interconnected by a router over which each processor can exchange information, generally in the form of messages addressed to one or more of the other processors. If the system does not have a central point of control, the processors can synchronize by exchanging messages.
Recently, hybrids of the SIMD and MIMD systems have been developed in which the processors may be loosely controlled by commands from a host computer. In response to a command in such "S/MIMD" (synchronous MIMD) or "M/SIMD" (multiple SIMD) systems, the processors may execute one or a series of instructions on their individual items of data, with the particular series of instructions depending on, inter alia, the processors' individual programming and the results of their processing in response to previous commands.
Multiple-processor computer systems are also generally designed around two diverse memory models, namely, a shared memory model and a distributed memory model. In a shared memory model, memory, which stores data to be processed by the processors, is connected to all of the processors on a substantially equal basis. Typically, the processors are connected to a common memory through a memory access network, which generally permits each processor to have access to memory on an equal basis. In a distributed memory model, each processor has an associated memory which stores data that the processor will be processing at any particular point in the program. The processor/memory combinations, generally termed herein "processing elements," are interconnected by a routing network to permit data to be transferred among them as may be required by the program. An example of a routing network is described in the aforementioned '400 patent. A particular multiple-processor system may be designed as a hybrid of these models, in which some portion of the memory may be assigned to each processor, either as a cache or as system memory.
In a computer system, each memory location for storing a data item is identified by an address comprising a plurality of address bits. In a parallel computer system, each address may further include a plurality of high-order bits which essentially comprise a processor identifier, which identifies the processor in whose memory the storage location is located. In some multiple-processor computer systems, such as that disclosed in the aforementioned '400 patent, the processors may also be divided into a plurality of processor nodes, each of which includes a predetermined number of processors. In that case, the address bits for a particular storage location comprise a high-order portion identifying a particular processor node, an intermediate portion identifying a processor within the processor node, and a low-order portion identifying a particular storage location in that processor's memory.
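The three-part address layout described above can be illustrated with a minimal sketch. The field widths (NODE_BITS, PROC_BITS, OFFSET_BITS) and the function names are hypothetical, chosen only for illustration; they are not taken from the '400 patent or from any particular system.

```python
# Hypothetical field widths, chosen only for illustration.
NODE_BITS = 4     # high-order portion: processor node
PROC_BITS = 3     # intermediate portion: processor within the node
OFFSET_BITS = 9   # low-order portion: storage location in that processor's memory


def pack_address(node, proc, offset):
    """Combine the three fields into a single address (node in the high-order bits)."""
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node out of range")
    if not 0 <= proc < (1 << PROC_BITS):
        raise ValueError("processor out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (node << (PROC_BITS + OFFSET_BITS)) | (proc << OFFSET_BITS) | offset


def unpack_address(address):
    """Split an address back into (node, processor, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    proc = (address >> OFFSET_BITS) & ((1 << PROC_BITS) - 1)
    node = address >> (PROC_BITS + OFFSET_BITS)
    return node, proc, offset
```

Under these assumed widths, packing node 5, processor 2, and storage location 17 and then unpacking the result recovers the same three fields.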
In processing many types of programs on, in particular, systems designed according to the distributed memory model, it is often necessary to first assign data to processors in a particular pattern, or geometry. At various points in a program, it may be desirable, or even necessary, to change the pattern by which the data is assigned to the respective processors, as well as the pattern by which the data is stored within the respective processors' memories. The assignment pattern is generally determined by the programmer, either explicitly or implicitly in the program. The programmer may change the data assignment pattern as needed to enhance the processing speed of the program. Such reassignments typically involve rearrangements of selected ones of the address bits.
An example of such a reassignment is a matrix transpose. In a matrix, each matrix element is identified by an element identifier (a_M, ..., a_N, a_{N-1}, ..., a_0), where "a_M, ..., a_N" represents a series of bits that identify the column of the matrix element, and "a_{N-1}, ..., a_0" represents a series of bits that identify the row of the matrix element. In a matrix transpose, the matrix elements are rearranged such that the rows and columns are interchanged, essentially flipping the matrix around its diagonal. When that occurs, the row and column portions of each element identifier are interchanged to (a_{N-1}, ..., a_0, a_M, ..., a_N). In one typical arrangement of a two-dimensional matrix on a parallel processor, the elements in various columns of the matrix are assigned to various ones of the processors. The memory associated with each processor stores elements in successive rows, in the column assigned to it, in successive storage locations. In that case, the column portion of the element identifier, "a_M, ..., a_N," can also serve to identify the processor, and the row portion, "a_{N-1}, ..., a_0," the storage location, which stores a particular matrix element. In performing a transpose, the processor identifiers and storage location identifiers for each matrix element are interchanged, essentially meaning that the successive matrix elements stored by each processor are distributed to all of the processors, and each processor has a matrix element previously maintained by each of the processors. The aforementioned Edelman application describes an arrangement for performing such a transpose operation efficiently in a parallel processor in which the processing elements are interconnected by a routing network in the form of a hypercube.
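The interchange of the processor-identifier and storage-location portions described above can be sketched as follows. This is a minimal sequential simulation under stated assumptions: a square matrix with BITS bits each for row and column, one column per simulated processor, and hypothetical function names. It is not the arrangement of the Edelman application and does not model a hypercube routing network.

```python
BITS = 2  # hypothetical: 4 x 4 matrix, 4 processors with 4 storage locations each


def transpose_identifier(ident, row_bits=BITS, col_bits=BITS):
    """Swap the column (high-order) and row (low-order) portions of an element identifier."""
    row = ident & ((1 << row_bits) - 1)
    col = ident >> row_bits
    return (row << col_bits) | col


def transpose_distributed(memories):
    """memories[p][loc] holds the element in column p, row loc.

    Returns new per-processor memories in which the processor and
    storage-location identifiers of every element are interchanged.
    """
    n = 1 << BITS
    result = [[None] * n for _ in range(n)]
    for proc, memory in enumerate(memories):
        for loc, value in enumerate(memory):
            old_id = (proc << BITS) | loc           # (column bits, row bits)
            new_id = transpose_identifier(old_id)   # (row bits, column bits)
            new_proc = new_id >> BITS
            new_loc = new_id & ((1 << BITS) - 1)
            result[new_proc][new_loc] = value
    return result


# Each element is labelled with its original (column, row) position.
memories = [[(col, row) for row in range(1 << BITS)] for col in range(1 << BITS)]
transposed = transpose_distributed(memories)
```

In this sketch each simulated processor ends up holding exactly one element from every original processor, which is the all-to-all redistribution the paragraph describes.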