In many business data processing applications, a general purpose CPU is expected to handle a high volume of relatively short and relatively homogeneous tasks or transactions. Often there is a high degree of potential parallelism among these tasks, both in the sense that they can be performed independently without interference and in the sense that for a large part of the time they execute exactly the same streams of instructions. One approach to increasing throughput for these applications is to distribute the tasks among a number of processors. If the tasks are highly independent and contention-free, then some sort of network of independent asynchronous processors is suggested. However, if, as in many data base applications, there is a high degree of contention for resources but the transactions are extremely homogeneous, then a network of synchronous processors working in an SIMD (single instruction, multiple data) mode may be indicated. Groups of similar tasks may then be batched and run together through such a processor, with synchronization minimizing the interprocessor communication necessary to manage the resource contention. If each task consists of a stream of straight line code (no branches), then all that is needed is a special purpose operating system for grouping tasks, loading relevant data, and starting and stopping execution.
When the tasks are general programs written for general purpose machines, they must be rewritten and broken down into straight line blocks. In such an event, a more complex scheduling is required.
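The grouping step described above may be illustrated by a minimal sketch in Python. It assumes, purely for illustration, that each task is a pair consisting of a transaction type and its data, that tasks of the same type execute the same instruction stream, and that the array has a fixed number of processors (`width`). The names `batch_tasks`, `width`, and the sample transaction types are hypothetical and do not come from any cited reference.

```python
from collections import defaultdict

def batch_tasks(tasks, width):
    """Group tasks by type, then cut each group into batches of at most
    `width` tasks, one task per parallel processor (hypothetical model)."""
    groups = defaultdict(list)
    for kind, payload in tasks:
        groups[kind].append(payload)

    batches = []
    for kind, payloads in groups.items():
        # Each batch can be run through the array in a single pass,
        # since every task in it follows the same instruction stream.
        for start in range(0, len(payloads), width):
            batches.append((kind, payloads[start:start + width]))
    return batches

# Hypothetical workload: five short transactions of two types.
tasks = [("debit", 1), ("credit", 2), ("debit", 3), ("debit", 4), ("credit", 5)]
batches = batch_tasks(tasks, width=2)
```

With two processors, the three "debit" tasks require two passes and the two "credit" tasks require one. A batch that does not fill the array leaves processors idle for that pass.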
SIMD parallel processors include a programmable control unit; a plurality of registers for storing counterpart vectors; mask registers; and means responsive to a sequence of one or more control unit instructions for concurrently operating upon data in the registers. Such machines may also be described as consisting of a programmable control unit driving an array of n parallel processors, each processor having memory, arithmetic unit, program decode, and input/output (I/O) portions. Such an array computer is described in Stokes et al., U.S. Pat. No. 3,537,074, issued Oct. 27, 1970.
Examples of data processing performable on such machines are described in IBM Technical Disclosure Bulletin Vol. 22, No. 6, pages 2489-2492, November 1979. These include parallel table-directed translations and the performance of selected vector operations on elements determined by masks associated with the modification of linked lists. Other applications, such as numerical weather prediction, lend themselves to the matrix-oriented processing available on such machines.
In SIMD machines, each linearly ordered program sequence is called a basic block. More rigorously, each basic block consists of a maximal set of contiguous instructions uninterrupted by branches and branch targets except at its end points. Relatedly, the flow of control among these basic blocks may be modeled as a directed binary flowgraph. Unlike the processing of array data with its powerful matrix mathematics menu, the processing of conditional branching instructions is awkward. This derives from the delays imposed by the serialization of control as compared with the simultaneity of processing of data.
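The definition of a basic block and of the binary flowgraph may be made concrete by a short Python sketch. It uses the conventional "leader" method of partitioning, which is assumed here for illustration and is not attributed to any cited reference. The instruction format, the mnemonics `jmp` (unconditional branch) and `br_if` (conditional branch), and the sample program are all hypothetical.

```python
from typing import List, NamedTuple, Optional

class Instr(NamedTuple):
    op: str                       # hypothetical mnemonic
    target: Optional[int] = None  # index of the branch target, if any

BRANCH_OPS = {"jmp", "br_if"}     # hypothetical branch opcodes

def basic_blocks(program: List[Instr]):
    """Return basic blocks as (start, end) index pairs, end exclusive."""
    if not program:
        return []
    leaders = {0}                          # the first instruction starts a block
    for i, ins in enumerate(program):
        if ins.op in BRANCH_OPS:
            leaders.add(ins.target)        # a branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)         # so does the instruction after a branch
    bounds = sorted(leaders) + [len(program)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

def flowgraph(program: List[Instr]):
    """Map each block number to its successors (at most two per block)."""
    blocks = basic_blocks(program)
    block_of = {start: k for k, (start, _) in enumerate(blocks)}
    graph = {}
    for k, (_, end) in enumerate(blocks):
        last = program[end - 1]
        succ = []
        if last.op in BRANCH_OPS:
            succ.append(block_of[last.target])   # branch taken
        if last.op != "jmp" and end < len(program):
            succ.append(block_of[end])           # fall through
        graph[k] = succ
    return graph

program = [
    Instr("load"), Instr("cmp"), Instr("br_if", 5),   # block 0
    Instr("add"), Instr("jmp", 6),                    # block 1
    Instr("sub"),                                     # block 2
    Instr("store"),                                   # block 3
]
blocks = basic_blocks(program)
graph = flowgraph(program)
```

Because a block ends in at most one branch, each node of the resulting graph has at most two outgoing edges, one for the branch taken and one for the fall-through path, which is why the flowgraph is binary.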
Presently, conditional branches cannot be supported on an SIMD machine except by the scheduling of the execution of basic blocks and managing masks controlling active and inactive parallel processors. In this respect, reference should be made to Stone et al, "Introduction to Computer Architecture", SRA Research Associates, Inc., 1975, at pages 333-338. Stone discusses masking for conditional branching in an SIMD machine. What he actually describes is the processing of high-level instruction sequences by an external CPU. The CPU selectively sends individual fragments to a control unit and an array processor for execution.
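The use of masks to emulate a conditional branch may be illustrated by a minimal Python sketch, in which a list stands in for the array of parallel processors and one list element represents one processor. This is a simplified model assumed for illustration; it is not the mechanism of Stone or of any particular machine, and the names `masked_apply` and `simd_if_else` are hypothetical.

```python
def masked_apply(values, mask, fn):
    """Apply fn only in processors whose mask bit is set.
    Inactive processors keep their current value."""
    return [fn(v) if m else v for v, m in zip(values, mask)]

def simd_if_else(values, predicate, then_fn, else_fn):
    """Emulate 'if predicate: then_fn else: else_fn' without a branch."""
    # The condition is evaluated once, on the original data, in every processor.
    mask = [predicate(v) for v in values]
    inverse = [not m for m in mask]
    # The two basic blocks are scheduled one after the other. Every processor
    # steps through both, but only those enabled by the mask take effect.
    values = masked_apply(values, mask, then_fn)
    values = masked_apply(values, inverse, else_fn)
    return values

# Hypothetical example: double the non-negative elements, negate the rest.
result = simd_if_else([3, -1, 4, -5],
                      predicate=lambda v: v >= 0,
                      then_fn=lambda v: v * 2,
                      else_fn=lambda v: -v)
```

In this model both arms of the conditional occupy the whole array in turn, so the elapsed time is the sum of the two blocks regardless of how many processors are active in each. This illustrates the serialization of control noted above.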
An example of a front-end processor coupling an array of processors over a distinctive I/O channel may be found in an IBM System 370 attaching an IBM 3838. This is described in IBM publication GA24-3639-1-, second edition, published in February 1977. Of interest is the fact that the 3838 array processor has twenty-one logic and index instructions but does not include a conditional branch or jump instruction.