Massively parallel multiprocessors are computers with hundreds or even thousands of processors. One of the main problems in using such machines is how to implement interactions between a running program and a user, and specifically, how to implement terminal output to a user. Output can be a problem because the user might be flooded by independent output streams from all of the processors. Regrettably, humans by nature can only concentrate on one data stream at a time.
Most parallel systems do not provide any special support for terminal output from a parallel program. Hence when many processors generate output, these output streams are interleaved by the system and presented to the user as one stream. In that case, the programmer must add processor identification tags to the outputs, and the user must then sift through the output to find those parts that were generated by a particular processor. FIG. 1 illustrates such a conventional interleaved output stream with output in uppercase and input in lowercase. This situation is unsatisfactory because terminal output is very important in the programming and debugging phases of new applications. When an application malfunctions, programmers face the problem of finding out exactly what went wrong. As debuggers for parallel systems are not always available, many programmers insert output instructions into the program. The output is then used to try to create a mental picture of what the program is doing, and to track down the bug.
In the Express system sold by ParaSoft Corp., the terminal output problem is reduced to some extent by limiting the semantics of terminal output so as to reduce the number of possible output patterns. For example, terminal output can be done in either "single" mode or in "multi" mode. Single mode means that all the processors must output exactly the same text. This is checked by the system, and then only one copy of the output is displayed to the user. Multi mode means that each processor buffers its output internally, until they all agree to "flush". The outputs from the different processors are then displayed to the user one after the other, according to the numerical order of the processors. A major drawback of this approach is that it limits the patterns of terminal output, and more specifically that it requires that the processors always synchronize in order to perform terminal output.
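The two output modes described above can be illustrated with a minimal sketch. It is not the Express API: the function names `single_mode` and `multi_mode_flush` and the dictionary-based buffers are hypothetical, chosen only to show the semantics, and the processors are simulated as dictionary keys rather than real parallel processes.

```python
from typing import Dict, List


def single_mode(outputs: Dict[int, str]) -> str:
    """Single mode: every processor must produce the same text.

    `outputs` maps a processor number to the text it wants to print.
    Returns the one copy that would be displayed to the user.
    """
    distinct = set(outputs.values())
    if len(distinct) != 1:
        raise ValueError("single mode requires identical output from all processors")
    return distinct.pop()


def multi_mode_flush(buffers: Dict[int, List[str]]) -> List[str]:
    """Multi mode: display buffered output in numerical processor order.

    `buffers` maps a processor number to the lines it has buffered.
    Calling this function stands in for the point at which all
    processors agree to flush; the buffers are emptied afterwards.
    """
    displayed: List[str] = []
    for proc in sorted(buffers):
        displayed.extend(buffers[proc])
        buffers[proc] = []
    return displayed


if __name__ == "__main__":
    print(single_mode({0: "READY", 1: "READY", 2: "READY"}))
    pending = {2: ["P2: done"], 0: ["P0: step 1", "P0: step 2"], 1: ["P1: done"]}
    for line in multi_mode_flush(pending):
        print(line)
```

In this sketch the synchronization requirement appears as the need to collect the output of every processor before anything can be displayed, which is the drawback noted above.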
An alternative approach is described by J. E. Lumpp et al. in "CAPS: A Coding Aid for PASM," Comm. ACM, Vol. 34 No. 11, pp. 104-117 (Nov. 1991), where separate windows are provided to all processors in a partition. While this decouples the output operations of different processors, it does not scale up well for a large number of processors in a partition. If the number of processors is large (say a hundred processors in a partition), it is clearly very awkward to manage the display screen (having a hundred windows). The situation becomes unmanageable for a massively parallel partition that might contain thousands and even potentially tens of thousands of processors.
While text is the most natural and direct form of output, it has often been noted that human beings assimilate graphical information better than text. That is why various programming and instrumentation tools use graphics to present the user or programmer with information about the behavior of a parallel program. For example, the instrumentation facility described by R. R. Glenn and D. V. Pryor in "Instrumentation for a Massively Parallel MIMD Application," J. Parallel Distributed Comput., Vol. 12 No. 3, pp. 223-236 (July 1991) uses a 3-D array of colored dots to represent network congestion in a multiprocessor composed of processors connected in a 3-D mesh. However, this cannot be considered output from the program itself.
The program visualization environment described by D. N. Kimmelman and T. A. Ngo in "The RP3 Program Visualization Environment," IBM J. Res. Dev., Vol. 35, No. 5/6, pp. 635-651 (Sept/Nov 1991) is much more versatile. It collects events from the hardware, system software and application at runtime, and displays them in a variety of ways: bar charts, x-y graphs, arrays of colored lights, and other specially designed formats. As events may be generated directly by the application, this can be viewed as a form of output. However, there is a strong separation between the generation of events on the one hand, and the display on the other hand. Events are simply tuples that include a processor ID, a timestamp, and a data value. The display is totally controlled by the user who is running the program, including the decision as to which format to use to display each type of event (including the option of not displaying it at all). Thus the program actually has no control over the appearance of its output. In addition, this system is very difficult to use if the sole objective is just to provide a means for output from a parallel program; users need to learn how to use the system and how to set up the displays, and cannot simply expect the system to create a "reasonable" display by itself.
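The separation between event generation and display can be made concrete with a small sketch. It does not reproduce the RP3 environment: the `Event` type, its `event_type` field, the `as_bar_chart` renderer, and the `render` function are all hypothetical names introduced for illustration. Only the processor ID, timestamp, and data value fields come from the description above.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    event_type: str    # assumption: some label is needed to group events by type
    processor_id: int
    timestamp: float
    value: float


Renderer = Callable[[List[Event]], str]


def as_bar_chart(events: List[Event]) -> str:
    """Render one text bar per event, with length given by the data value."""
    return "\n".join(f"P{e.processor_id} " + "#" * int(e.value) for e in events)


def render(events: List[Event],
           user_config: Dict[str, Optional[Renderer]]) -> Dict[str, str]:
    """Display events according to the user's configuration only.

    An event type that is missing from `user_config`, or mapped to None,
    is not displayed at all. The program that generated the events has
    no say in this choice.
    """
    shown: Dict[str, str] = {}
    for etype in sorted({e.event_type for e in events}):
        renderer = user_config.get(etype)
        if renderer is None:
            continue
        shown[etype] = renderer([e for e in events if e.event_type == etype])
    return shown


if __name__ == "__main__":
    events = [
        Event("queue_len", 0, 0.1, 3),
        Event("queue_len", 1, 0.2, 1),
        Event("cache_miss", 0, 0.3, 7),
    ]
    config = {"queue_len": as_bar_chart, "cache_miss": None}
    for etype, text in render(events, config).items():
        print(etype)
        print(text)
```

The configuration dictionary stands in for the setup work that the user must do before any output becomes visible.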
Parallel programs have used graphical output directly in scientific computing. For example, the Express system supports graphic primitives that allow multiple processors to cooperate in the generation of a single image (see for example "Express C Reference Guide" 1990). This is a very flexible medium, which allows the programmer full freedom of expression; however, it also forces the programmer to take care of all the little details and make many design choices. Therefore graphical output is relatively hard to use in all of these prior systems, and consequently, it is usually not considered a viable alternative to text output at the program development stage.