Despite the rapid increase in the power of individual computer processors, there are many present and potential applications which could benefit from much greater computing power than can be provided by any individual present or foreseeable processor. The major approach to such greater computing power is to use parallel computers, that is, computers having more than one processor. Many different types of parallel computers have been designed, ranging from Symmetric Multi-Processing systems, in which each of the multiple processors and some amount of cache memory share main memory and all of the computer's other resources, to so-called shared-nothing systems, where each processor has its own separate, often relatively large, main memory and, often, its own mass storage device, and the processors are connected only by a computer network. The number of processors in current parallel computers varies from two to tens of thousands.
Parallel computers can provide a huge amount of raw computational power, as measured by the total number of instructions per second which their multiple processors can execute. One of the major problems restricting the use of parallel computing has been the difficulty of programming and debugging parallel computing programs because of the complexity of their computation. Also, the execution of large computations on parallel computers can often fail or be slowed drastically because of resource limitations affecting all or a part of such computations. In addition, parallel computations can be lengthy, particularly if they are not properly designed. For all these reasons, it is important for those designing and running parallel programs to be able to better understand the computation processes with which they are dealing.
A form of computation which has been previously used is record-based data flow programming. This form of computation causes a flow of records to pass through a stream of operators which remove records from or add records to the stream, modify the values in records, or create new records. Such computation can be performed on one processor or in parallel on a plurality of processors. Parallel Relational Data Base Systems (parallel "RDBMSs") run programs which respond to a user query written in a data base query language such as SQL, and then automatically create a corresponding parallel data flow graph. In such systems the user cannot explicitly create the graph, nor can he create, even indirectly, any graph other than one created in response to a query in a data base language.
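By way of illustration only, the record-based data flow described above can be sketched as a chain of generator functions, each acting as one operator on a stream of records. This is a minimal single-processor sketch and not the implementation of any particular system; the record layout and the names `drop_negative`, `double_amount`, `append_total`, and `run_pipeline` are hypothetical and are introduced here only for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a record is modeled as a plain dictionary of field name -> value.
Record = Dict[str, object]
Operator = Callable[[Iterable[Record]], Iterator[Record]]


def drop_negative(records: Iterable[Record]) -> Iterator[Record]:
    """Operator that removes records from the stream."""
    for rec in records:
        if rec["amount"] >= 0:
            yield rec


def double_amount(records: Iterable[Record]) -> Iterator[Record]:
    """Operator that modifies a value in each record."""
    for rec in records:
        out = dict(rec)  # copy so the caller's record is left unchanged
        out["amount"] = rec["amount"] * 2
        yield out


def append_total(records: Iterable[Record]) -> Iterator[Record]:
    """Operator that passes records through and then creates a new record."""
    total = 0
    for rec in records:
        total += rec["amount"]
        yield rec
    yield {"id": "total", "amount": total}


def run_pipeline(source: Iterable[Record], operators: List[Operator]) -> List[Record]:
    """Feed the source records through each operator in order."""
    stream: Iterable[Record] = iter(source)
    for op in operators:
        stream = op(stream)
    return list(stream)


SOURCE = [
    {"id": "a", "amount": 5},
    {"id": "b", "amount": -3},
    {"id": "c", "amount": 7},
]
RESULT = run_pipeline(SOURCE, [drop_negative, double_amount, append_total])
```

Because each operator consumes and produces a stream, records flow through the chain one at a time rather than being materialized between operators.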
A new approach to parallel record-based data flow programming is disclosed in U.S. patent application Ser. No. 08/627,801, filed by Michael Beckerle et al. on Mar. 25, 1996, entitled "Apparatuses And Methods For Programming Parallel Computers" (hereinafter the "Beckerle et al. Application"). The Beckerle et al. Application is hereby incorporated into this application in its entirety. The rights in the Beckerle et al. Application are owned by Torrent Systems, Inc., the assignee of the present application.
This prior application discloses a system in which a user can explicitly define a data flow graph by connecting together graph objects, including data sets and operators, with datalinks. The operators have input and output ports at which they can receive and output records, respectively, over a datalink. Each such port has a defined schema which defines the name and type of the fields from which or to which the operator is to read or write data. The schema can define transfer operators which designate that all fields of a record are to be supplied from one input port to one or more output ports. Field adapter objects can be placed between a datalink and an operator to change the name or type of fields in the records supplied to or output by such operators.
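By way of illustration only, the relationship among port schemas, field adapters, and operators can be sketched as follows. This is a simplified model under stated assumptions and does not reproduce the interfaces of the Beckerle et al. Application: the classes `FieldAdapter` and `Operator`, the helpers `check_schema` and `run_graph`, and all field names are hypothetical. The graph is reduced to a linear chain, a schema is modeled as a mapping from field name to Python type, and the field adapter only renames fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Assumptions: a record is a dict; a schema maps field names to Python types.
Record = Dict[str, object]
Schema = Dict[str, type]


def check_schema(record: Record, schema: Schema, where: str) -> None:
    """Raise TypeError if a field named in the schema is missing or mistyped."""
    for name, expected in schema.items():
        if not isinstance(record.get(name), expected):
            raise TypeError(f"{where}: field {name!r} missing or not {expected.__name__}")


@dataclass
class FieldAdapter:
    """Hypothetical adapter that renames fields before they reach an operator."""
    renames: Dict[str, str]  # old field name -> new field name

    def apply(self, record: Record) -> Record:
        return {self.renames.get(k, k): v for k, v in record.items()}


@dataclass
class Operator:
    """Hypothetical operator with one input port and one output port."""
    name: str
    input_schema: Schema
    output_schema: Schema
    func: Callable[[Record], Record]

    def process(self, records: List[Record]) -> List[Record]:
        out = []
        for rec in records:
            check_schema(rec, self.input_schema, f"{self.name} input")
            result = self.func(rec)
            check_schema(result, self.output_schema, f"{self.name} output")
            out.append(result)
        return out


def run_graph(data_set: List[Record], steps: List[Union[FieldAdapter, Operator]]) -> List[Record]:
    """Pass the data set along a linear chain of adapters and operators."""
    records = data_set
    for step in steps:
        if isinstance(step, FieldAdapter):
            records = [step.apply(r) for r in records]
        else:
            records = step.process(records)
    return records


# The operator copies every input field to its output (loosely analogous to a
# transfer) and adds one computed field.
extend_op = Operator(
    name="extend",
    input_schema={"quantity": int, "unit_price": int},
    output_schema={"total": int},
    func=lambda rec: dict(rec, total=rec["quantity"] * rec["unit_price"]),
)
adapter = FieldAdapter({"qty": "quantity", "price": "unit_price"})
OUTPUT = run_graph([{"qty": 2, "price": 5}], [adapter, extend_op])
```

In this sketch the data set uses the field names `qty` and `price`, which do not match the operator's input schema; omitting the adapter causes `check_schema` to raise a `TypeError`, which is the mismatch a field adapter is meant to resolve.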
The user is given the capability to define new parallelizable operators, including new parallel operators containing programming written by the user, new parallel operators each instance of which executes a standard sequential program, or new parallel operators using subgraphs defined from combinations of other, previously defined operators. The system automatically parallelizes the execution of the user-defined graph.
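By way of illustration only, the idea of a parallel operator each instance of which executes a sequential program can be sketched by partitioning the records and applying the same sequential function to each partition. This sketch assumes round-robin partitioning and uses a thread pool from the Python standard library to stand in for separate processors; it shows the structure only and is not the parallelization method of the system described above. The names `partition`, `sequential_upper`, and `run_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
SequentialOp = Callable[[List[Record]], List[Record]]


def partition(records: List[Record], n: int) -> List[List[Record]]:
    """Assumption: round-robin partitioning of records into n partitions."""
    return [records[i::n] for i in range(n)]


def sequential_upper(records: List[Record]) -> List[Record]:
    """An ordinary sequential routine, unaware that it runs as one of several instances."""
    return [dict(r, name=str(r["name"]).upper()) for r in records]


def run_parallel(op: SequentialOp, records: List[Record], instances: int) -> List[Record]:
    """Run one instance of op per partition and concatenate the outputs."""
    parts = partition(records, instances)
    with ThreadPoolExecutor(max_workers=instances) as pool:
        results = list(pool.map(op, parts))  # map preserves partition order
    return [rec for part in results for rec in part]


NAMES = [{"name": n} for n in ["a", "b", "c", "d", "e"]]
PARALLEL_RESULT = run_parallel(sequential_upper, NAMES, instances=2)
```

Note that the output follows partition order rather than the original record order, so an operation that depends on record order would need a different partitioning or a later sort.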
It would also be helpful if those programming and running data flow graph computations could better understand the performance of such computations.