1. Field of the Invention
The present invention generally relates to data flow programming and directed acyclic graphs. In particular, the present invention generally relates to the merging of directed acyclic graphs in a data flow programming environment at runtime.
2. Background
Multiprocessor computer systems include two or more processors that may be employed to execute the various instructions of a computer program. A particular set of instructions may be performed by one processor while other processors perform unrelated sets of instructions.
Fast computer systems, like multiprocessor computer systems, have stimulated the rapid growth of a new way of performing scientific research. The broad classical branches of theoretical science and experimental science have been joined by computational science. Computational scientists simulate on supercomputers phenomena too complex to be reliably predicted by theory and too dangerous or expensive to be reproduced in a laboratory. Successes in computational science have caused demand for supercomputing resources to rise sharply in recent years.
During this time, multiprocessor computer systems, also referred to as “parallel computers,” have evolved from experimental contraptions in laboratories to become the everyday tools of computational scientists who need powerful computing resources in order to solve their problems. Several factors have stimulated this evolution. It is not only that the speed of light and the effectiveness of heat dissipation impose physical limits on the speed of a single processor. It is also that the cost of advanced single-processor computers increases more rapidly than their power. In addition, price/performance ratios become more favorable if the required computational power can be found from existing resources instead of purchased. This factor has caused many sites to use existing work station networks, originally purchased to do modest computational chores, as “SCAN”s (SuperComputers At Night) by utilizing the workstation network as a parallel computer. This scheme has proven successful enough, and the cost effectiveness of individual workstations has increased rapidly enough, that networks of workstations have been purchased to be dedicated to parallel jobs that used to run on more expensive supercomputers. Thus, considerations of both peak performance and price/performance are pushing large-scale computing in the direction of parallelism. Despite these advances, parallel computing has obstacles to its adoption.
One obstacle to the adoption of parallel computing and its benefits in economy and power is the problem of inadequate software. The developer of a program implementing a parallel algorithm for an important computational science problem may find the current software environment to be more of an obstruction than smoothing the path to use of capable, cost-effective hardware available. This is because computer programmers generally follow a “control flow” model when developing programs, including programs for execution by multiprocessor computers systems. According to this model, the computer executes a program's instructions sequentially (i.e., in a series from the first instruction to the last instruction) as controlled by a program counter. Although this approach tends to simplify the program development process, it is inherently slow.
For example, when the program counter reaches a particular instruction in a program that requires the result of another instruction or set of instructions, the particular instruction is said to be “dependent” on the result and the processor cannot execute that instruction until the result is available. Moreover, executing programs developed under the control flow model on multiprocessing computer systems results in a significant waste of resources because of these dependencies. For example, a first processor executing one set of instructions in the control flow program may have to wait for some time until a second processor completes execution of another set of instructions, the result of which is required by the first processor to perform its set of instructions. This wait-time translates into an unacceptable waste of computing resources in that at least one of the processors in this two-processor configuration is idle the whole time while the program is running.
To better exploit parallelism in a program some scientists have suggested use of a “data flow” model in place of the control flow model. The basic concept of the data flow model is to enable the execution of an instruction whenever its required operands become available, and thus, no program counters are needed in data-driven computations. Instruction initiation depends on data availability, independent of the physical location of an instruction in the program. In other words, instructions in a program are not ordered. The execution simply follows the data dependency constraints.
Programs for data-driven computations can be represented by data flow graphs. For example, consider the calculation of the following expression:z=(x+y)*2
When, for example, x is 5 and y is 3, the result z is 16. In this example, z is dependent on the result of the sum and x and y. The data flow graph is a directed acyclic graph (“DAG”) whose nodes correspond to operators and arcs are pointers for forwarding data. The graph demonstrates sequencing constraints (i.e., constraints with data dependencies) among instructions. FIG. 4a, described below, illustrates two exemplary DAGs.
For example, in a conventional computer, program analysis is often done (i) when a program is compiled to yield better resource utilization and code optimization, and (ii) at run time to reveal concurrent arithmetic logic activities for higher system throughput. For instance, consider the following sequence of instructions:                1. P=X+Y        2. Q=P/Y        3. R=X*P        4. S=R−Q        5. T=R*P        6. U=S/T        
The following five computational sequences of these instructions are permissible to guarantee the integrity of the result when executing the instructions on a serial computing system (e.g., a uniprocessor system):                1,2,3,4,5,6        1,3,2,5,4,6        1,3,5,2,4,6        1,2,3,5,4,6        1,3,2,4,5,6        
For example, the first instruction must be executed first, but the second or third instruction can be executed second, because the result of the first instruction is required for either the second or third instruction, but neither the second nor the third requires the result of the other. The remainder of each sequence follows the rule that no instruction can be run until its operands (or inputs) are available.
In a multiprocessor computer system with two processors, however, it is possible to perform the six operations in four steps (instead of six) with the first processor computing step 1, followed by both processors simultaneously computing steps 2 and 3, followed by both processors simultaneously steps 4 and 5, and finally either processor computing step 6. This is an improvement over the uniprocessor approach because execution time is reduced.
Data-flow style programming typically yields the most parallelism of various programming modes, but using data-flow programming optimally typically requires programming large parts of the program in that style. Unfortunately, it is generally preferable to program small parts of a program in the data flow format because it is easier to write and easier to maintain the resulting program. It may be desirable to write small modules in the data flow style and then combine them into a large program that appears as if it were written as a single data flow program. Unfortunately, it is difficult to use the easier programming style of creating many small modules and then merge them together into the large single module that is useful for peak performance.
Therefore, a need has long existed for a method and system that overcome the problems noted above and other related problems.