A number of systems exist for constructing parallel computing DAGs, e.g. Apache Pig, Intel Threading Building Blocks, and Twitter Storm. The key difference here is the insight of how to map DAG construction onto programming language syntax, so that DAG construction is completely automatic, the DAG produced is the optimal one possible, and invalid DAGs cannot be constructed. Because the system is built as a language rather than as a library, the compiler is free to parallelize or optimize at any scale, from fine-grained to coarse-grained, and there is no library boilerplate code to write. In the described system, DAG edges represent data dependencies rather than communications specifically, but the compiler is free to turn them into communications where needed.
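As an illustration of DAG edges that represent data dependencies, the following minimal Python sketch builds a dependency graph from a set of named, single-assignment bindings, rejects invalid graphs, and groups the nodes into stages whose members do not depend on one another. It is not the described compiler: the dictionary-based program representation and the function names `build_dag`, `parallel_stages`, and `evaluate` are assumptions made for this example, standing in for what the described system would derive from language syntax.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dag(bindings):
    """Build {name: set(dependencies)} from {name: (function, [dependency names])}.

    Hypothetical stand-in for syntax-driven DAG construction: each name is
    bound exactly once (dict keys are unique), and every dependency must be
    a bound name.
    """
    graph = {}
    for name, (_, deps) in bindings.items():
        for dep in deps:
            if dep not in bindings:
                raise ValueError(f"{name!r} depends on undefined value {dep!r}")
        graph[name] = set(deps)
    # prepare() raises graphlib.CycleError (a ValueError subclass) on a cycle,
    # so a cyclic "DAG" is rejected before anything is evaluated.
    TopologicalSorter(graph).prepare()
    return graph


def parallel_stages(graph):
    """Group nodes into stages; nodes within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


def evaluate(bindings):
    """Evaluate stage by stage (sequentially here; a stage could run in parallel)."""
    values = {}
    for stage in parallel_stages(build_dag(bindings)):
        for name in stage:
            func, deps = bindings[name]
            values[name] = func(*(values[dep] for dep in deps))
    return values


# Example program: c needs a and b, d needs a, e needs c and d.
program = {
    "a": (lambda: 3, []),
    "b": (lambda: 4, []),
    "c": (lambda a, b: a * a + b * b, ["a", "b"]),
    "d": (lambda a: a + 1, ["a"]),
    "e": (lambda c, d: c - d, ["c", "d"]),
}

stages = parallel_stages(build_dag(program))  # [['a', 'b'], ['c', 'd'], ['e']]
result = evaluate(program)                    # result['e'] == 21
```

In this sketch the edges only record which values each node needs; whether an edge becomes a function call, a shared-memory read, or a network message is left open, which mirrors the freedom the text gives the compiler.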
The proposed system is related to, but distinct from, continuation-passing style (CPS) and static single assignment (SSA) form. It requires less boilerplate code than CPS, and SSA gives no guarantees in the presence of pointer aliasing, although much prior research into SSA may be useful in building a compiler for various types of programming, such as in the techniques described below. Neither CPS nor SSA suggests how to map its paradigm optimally onto programming language syntax or semantics so as to give guarantees about parallelizability.
A number of methods have arisen to create MapReduce pipelines, e.g. FlumeJava (which does profile-guided optimization to move mappers and reducers onto the same machine where possible). Our approach of enforcing linguistic constraints that guarantee the opportunity to perform powerful static analyses means that we can make better decisions about what to turn into a MapReduce operation in the first place, and what not to (i.e. in the described system, as much native-style computation as possible is performed within each node, and as little data as possible is communicated between nodes). Furthermore, by tracking the algebraic properties of functions, we enable code transformations that are not possible in a pipeline of hard-coded MapReduce operations, e.g. the use of partial reducers and mapper-reducer fusion.
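The two transformations named above can be sketched in a few lines of Python. This is an illustrative sketch only, not the described system's implementation: the function names `map_then_reduce`, `fused_map_reduce`, and `partial_reduce` are assumptions made for this example, and the algebraic property is checked by the reader rather than tracked by a compiler. The sketch assumes the reducer is associative and has an identity element; the last lines show that the partial-reducer rewrite gives a different answer when that assumption fails.

```python
from functools import reduce
from operator import add, sub


def map_then_reduce(mapper, reducer, identity, items):
    """Unfused reference: materialize every mapped value, then reduce."""
    mapped = [mapper(x) for x in items]
    return reduce(reducer, mapped, identity)


def fused_map_reduce(mapper, reducer, identity, items):
    """Mapper-reducer fusion: fold each mapped value in as it is produced,
    so no intermediate collection is built or sent anywhere."""
    acc = identity
    for x in items:
        acc = reducer(acc, mapper(x))
    return acc


def partial_reduce(mapper, reducer, identity, partitions):
    """Partial reducers: reduce each partition locally, then combine the
    per-partition results. Matches the reference result only if `reducer`
    is associative and `identity` is its identity element."""
    partials = [fused_map_reduce(mapper, reducer, identity, p) for p in partitions]
    return reduce(reducer, partials, identity)


def square(x):
    return x * x


data = list(range(1, 11))
partitions = [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]  # same items, same order

reference = map_then_reduce(square, add, 0, data)     # 385
fused = fused_map_reduce(square, add, 0, data)        # 385
partial = partial_reduce(square, add, 0, partitions)  # 385

# Subtraction is not associative, so the partial-reducer rewrite is invalid:
bad_reference = map_then_reduce(lambda x: x, sub, 0, [1, 2, 3, 4])     # -10
bad_partial = partial_reduce(lambda x: x, sub, 0, [[1, 2], [3, 4]])    # 10
```

With addition, each partition could live on a different node and only one number per partition would cross the network. The subtraction case is the reason the rewrite has to be gated on a known algebraic property rather than applied blindly.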
Various graphical programming tools have been developed (e.g. LabVIEW), but they rely on a “flowcharting” approach to programming, which neither respects the constraints of the lattice-based programming paradigm nor matches how the brain prefers to work with program code. Bret Victor's talk “Inventing on Principle” (https://vimeo.com/36579366) shares some ideas of realtime evaluation of code as it is being edited, but does not give a graphical representation of the data dependencies of the code. The Light Table IDE (http://www.kickstarter.com/projects/ibdknox/light-table) shares some ideas of realtime feedback and display of values flowing through functions, but displays intermediate values as substituted into the code, rather than displaying the data dependency graph graphically next to the code with pop-up visualizers. In particular, with Light Table the programmer needs to compare two versions of the code side-by-side, one with values substituted.