The present invention relates to dataflow programming environments, and more particularly to processing a dataflow program in a manner that results in a processed (e.g., reformulated) dataflow program having the same functionality but with increased parallelization within individual actors.
Dataflow modeling is emerging as a promising programming paradigm for streaming applications for multicore hardware and parallel platforms in general. This more constrained programming model benefits high-level transformations and facilitates advanced code optimizations and run-time scheduling.
A dataflow program is made up of a number of computational kernels, (called “actors” or “functional units”) and connections that specify the flow of data between the actors. An important property of a dataflow program is that the actors only interact by means of the flow of data over the connections: there is no other interaction. In particular, actors do not share state. The absence of shared state makes a dataflow program relatively easy to parallelize: the actors can execute in parallel, with each actor's execution being constrained only by the requirement that all of its inputs be available.
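The properties just described can be sketched in a few lines of Python (a hypothetical illustration only, not any particular dataflow runtime; all names are invented for this example): actors are functions that interact solely through explicit FIFO connections, and a sink consumes tokens without producing any.

```python
from collections import deque

# Minimal sketch of a dataflow program: three actors connected by
# FIFOs. The actors share no state; they interact only via the
# connections, so they could in principle execute in parallel,
# constrained only by the availability of their inputs.

def source(out, values):
    for v in values:
        out.append(v)              # push tokens onto the connection

def double(inp, out):
    while inp:                     # fire while input tokens are available
        out.append(2 * inp.popleft())

def sink(inp, collected):
    while inp:                     # a sink consumes tokens, produces none
        collected.append(inp.popleft())

# Wire the actors: source -> double -> sink
a_b = deque()
b_c = deque()
collected = []
source(a_b, [1, 2, 3])
double(a_b, b_c)
sink(b_c, collected)
# collected is now [2, 4, 6]
```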
FIG. 1 illustrates an exemplary graphical representation of a dataflow program 100 having seven actors, identified with respective reference numerals A, B, C, D, E, F, and G. The actors A, B, C, D, E, F, and G carry out their functions by means of their code (i.e., program instructions) being executed within a processing environment 101 that comprises one or more programmable processors 103 that retrieve program instructions and data from one or more non-transitory processor readable storage media (e.g., as represented by memory 105). Connections between the actors are indicated by arrows. The dataflow program 100 illustrates that an actor can have one or more input connections, and can have any number of output connections, including none. For example, actor G lacks any output ports, and is consequently commonly referred to as a “sink”. A sink does not affect the state of the other actors. In practice, sinks typically represent interaction with the environment in which the dataflow program executes. For example, a sink could represent an actuator, an output device, or the like. A sink could also represent a system that has not yet been implemented, in which case the sink mimics the missing subsystem's demand for input.
Feedback loops can be formed as illustrated in this example by actors C, D, E, and F forming a cycle, and also by actor B having a self-loop. It will be observed that feedback limits parallelism, since an actor's firing (i.e., its execution) may have to await the presence of input data derived from one of its earlier firings.
Communication between actors occurs asynchronously by means of the passing of so-called “tokens”, which are messages from one actor to another. These messages can represent any type of information (e.g., numeric, alphabetic, program-defined values, etc.), with the particular type of information in any one case being defined by the dataflow program. As used herein, the term “value” refers to the particular information (as distinguished from the information type or range of possible information instances) represented by a token or instance of an actor state without any limitation regarding whether that value is numeric, alphabetic, or other, and without regard to whether the information is or is not a complex data structure (e.g., a data structure comprising a plurality of members, each having its own associated value).
The dataflow programming model is a natural fit for many traditional Digital Signal Processing (DSP) applications such as, and without limitation, audio and video coding, radio baseband algorithms, cryptography applications, and the like. Dataflow in this manner decouples the program specification from the available level of parallelism in the target hardware since the actual mapping of tasks onto threads, processes and cores is not done in the application code but instead in the compilation and deployment phase.
In a dataflow program, each actor's operation may consist of a number of actions, with each action firing as soon as all of its required input tokens become valid (i.e., are available) and, if the action produces one or more output tokens, there is space available in the corresponding output port buffers. Whether an action fires as soon as it is instructed to do so, or must instead wait for one or more other activities within the actor to conclude, depends on resource usage within the actor. Just as the actors of a dataflow program may fire concurrently or may require some sequential ordering based on their relative data dependence on one another, the actions within an actor may fire concurrently or may require that some sequentiality be imposed, depending on whether the actions in question will be reading or writing the same resource: it is a requirement that only one action be able to read from or write to a given resource during any action firing.
An input token that, either alone or in conjunction with others, instigates an action's firing is “consumed” as a result (i.e., it is removed from the incoming connection and ceases to be present at the actor's input port). An actor's actions can also be triggered by one or more state conditions, which include state variables combined with action trigger guard conditions and the action scheduler's finite state machine conditions. Guard conditions may be Boolean expressions that test any persistent state variable of the actor or its input tokens. (A persistent state variable of an actor may be modeled, or in some cases implemented, as the actor producing a token that it feeds back to one of its input ports. In FIG. 1, the actor B's self-loop can be an example of a persistent state variable of actor B.) One example (from among many) of a dataflow programming language is the CAL language that was developed at UC Berkeley. The CAL language is described in Johan Eker and Jörn W. Janneck, “CAL Language Report: Specification of the CAL actor language”, Technical Memorandum No. UCB/ERL M03/48, University of California, Berkeley, Calif., 94720, USA, Dec. 1, 2003, which is hereby incorporated herein by reference in its entirety. In CAL, operations are represented by actors that may contain actions that read data from input ports (and thereby consume the data) and that produce data that is supplied to output ports. The CAL dataflow language has been selected as the formalism to be used in the new MPEG/RVC standard ISO/IEC 23001-4, or MPEG-B pt. 4. Similar programming models are also useful for implementing various functional components in mobile telecommunications networks.
Typically, the token passing between actors (and therefore also each connection from an actor output port to an actor input port) is modeled (but not necessarily implemented) as a First-In-First-Out (FIFO) buffer, such that an actor's output port that is sourcing a token pushes the token into a FIFO and an actor's input port that is to receive the token pops the token from the FIFO. An important characteristic of a FIFO (and therefore also of a connection between actor output and input ports) is that it preserves the order of the tokens contained therein; the reader of the FIFO receives the token in the same order in which that token was provided to the FIFO. Also, actors are typically able to test for the presence of tokens in a FIFO connected to one of the actor's input ports, and also to ascertain how many tokens are present in a FIFO, all without having to actually pop any tokens (and thereby remove the data from the FIFO).
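A connection with the characteristics just described can be sketched as follows (a minimal, hypothetical Python illustration; the class and method names are invented for this example): the FIFO preserves token order, and tokens can be tested for and counted without being popped.

```python
from collections import deque

# Sketch of an order-preserving token FIFO modeling a connection
# between an actor output port and an actor input port.

class Connection:
    def __init__(self):
        self._fifo = deque()

    def push(self, token):      # producer side (actor output port)
        self._fifo.append(token)

    def pop(self):              # consumer side (actor input port)
        return self._fifo.popleft()

    def available(self):        # count tokens without removing any
        return len(self._fifo)

    def peek(self, i=0):        # inspect a token without popping it
        return self._fifo[i]

c = Connection()
c.push("t1")
c.push("t2")
assert c.available() == 2       # two tokens present, none consumed
assert c.peek() == "t1"         # inspection does not remove the token
assert c.pop() == "t1"          # order is preserved: first in, first out
```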
The interested reader may refer to U.S. Pat. No. 7,761,272 to Janneck et al., which is hereby incorporated herein by reference in its entirety. The referenced document provides an overview of various aspects of dataflow program makeup and functionality.
As observed earlier, the amount of parallelism that can be extracted from a dataflow program is limited by feedback. This is because feedback limits the number of executions (“firings”) of an actor that can be performed (simultaneously) before the actor requires an input that depends on the result of one of those firings. Reference is again made to FIG. 1, which illustrates two examples of feedback: one being the connectivity between actors C through F and another being the self-loop of actor B.
Also as mentioned above, although actors do not share state, it is in many cases convenient to allow each actor to have local state. In the general case, mutation of the local state serializes the execution of the actor (i.e., the result of one firing is required by a subsequent firing). A common practice is to represent this constraint using feedback, with each stateful actor having a connection that is a self-loop (see, e.g., the actor B in FIG. 1). Any firing of the actor (at least conceptually) reads the current state as input and produces the possibly updated state as output.
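The self-loop representation of local state can be sketched as follows (a hypothetical Python illustration of a simple accumulator actor; the names are invented for this example). Each firing pops the current state from the self-loop FIFO and pushes the possibly updated state back, which is precisely what serializes the firings.

```python
from collections import deque

# Sketch: a stateful actor modeled with a self-loop. The "state" is a
# token circulating on the self-loop connection; every firing consumes
# it and produces the updated state, so firings cannot overlap.

def accumulator_fire(inp, out, self_loop):
    state = self_loop.popleft()    # read the current state as input
    token = inp.popleft()
    state += token
    out.append(state)              # emit the running total
    self_loop.append(state)        # produce the updated state as output

inp, out, loop = deque([1, 2, 3]), deque(), deque([0])  # initial state 0
while inp:
    accumulator_fire(inp, out, loop)
# out now holds the running totals 1, 3, 6
```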
Setting aside consideration of dataflow programs for the moment, parallelization and vectorization of sequential programs have been considered in other programming contexts. Loops (i.e., iterative control-flow constructs, such as “for”-loops) traditionally form the basis of such techniques. The amount of parallelism (e.g., the number of instances of the “loop body” that can execute in parallel) is limited by data dependence.
In an imperative programming language, such as C and FORTRAN (both of which have been studied extensively in the context of parallelization and vectorization), parallelization might be limited by true data dependence as well as artificial data dependence. True data dependence is the constraint that a value must be computed before it can be used (e.g., true data dependence exists in a program in which a first statement assigns a value to a variable a, and a subsequent statement utilizes the variable a), whereas artificial data dependence stems from the fact that storage (variables of the program) can be assigned (given values) multiple times (e.g., artificial data dependence exists in a program in which a variable a is used in a program statement that precedes a subsequent statement in which the variable a is assigned a new value; in this case, the subsequent statement cannot be executed until the first statement has been executed). There are two types of artificial data dependence: anti-dependence and output dependence. Anti-dependence is the requirement that all uses of a variable must take place before the variable is reassigned. Output dependence is the constraint that the order of two assignments must be preserved.
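The three kinds of dependence can be illustrated with a short straight-line fragment (hypothetical Python; compute() and other() merely stand in for arbitrary computations):

```python
# Illustrative only: compute() and other() represent arbitrary work.
def compute():
    return 5

def other():
    return 9

a = compute()   # S1
b = a + 1       # S2: true dependence on S1 (b needs the value of a)
c = a * 2       # S3: another use of a
a = other()     # S4: anti-dependent on S2/S3 (their reads of a must
                #     precede this reassignment) and output-dependent
                #     on S1 (the order of the two assignments to a
                #     must be preserved)
```

The dependence of S2 and S3 on S1 is unavoidable, but the constraints involving S4 exist only because the single storage location named a is reused.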
Unlike true data dependence, artificial data dependence can, at least theoretically, be eliminated by replicating storage. Examples of practical techniques to this end are:
- “Scalar renaming”, which gives a variable a different “name” (storage location) in different parts of the program. Ideally, each assignment is associated with a distinct “name” (in which case no artificial dependence remains).
- “Scalar expansion”, by which an array is substituted for a scalar variable. In this way, each loop iteration gets a unique storage location for the scalar variable.
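Scalar expansion, for instance, can be sketched as follows (a hypothetical Python illustration; the function names are invented for this example):

```python
# Before scalar expansion: the scalar t is reassigned in every
# iteration, creating artificial dependence between iterations.
def square_sums_scalar(xs, ys):
    out = []
    for i in range(len(xs)):
        t = xs[i] + ys[i]        # the same storage location, reused
        out.append(t * t)
    return out

# After scalar expansion: each iteration writes its own array element,
# so iterations no longer contend for a shared storage location and
# could in principle run in parallel.
def square_sums_expanded(xs, ys):
    n = len(xs)
    t = [0] * n                  # the scalar expanded into an array
    out = [0] * n
    for i in range(n):           # iterations are now independent
        t[i] = xs[i] + ys[i]
        out[i] = t[i] * t[i]
    return out
```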
There are also techniques that transform a program into a form that has an identical effect, but that avoids true data dependences that prevent parallelization. For instance:
- “Induction variable substitution”, which substitutes a linear expression in a loop counter for a variable that is incremented (or decremented) by a constant in each iteration of the loop. The concept of induction variables can be generalized to other sequences that can be expressed as functions of a loop counter; the technique is thus not limited to linear functions.
- “Idiom recognition”, which substitutes an efficient parallel implementation for part of a loop that computes a particular function (from a set of “known” functions). An example employs so-called “reductions” (sum, product, min or max, etc., over all elements in an array). Given original code that uses a scalar variable to accumulate results, which serializes loop iterations (true data dependence), each of the mentioned reductions has a parallel implementation with an equivalent effect, and these implementations can be known to a compiler a priori.
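Both techniques can be sketched in a few lines (hypothetical Python illustrations; the function names are invented for this example, and Python's built-in sum merely stands in for a parallel reduction):

```python
# Induction variable substitution: addr is incremented by a constant
# stride each iteration (a serial chain of updates) ...
def addresses_serial(n, base, stride):
    addrs = []
    addr = base
    for i in range(n):
        addrs.append(addr)
        addr += stride           # serial update of the induction variable
    return addrs

# ... and is replaced by a linear expression in the loop counter, making
# each iteration independent of the previous one.
def addresses_closed_form(n, base, stride):
    return [base + i * stride for i in range(n)]

# Idiom recognition: a serial accumulation recognized as a sum
# reduction, for which an equivalent parallel implementation exists
# (the built-in sum stands in for that implementation here).
def serial_sum(xs):
    acc = 0
    for x in xs:
        acc += x                 # true dependence serializes iterations
    return acc
```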
The interested reader is referred to H. Zima, “Supercompilers for Parallel and Vector Computers”, ACM Press, NY, USA 1991, ISBN 0-201-17560-6, pages 180, 184, 225, and 235 (which is hereby incorporated herein by reference in its entirety) for more information about the various techniques discussed above. For more information about induction variable substitution, reference is also made to M. Wolfe, “Beyond induction variables”, in Proc. ACM SIGPLAN Conf on Programming Language Design and Implementation (PLDI '92), 1992, pp. 162-174, which is hereby incorporated herein by reference in its entirety.
Returning now to a consideration of dataflow programming, given that the current practice is to serialize the firings of an actor with local state, slower execution of such programs can be expected. One could avoid this result by disallowing local state in actors, but such an approach would make dataflow programming less expressive and more cumbersome to use in practice. The alternative, which involves serialized execution of actors with state, may introduce serial bottlenecks in an implementation of a dataflow program on parallel hardware (e.g., multi-core, multi-processor, vector processor systems).
The inventors of the subject matter described herein have considered that one way to address this problem is to find a way to reformulate the actor's program code in a way that retains the code's functionality while increasing parallelism between that actor's actions. One impediment in following through with this approach, however, is that the techniques that are known in the field of parallelization and vectorization of loops in sequential control-flow programs are not directly applicable in the context of dataflow programs. The main complication is that, in general, the effect of each actor firing depends on both state and inputs. This breaks the regular access patterns that are required in loops, which are candidates for parallelization (or vectorization).
Considering the bigger picture, it is a great challenge to efficiently and automatically parallelize (or vectorize) a program that is written in a sequential, imperative programming language (e.g., C or FORTRAN). By contrast, a dataflow program is parallel by construction, because its actors can execute in parallel. Nonetheless, the parallel execution of actors in a dataflow program does not bring with it parallel execution of actions within an actor. Since dataflow programs are often run in processing environments that facilitate parallel execution of processes, it would be advantageous to make use of this environment to speed up the execution of individual actors defined within a dataflow program.
It is therefore desirable to have improved dataflow program parallelizing/vectorizing methods and apparatuses for achieving higher levels of parallel code execution in connection with dataflow programs.