The present invention relates to dataflow programming environments, and more particularly to data handling among actors in a dataflow programming environment.
Dataflow modeling is emerging as a promising programming paradigm for streaming applications for multicore hardware and parallel platforms in general. This more constrained programming model benefits high-level transformations and facilitates advanced code optimizations and run-time scheduling.
A dataflow program is made up of a number of computational kernels, (called “actors” or “functional units”) and connections that specify the flow of data between the actors. An important property of a dataflow program is that the actors only interact by means of the flow of data over the connections: there is no other interaction. In particular, actors do not share state. The absence of shared state makes a dataflow program relatively easy to parallelize: the actors can execute in parallel, with each actors execution being constrained only by the requirement that all of its inputs be available.
FIG. 1 illustrates an exemplary graphical representation of a dataflow program 100 having seven actors, identified with respective reference numerals A, B, C, D, E, F, and G. The actors A, B, C, D, E, F, and G carry out their functions by means of their code (i.e., program instructions) being executed within a processing environment 101 that comprises one or more programmable processors 103 that retrieve program instructions and data from one or more non-transitory processor readable storage media (e.g., as represented by memory 105). Connections between the actors are indicated by arrows. The dataflow program 100 illustrates that an actor can have one or more input connections, and can have any number of output connections, including none. For example, actor G lacks any output ports, and is consequently commonly referred to as a “sink”. A sink does not affect the state of the other actors. In practice, sinks typically represent interaction with the environment in which the dataflow program executes. For example, a sink could represent an actuator, an output device, or the like. A sink could also represent a system that has not yet been implemented, in which case the sink mimics the missing subsystem's demand for input.
Feedback loops can be formed as illustrated in this example by actors C, D, E, and F forming a cycle, and also by actor B having a self-loop. It will be observed that feedback limits parallelism, since an actor's firing (i.e., its execution) may have to await the presence of input data derived from one of its earlier firings.
Communication between actors occurs asynchronously by means of the passing of so-called “tokens”, which are messages from one actor to another. These messages can represent any type of information (e.g., numeric, alphabetic, program-defined values, etc.), with the particular type of information in any one case being defined by the dataflow program. As used herein, the term “value” refers to the particular information (as distinguished from the information type or range of possible information instances) represented by a token or instance of an actor state without any limitation regarding whether that value is numeric, alphabetic, or other, and without regard to whether the information is or is not a complex data structure (e.g., a data structure comprising a plurality of members, each having its own associated value).
The dataflow programming model is a natural fit for many traditional Digital Signal Processing (DSP) applications such as, and without limitation, audio and video coding, radio baseband algorithms, cryptography applications, and the like. Dataflow in this manner decouples the program specification from the available level of parallelism in the target hardware since the actual mapping of tasks onto threads, processes and cores is not done in the application code but instead in the compilation and deployment phase.
In a dataflow program, each actor's operation may consist of a number of actions, with each action firing as soon as all of its required input tokens become valid (i.e., are available) and, if one or more output tokens are produced from the actor, there is space available in corresponding output port buffers. Whether the firing of the action occurs as soon as it is instructed to do so or whether it must nonetheless wait for one or more other activities within the actor to conclude will depend on resource usage within the actor. Just as the firing of various actors within a dataflow program may be able to fire concurrently or alternatively may require some sort of sequential firing based on their relative data dependence on one another, the firing of various actions within an actor can either be performed concurrently or may alternatively require that some sequentiality be imposed based on whether the actions in question will be reading or writing the same resource; it is a requirement that only one action be able to read from or write to a resource during any action firing.
An input token that, either alone or in conjunction with others, instigates an action's firing is “consumed” as a result (i.e., it is removed from the incoming connection and ceases to be present at the actor's input port). An actor's actions can also be triggered by one or more state conditions, which include state variables combined with action trigger guard conditions and the action scheduler's finite state machine conditions. Guard conditions may be Boolean expressions that test any persistent state variable of the actor or its input token. (A persistent state variable of an actor may be modeled, or in some cases implemented, as the actor producing a token that it feeds back to one of its input ports.) One example (from among many) of a dataflow programming language is the CAL language that was developed at UC Berkeley The CAL language is described in “CAL Language Report: Specification of the CAL actor language, Johan Eker and km W. Janneck, Technical Memorandum No. UCB/ERL M03/48, University of California, Berkeley, Calif., 94720, USA, Dec. 1, 2003”, which is hereby incorporated herein by reference in its entirety. In CAL, operations are represented by actors that may contain actions that read data from input ports (and thereby consume the data) and that produce data that is supplied to output ports. The CAL dataflow language has been selected as the formalism to be used in the new MPEG/RVC standard ISO/IEC 23001-4 or MPEG-B pt. 4. Similar programming models are also useful for implementing various functional components in mobile telecommunications networks.
Typically, the token passing between actors (and therefore also each connection from an actor output port to an actor input port) is modeled (but not necessarily implemented) as a First-In-First-Out (FIFO) buffer, such that an actor's output port that is sourcing a token pushes the token into a FIFO and an actor's input port that is to receive the token pops the token from the FIFO. An important characteristic of a FIFO (and therefore also of a connection between actor output and input ports) is that it preserves the order of the tokens contained therein; the reader of the FIFO receives the token in the same order in which that token was provided to the FIFO. Also, actors are typically able to test for the presence of tokens in a FIFO connected to one of the actor's input ports, and also to ascertain how many tokens are present in a FIFO, all without having to actually pop any tokens (and thereby remove the data from the FIFO).
The interested reader may refer to U.S. Pat. No. 7,761,272 to Janneck et al., which is hereby incorporated herein by reference in its entirety. The referenced document provides an overview of various aspects of dataflow program makeup and functionality.
Typical applications in the signal processing domain operate on data streams. This characteristic makes it convenient to specify such applications as dataflow programs. Other applications, however, require that data structures be shared between different parts of the application. Conventional implementations of dataflow programs include passing a data structure between actors by means of copying of the structure.
The inventors of the subject described herein have ascertained that there are situations in which the entire structure is replicated from the output of one actor to the input of another actor when the actor receiving the data structure only requires a subset of that structure to fire its actions. This in turn leads to excessive copying of data, which reduces computing efficiency. It is therefore desirable to have improved data handling methods and apparatuses for use in connection with dataflow programs.