The present invention relates to dataflow programming environments, and more particularly to debugging in a dataflow programming environment.
Dataflow modeling is emerging as a promising programming paradigm for streaming applications for multicore hardware and parallel platforms in general. This more constrained programming model benefits high-level transformations and facilitates advanced code optimizations and run-time scheduling.
A dataflow program is made up of a number of computational kernels (called “actors” or “functional units”) and connections that specify the flow of data between the actors. An important property of a dataflow program is that the actors only interact by means of the flow of data over the connections: there is no other interaction. In particular, actors do not share state. The absence of shared state makes a dataflow program relatively easy to parallelize: the actors can execute in parallel, with each actor's execution being constrained only by the requirement that all of its inputs be available.
FIG. 1 illustrates an exemplary graphical representation of a dataflow program 100 having seven actors, identified with respective reference numerals A, B, C, D, E, F, and G. The actors A, B, C, D, E, F, and G carry out their functions by means of their code (i.e., program instructions) being executed within a processing environment 101 that comprises one or more programmable processors 103 that retrieve program instructions and data from one or more non-transitory processor readable storage media (e.g., as represented by memory 105). Connections between the actors are indicated by arrows. The dataflow program 100 illustrates that an actor can have one or more input connections, and can have any number of output connections, including none. For example, actor G lacks any output ports, and is consequently commonly referred to as a “sink”. A sink does not affect the state of the other actors. In practice, sinks typically represent interaction with the environment in which the dataflow program executes. For example, a sink could represent an actuator, an output device, or the like. A sink could also represent a system that has not yet been implemented, in which case the sink mimics the missing subsystem's demand for input.
Feedback loops can be formed as illustrated in this example by actors C, D, E, and F forming a cycle, and also by actor B having a self-loop. It will be observed that feedback limits parallelism, since an actor's firing (i.e., its execution) may have to await the presence of input data derived from one of its earlier firings.
Communication between actors occurs asynchronously by means of the passing of so-called “tokens”, which are messages from one actor to another. This type of programming model is a natural fit for many traditional Digital Signal Processing (DSP) applications such as, and without limitation, audio and video coding, radio baseband algorithms, cryptography applications, and the like. Dataflow in this manner decouples the program specification from the available level of parallelism in the target hardware since the actual mapping of tasks onto threads, processes and cores is not done in the application code but instead in the compilation and deployment phase.
In a dataflow program, each actor's operation may consist of a number of actions, with each action being instructed to fire as soon as all of its required input tokens become valid (i.e., are available) and, if one or more output tokens are produced from the actor, there is space available in corresponding output port buffers. Whether the action actually fires as soon as it is instructed to do so, or must nonetheless wait for one or more other activities within the actor to conclude, depends on resource usage within the actor. Just as various actors within a dataflow program may be able to fire concurrently or may instead require some form of sequential firing based on their data dependence on one another, the various actions within an actor can either fire concurrently or may require that some sequentiality be imposed, depending on whether the actions in question will be reading or writing the same resource; it is a requirement that only one action be able to read from or write to a resource during any action firing.
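By way of non-limiting illustration only, the firing rule described above may be sketched as follows. The sketch assumes that each port is backed by a simple queue; the names can_fire and fire_add, the port names, and the buffer capacities are hypothetical and are not taken from any particular dataflow runtime.

```python
from collections import deque


def can_fire(inputs, needed, outputs, produced, capacity):
    """Return True when the action's firing rule is satisfied.

    inputs/outputs map port names to queues; needed/produced map port
    names to token counts; capacity maps output ports to buffer sizes.
    All of these structures are assumptions made for illustration.
    """
    enough_input = all(len(inputs[p]) >= n for p, n in needed.items())
    enough_space = all(
        len(outputs[p]) + n <= capacity[p] for p, n in produced.items()
    )
    return enough_input and enough_space


def fire_add(inputs, outputs):
    """Hypothetical action: consume one token from 'a' and 'b', emit the sum."""
    a = inputs["a"].popleft()  # consuming removes the token from the port
    b = inputs["b"].popleft()
    outputs["out"].append(a + b)


inputs = {"a": deque([1, 2]), "b": deque([10])}
outputs = {"out": deque()}
needed = {"a": 1, "b": 1}
produced = {"out": 1}
capacity = {"out": 1}

firings = 0
while can_fire(inputs, needed, outputs, produced, capacity):
    fire_add(inputs, outputs)
    firings += 1
```

In this sketch the action fires exactly once: after the first firing, port 'b' holds no token and the single-slot output buffer is full, so the rule is no longer satisfied even though a token remains on port 'a'.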
An input token that, either alone or in conjunction with others, instigates an action's firing is “consumed” as a result (i.e., it ceases to be present at the actor's input port). An actor's actions can also be triggered by one or more state conditions, which include state variables combined with action trigger guard conditions and the action scheduler's finite state machine conditions. Guard conditions may be Boolean expressions that test any persistent state variable of the actor or its input token. (A persistent state variable of an actor may be modeled, or in some cases implemented, as the actor producing a token that it feeds back to one of its input ports.) One example (from among many) of a dataflow programming language is the CAL language that was developed at UC Berkeley. The CAL language is described in “CAL Language Report: Specification of the CAL actor language”, Johan Eker and Jörn W. Janneck, Technical Memorandum No. UCB/ERL M03/48, University of California, Berkeley, Calif., 94720, USA, Dec. 1, 2003, which is hereby incorporated herein by reference in its entirety. In CAL, operations are represented by actors that may contain actions that read data from input ports (and thereby consume the data) and that produce data that is supplied to output ports.
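The interplay of guard conditions, persistent state, and token consumption may be illustrated, purely as a sketch, by the following hypothetical actor. The example is written in Python rather than CAL, and the class name Accumulator, its three actions, and the limit parameter are assumptions introduced solely for illustration.

```python
from collections import deque


class Accumulator:
    """Hypothetical actor with one persistent state variable and three actions."""

    def __init__(self, limit):
        self.limit = limit
        self.total = 0  # persistent state variable tested by the guards

    def step(self, inp, out):
        """Fire at most one enabled action; return its name, or None."""
        # 'flush' is triggered by a state condition alone and consumes no token.
        if self.total >= self.limit:
            out.append(self.total)
            self.total = 0
            return "flush"
        if not inp:
            return None  # no state condition holds and no token is present
        token = inp[0]  # guards inspect the token without consuming it
        if token < 0:  # guard on the input token
            inp.popleft()  # consume and discard
            return "skip"
        self.total += inp.popleft()  # 'accumulate' consumes the token
        return "accumulate"


actor = Accumulator(limit=5)
inp, out = deque([3, -1, 4, 2]), deque()
trace = []
while True:
    fired = actor.step(inp, out)
    if fired is None:
        break
    trace.append(fired)
```

Here the order in which the guards are evaluated plays the role of a simple action priority; an actual action scheduler may instead be governed by a finite state machine, as noted above.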
Typically, the data passing between actors is modeled as a First-In-First-Out (FIFO) buffer, such that an actor that is sourcing data pushes its data into a FIFO and an actor that is to receive the data pops the data from the FIFO. An important characteristic of a FIFO is that it preserves the order of the data contained therein; the reader of the FIFO receives the data in the same order in which that data was sent to the FIFO. Also, actors are typically able to test for the presence of data in a FIFO connected to one of the actor's input ports without having to actually pop that data (and thereby remove the data from the FIFO).
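A minimal sketch of such a FIFO, assuming a bounded buffer and hypothetical method names (push, pop, peek, available, has_space), might take the following form. It is offered only to illustrate order preservation and non-destructive testing for data, not as the buffer implementation of any particular runtime system.

```python
from collections import deque


class Fifo:
    """Hypothetical bounded FIFO between an output port and an input port."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._tokens = deque()

    def has_space(self, n=1):
        """True if n more tokens fit in the buffer."""
        return len(self._tokens) + n <= self.capacity

    def available(self):
        """Number of tokens present; lets a reader test without popping."""
        return len(self._tokens)

    def push(self, token):
        if not self.has_space():
            raise OverflowError("FIFO is full")
        self._tokens.append(token)

    def peek(self, index=0):
        """Inspect a token without removing it from the FIFO."""
        return self._tokens[index]

    def pop(self):
        """Remove and return the oldest token."""
        return self._tokens.popleft()


fifo = Fifo(capacity=4)
for token in (10, 20, 30):
    fifo.push(token)

peeked = fifo.peek()          # the token stays in the FIFO
count_after_peek = fifo.available()
received = [fifo.pop() for _ in range(3)]
```

The receiving side obtains the tokens in the order in which they were pushed, and peeking leaves the token count unchanged.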
The interested reader may refer to U.S. Pat. No. 7,761,272 to Janneck et al., which is hereby incorporated herein by reference in its entirety. The referenced document provides an overview of various aspects of dataflow program makeup and functionality.
It will be appreciated from the above discussion that dataflow driven execution is different from the more traditional control flow execution model in which a program's modules (e.g., procedures, subroutines, methods) have a programmer-specified execution order.
Regardless of which execution model is followed, complex programs will almost always require that the programmer have some type of mechanism for finding errors (so-called “bugs”) in the program. Debugging tools are available for this purpose. Control flow debugging methods typically keep a context for a function. When a function call is made (i.e., when the function is invoked for execution by the processor(s)), the context of the calling function is placed on a stack. Hence, if a break point is reached (i.e., a programmer-designated point in a program at which program execution should halt), it is possible to inspect calling contexts on the stack.
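The kind of calling-context inspection that a control flow debugger offers at a break point can be approximated, as an illustrative sketch only, with the Python standard library's inspect module. The function names calling_contexts, inner, and outer are hypothetical, and the sketch inspects the interpreter's own call stack rather than that of a separately debugged program.

```python
import inspect


def calling_contexts(depth):
    """Return (function name, local variables) for the nearest callers."""
    # Index 0 is this helper itself, so the callers start at index 1.
    frames = inspect.stack()[1:1 + depth]
    return [(info.function, dict(info.frame.f_locals)) for info in frames]


def inner(x):
    doubled = x * 2
    # Stands in for a break point: capture the contexts on the stack here.
    return calling_contexts(depth=2)


def outer():
    label = "from outer"
    return inner(21)


contexts = outer()
```

Each entry pairs a function on the stack with the variables of its context, innermost first, which is the information a programmer would examine when walking back up the context stack.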
Another quite recent feature in connection with control flow debugging is to give the user the experience of being able to “un-execute” the control flow in reverse order. Since programs do not really operate backwards, this illusion is created by keeping a record of all machine instructions executed and of all register and memory changes. Register and memory contents, as well as machine instruction pointers, are then restored in reverse order until they correspond to a certain earlier point in the control flow. Such a feature has been available in a debugger called GDB (“GNU debugger”) since release 7.0. The interested reader is directed to information made available on the World Wide Web at http://www.gnu.org/s/gdb and at http://sourceware.org/gdb/wiki/ReverseDebug for more on this topic.
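The record-and-revert principle can be illustrated with the following toy sketch. It is not a description of how GDB is implemented; the class RecordingMachine, its single store operation, and its program counter are hypothetical simplifications introduced only to show how logged changes may be undone in reverse order.

```python
class RecordingMachine:
    """Toy machine that logs every change so execution can be reversed."""

    def __init__(self):
        self.memory = {}
        self.pc = 0      # stands in for the machine instruction pointer
        self._log = []   # one entry per step: (old_pc, key, old_value, existed)

    def store(self, key, value):
        """Execute one 'instruction': write a value and advance the pc."""
        existed = key in self.memory
        self._log.append((self.pc, key, self.memory.get(key), existed))
        self.memory[key] = value
        self.pc += 1

    def reverse_step(self):
        """Undo the most recent step by restoring what the log recorded."""
        old_pc, key, old_value, existed = self._log.pop()
        if existed:
            self.memory[key] = old_value
        else:
            del self.memory[key]  # the location did not exist before the step
        self.pc = old_pc


machine = RecordingMachine()
machine.store("x", 1)
machine.store("y", 2)
machine.store("x", 3)

machine.reverse_step()                       # undoes x = 3
state_after_one = (dict(machine.memory), machine.pc)
machine.reverse_step()                       # undoes y = 2
state_after_two = (dict(machine.memory), machine.pc)
```

Because the log is popped in last-in-first-out order, each reverse step restores the state that existed immediately before the corresponding forward step.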
Despite the existence of tools such as those described above, the debugging of dataflow programs has presented its own problems. One common way of executing dataflow programs is to utilize a runtime system that is implemented using a control flow execution model. This runtime system is responsible for parts of the scheduling of the execution of the dataflow actions on the processor(s) and potentially also for handling the FIFO queues. When debugging the dataflow program using a debugger based on a conventional control flow execution model, it is necessary to debug the dataflow program in conjunction with the runtime system. For traditional control flow programs, the utilization of a context stack whenever a program is halted makes it easy to examine from what code segment the current function was called as well as the variables and other information in that context, backwards up the context stack. However, the inventors have ascertained that, since a dataflow program is driven by the presence of data rather than by function calls, such control flow debuggers are inadequate at least because they do not make it possible to trace dataflow concepts such as tokens, nor do they enable reversion of an actor to a previous state synchronized with the data tokens.
It is therefore desirable to have improved tools for debugging dataflow programs to eliminate the shortcomings of conventional techniques.