This description relates to managing interfaces for sub-graphs in a dataflow graph. Many software applications exist for processing data. Some of these software applications are specified as dataflow graphs. Dataflow graphs typically include a number of data processing components, which are interconnected by links, sometimes referred to as “flows.”
When a dataflow graph is being executed, data (e.g., a dataset) is received from a database or from some other data storage or data queueing system. The received data advances through the dataflow graph by propagating through the flows and into the components according to dependencies defined by the interconnection of the components and flows. Each component processes the data that it receives according to a predetermined function associated with the component before providing the processed data as output data via a flow. At the output of the dataflow graph, the processed data is, for example, stored in another data storage or data queueing system, provided to another downstream system, or presented to a user.
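The execution model described above can be illustrated with a minimal sketch. It assumes that each component is a plain function over a list of records and that each flow is an (upstream, downstream) pair; the names `run_dataflow`, `components`, `flows`, and `source_data` are hypothetical and chosen only for illustration, not taken from any particular dataflow system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(components, flows, source_data):
    """Run each component once, in an order that respects the flows.

    components:  dict mapping a component name to a function that takes a
                 list of records and returns a list of records (assumption).
    flows:       list of (upstream_name, downstream_name) pairs (assumption).
    source_data: records handed to every component that has no upstream.
    Returns the outputs of the components that have no downstream flow.
    """
    # A component depends on every component that feeds it through a flow.
    upstream_of = {name: set() for name in components}
    for upstream, downstream in flows:
        upstream_of[downstream].add(upstream)

    outputs = {}
    # static_order() yields a component only after all of its upstreams.
    for name in TopologicalSorter(upstream_of).static_order():
        feeders = sorted(upstream_of[name])
        if feeders:
            # Gather the records arriving on all incoming flows.
            records = [rec for up in feeders for rec in outputs[up]]
        else:
            records = list(source_data)
        # Apply the predetermined function associated with the component.
        outputs[name] = components[name](records)

    has_downstream = {upstream for upstream, _ in flows}
    return {n: outputs[n] for n in components if n not in has_downstream}


# Example: read -> filter_positive -> double
components = {
    "read": lambda recs: recs,
    "filter_positive": lambda recs: [r for r in recs if r > 0],
    "double": lambda recs: [r * 2 for r in recs],
}
flows = [("read", "filter_positive"), ("filter_positive", "double")]
result = run_dataflow(components, flows, [3, -1, 4])
```

In this sketch the whole dataset moves through one component at a time. A production system might instead stream records through the flows so that components run concurrently, but the dependency ordering would be the same.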
A developer of a dataflow graph generally specifies the graph by dragging blocks representing components onto a graphical working area (or “canvas”) provided by a graphical user interface and interconnecting the components with links representing data flows such that the dataflow graph implements a desired functionality. Once the developer is satisfied with his or her implementation of the dataflow graph, he or she can save the dataflow graph to storage for later use. In general, if the developer needs to alter the implementation of the dataflow graph at a later time, he or she can cause the graphical user interface to read the saved dataflow graph from storage, make changes to the dataflow graph, and then re-save the modified dataflow graph to storage.
In some examples, one or more segments of a dataflow graph are themselves implemented using dataflow graphs, which are referred to as “sub-graphs.” In those examples, a sub-graph is part of the dataflow graph. Thus, to alter a sub-graph that is used within a given dataflow graph, the developer requests the system to read the dataflow graph from disk, thereby enabling the developer to open the dataflow graph in the graphical user interface. The developer then opens the sub-graph segment within the same graphical user interface so that the sub-graph can be edited. The developer can make changes to the sub-graph and then causes the dataflow graph and the modified sub-graph to be re-saved to storage together, thereby embedding the changes to the sub-graph in the saved dataflow graph.
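The read, edit, and re-save cycle for an embedded sub-graph can be sketched as follows. The sketch assumes, purely for illustration, that a dataflow graph is stored as a single JSON file in which a segment carries its sub-graph under a `"subgraph"` key; the file layout and the functions `save_graph`, `load_graph`, and `edit_embedded_subgraph` are hypothetical.

```python
import json
from pathlib import Path


def save_graph(graph, path):
    """Write the whole graph, embedded sub-graphs included, to one file."""
    Path(path).write_text(json.dumps(graph, indent=2))


def load_graph(path):
    """Read the whole graph, embedded sub-graphs included, from one file."""
    return json.loads(Path(path).read_text())


def edit_embedded_subgraph(path, segment_name, edit):
    """Apply `edit` to the sub-graph embedded in the named segment.

    Because the sub-graph is stored inside the containing graph, the
    containing graph must be read first and re-saved afterwards.
    """
    graph = load_graph(path)                 # read the dataflow graph
    subgraph = graph["components"][segment_name]["subgraph"]
    edit(subgraph)                           # modify the sub-graph in place
    save_graph(graph, path)                  # re-save graph and sub-graph together
    return graph
```

For example, calling `edit_embedded_subgraph("graph.json", "cleanse", add_dedupe_step)`, where `add_dedupe_step` is a caller-supplied function, would rewrite the entire `graph.json` file even though only the sub-graph of the hypothetical `cleanse` segment changed. That is the coupling between a dataflow graph and its embedded sub-graph that the paragraph above describes.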