Platforms, such as MapReduce-based data flow platforms, may be used for processing, e.g., large-scale ETL (Extract, Transform, Load) and analytical workloads. For example, existing systems may translate or compile each data flow operation into a single MapReduce job or a sequence of MapReduce jobs, which may be executed independently by one or more processors across, e.g., a cluster of nodes.
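As an illustration only, the following minimal sketch shows how a two-operation data flow (a filter followed by a group-by count) might be compiled into two MapReduce jobs that run one after the other. It is an in-memory stand-in, not the implementation of any particular platform; the names `run_mapreduce`, `run_flow`, and `ORDERS`, as well as the sample records, are assumptions introduced for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run one in-memory MapReduce job: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group mapped values by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1 (hypothetical): filter operation, keeps orders with amount > 10.
def filter_mapper(record):
    if record["amount"] > 10:
        yield record["id"], record


def identity_reducer(key, values):
    yield from values


# Job 2 (hypothetical): group-by operation, counts orders per region.
def group_mapper(record):
    yield record["region"], 1


def count_reducer(key, values):
    yield key, sum(values)


def run_flow(records):
    """Execute the flow as two independent jobs; job 2 reads job 1's output."""
    filtered = run_mapreduce(records, filter_mapper, identity_reducer)
    return run_mapreduce(filtered, group_mapper, count_reducer)


# Made-up sample input for illustration.
ORDERS = [
    {"id": 1, "region": "east", "amount": 20},
    {"id": 2, "region": "west", "amount": 5},
    {"id": 3, "region": "east", "amount": 15},
    {"id": 4, "region": "west", "amount": 30},
]
```

Calling `run_flow(ORDERS)` returns `[("east", 2), ("west", 1)]`. Each operation becomes its own job with its own pass over its input, which is the one-job-per-operation behavior described above.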
Data flows (which may include subflows) may include, for example, relational operations and non-relational operations. Relational data flow operations executing on common input data may contain flow or subflow operations that share computations, and those computations may be easily reused. Because these operations are relatively simple, there exist straightforward sharing opportunities for intra-query as well as inter-query optimization across such flow or subflow operations, which may eliminate redundant scans and computations.
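The kind of sharing described above can be sketched as follows, under stated assumptions: two relational aggregations (a per-region sum and a per-region count) read the same input, and merging them into one pass removes the redundant scan. The class `ScannedTable`, the functions `run_separately` and `run_shared`, and the sample rows are hypothetical names and data chosen for this illustration, not part of any existing system.

```python
from collections import defaultdict


class ScannedTable:
    """Wraps input rows and counts how many full scans are performed."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


def run_separately(table):
    """Each operation scans the common input on its own (two scans)."""
    totals = defaultdict(int)
    for region, amount in table.scan():
        totals[region] += amount
    counts = defaultdict(int)
    for region, _amount in table.scan():
        counts[region] += 1
    return dict(totals), dict(counts)


def run_shared(table):
    """Both operations share a single scan of the common input."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for region, amount in table.scan():
        totals[region] += amount
        counts[region] += 1
    return dict(totals), dict(counts)


# Made-up sample rows: (region, amount).
ROWS = [("east", 20), ("west", 5), ("east", 15)]
```

Both functions return the same pair of results, `({"east": 35, "west": 5}, {"east": 2, "west": 1})`, but `run_separately` scans the input twice while `run_shared` scans it once. On a cluster, each avoided scan may correspond to an avoided read of the full input data set.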