In existing approaches, it is challenging to schedule multiple complex flows in a multi-platform cluster environment: resources must be distributed appropriately among the platforms while simultaneously attempting to optimize a given set of per-platform performance metrics. For some platforms, these metrics might be a function of the completion time of each flow; for others, they might be a measure of the utility (for example, throughput) achieved by each flow. Existing approaches do not solve this problem. Nor do they provide the infrastructure necessary to enforce resource sharing among multiple platforms in a cluster environment while attempting to optimize the use of the shared resources or the scheduling of the complex flows themselves.
Streaming flows can be complex in that they can be described as flow graphs of long-running software nodes, called processing elements (PEs), connected by streams. MapReduce flows can be complex in that they can be described as flow graphs of Map or Reduce jobs (each of which can include multiple independent tasks) connected by precedence relationships. In addition, there can be constraints on the minimum amount of resources allocated to each platform and on the minimum and maximum amounts of resources allocated to each job, as well as a notion of the relative rank of each platform.
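The flow structures and constraints described above can be illustrated with a minimal sketch. It assumes that resources are counted in interchangeable integer "slots" and that a streaming flow's PE graph can be collapsed into a single job for brevity; the names `Job`, `Platform`, `job_order`, and `is_feasible` are hypothetical. The sketch only checks whether a given snapshot allocation satisfies the per-job and per-platform bounds and the cluster capacity. It is not a scheduler or an optimizer, and the platform rank is carried as data but not used by the check.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Job:
    """A job (e.g., a Map or Reduce job) with hypothetical per-job resource bounds."""
    name: str
    min_slots: int
    max_slots: int


@dataclass
class Platform:
    """A platform sharing the cluster (assumption: resources measured in slots)."""
    name: str
    min_slots: int  # minimum resources the platform must receive
    rank: int       # relative rank of the platform; not used by is_feasible below
    jobs: list = field(default_factory=list)
    # precedence[j] = set of job names that must complete before job j starts
    precedence: dict = field(default_factory=dict)


def job_order(platform):
    """Return one ordering of the platform's jobs consistent with its precedence relationships."""
    graph = {job.name: platform.precedence.get(job.name, set()) for job in platform.jobs}
    return list(TopologicalSorter(graph).static_order())


def is_feasible(platforms, allocation, capacity):
    """Check a snapshot allocation {(platform_name, job_name): slots} against the constraints."""
    used = 0
    for p in platforms:
        p_used = 0
        for job in p.jobs:
            slots = allocation.get((p.name, job.name), 0)
            if not job.min_slots <= slots <= job.max_slots:
                return False  # per-job minimum or maximum violated
            p_used += slots
        if p_used < p.min_slots:
            return False  # platform minimum violated
        used += p_used
    return used <= capacity  # total must fit in the cluster


# Usage: a MapReduce flow whose Reduce job follows two Map jobs, plus one streaming flow.
mr = Platform("mapreduce", min_slots=4, rank=1,
              jobs=[Job("map1", 1, 4), Job("map2", 1, 4), Job("reduce", 1, 2)],
              precedence={"reduce": {"map1", "map2"}})
stream = Platform("streaming", min_slots=2, rank=2, jobs=[Job("pe_graph", 2, 6)])
alloc = {("mapreduce", "map1"): 2, ("mapreduce", "map2"): 2,
         ("mapreduce", "reduce"): 1, ("streaming", "pe_graph"): 3}

print(job_order(mr))                                # "reduce" is always last
print(is_feasible([mr, stream], alloc, capacity=10))  # True: 8 slots used
print(is_feasible([mr, stream], alloc, capacity=7))   # False: exceeds capacity
```

An actual scheduler would search over such allocations to optimize the per-platform metrics (for example, weighting them by platform rank); that search is outside the scope of this sketch.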