In environments where jobs consist of fine-grained operators connected into data flow graphs, such as stream processing and similar computer system environments, it is often desirable for efficiency to coalesce the operators into partitions that can then be scheduled on multiple heterogeneous hosts in a load-balanced manner. An operator is a piece of software that carries out a given function, and a data flow graph describes the functionality of the operators in the system and the connections between them. A host may be any type of processor that executes operators, and a partition may be a grouping of operators that are executed together by one or more hosts. For example, there may be three operators A, B, and C, each performing a function such as joining inputs. Operators A and B may be partitioned together so that they run on the same host, while operator C is placed in a separate partition that runs on a different host.
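The operator, partition, and host relationships described above can be modeled with two mappings. The following is a minimal sketch under assumed names (`partition_of`, `host_of`, and the helper functions are hypothetical, not taken from any particular system): each operator maps to a partition, and each partition maps to the host that runs it, mirroring the A, B, C example.

```python
from collections import defaultdict

# Hypothetical model of the example: A and B share partition p1 on host h1,
# while C sits alone in partition p2 on host h2.
partition_of = {"A": "p1", "B": "p1", "C": "p2"}
host_of = {"p1": "h1", "p2": "h2"}


def group_partitions(partition_of):
    """Invert operator -> partition into partition -> set of operators."""
    groups = defaultdict(set)
    for op, part in partition_of.items():
        groups[part].add(op)
    return dict(groups)


def host_for_operator(op, partition_of, host_of):
    """An operator runs on whichever host its partition is assigned to."""
    return host_of[partition_of[op]]


groups = group_partitions(partition_of)
print({p: sorted(ops) for p, ops in groups.items()})  # {'p1': ['A', 'B'], 'p2': ['C']}
print(host_for_operator("C", partition_of, host_of))  # h2
```

Keeping the two mappings separate reflects the two decisions in the text: first grouping operators into partitions, then assigning partitions to hosts.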
Consider, for example, an application that processes scientific data. Some of its operators pull data from outside sources; as such, they require a high central processing unit (CPU) percentage and must be located on hosts with outside connectivity. The user therefore resource-matches these input operators to I/O nodes and, at the same time, declares that they must not share a host with any other CPU-intensive operator (also referred to herein as a “host ex-location” constraint). It may be impossible to partition the operators in a way that satisfies all of the constraints. Consider, for instance, three example sets of constraints, none of which admits a solution.
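To make the two constraint types in this scenario concrete, the sketch below checks a proposed operator-to-host placement against resource-matching constraints (each operator may only run on hosts in an allowed set) and host ex-location constraints (an ex-located operator may not share its host with another CPU-intensive operator). It is an illustrative sketch only; the function name `violations`, the operator names `In1`, `In2`, `Agg`, and the host names are hypothetical.

```python
def violations(host_of_op, allowed_hosts, exlocated, cpu_intensive):
    """Return human-readable violations for a proposed placement.

    host_of_op:    operator -> host it runs on
    allowed_hosts: operator -> set of hosts it may run on (resource matching)
    exlocated:     operators that must not share a host with any other
                   CPU-intensive operator (host ex-location)
    cpu_intensive: operators considered CPU intensive
    """
    problems = []
    # Resource matching: the chosen host must be in the operator's allowed set.
    for op, hosts in sorted(allowed_hosts.items()):
        if host_of_op[op] not in hosts:
            problems.append(f"{op} placed on {host_of_op[op]}, not in {sorted(hosts)}")
    # Host ex-location: no other CPU-intensive operator on the same host.
    for op in sorted(exlocated):
        for other in sorted(cpu_intensive):
            if other != op and host_of_op[other] == host_of_op[op]:
                problems.append(f"{op} shares {host_of_op[op]} with CPU-intensive {other}")
    return problems


# Hypothetical input operators that must run on I/O nodes io1/io2.
allowed = {"In1": {"io1", "io2"}, "In2": {"io1", "io2"}}
exloc = {"In1", "In2"}
cpu = {"In1", "In2", "Agg"}

bad = violations({"In1": "io1", "In2": "io2", "Agg": "io1"}, allowed, exloc, cpu)
print(bad)  # ['In1 shares io1 with CPU-intensive Agg']
good = violations({"In1": "io1", "In2": "io2", "Agg": "c1"}, allowed, exloc, cpu)
print(good)  # []
```

Checking a single given placement is easy; the difficulty discussed in the text lies in deciding whether *any* placement passes such checks.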
The first example includes three constraints. Constraint 1 states that operators A and B cannot be placed in the same partition; constraint 2 states that operators A and C must be placed in the same partition; and constraint 3 states that operators B and C must be placed in the same partition. Constraints 2 and 3 together force A and B into the partition that contains C, which contradicts constraint 1. Therefore, operators A, B, and C cannot be placed in partitions in a way that satisfies all three constraints.
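The contradiction in the first example follows from transitivity, which a disjoint-set (union-find) structure captures directly. The sketch below is an illustration, not the method described in this document: it merges operators that must share a partition and then reports any "different partition" pair that ended up merged. The names `DisjointSet` and `colocation_conflicts` are hypothetical.

```python
class DisjointSet:
    """Minimal union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def colocation_conflicts(same_partition, different_partition):
    """Return 'different partition' pairs forced together by 'same partition' pairs."""
    ds = DisjointSet()
    for a, b in same_partition:
        ds.union(a, b)
    return [(a, b) for a, b in different_partition if ds.find(a) == ds.find(b)]


# First example: A~C and B~C are required, but A and B must be apart.
print(colocation_conflicts([("A", "C"), ("B", "C")], [("A", "B")]))  # [('A', 'B')]
```

For constraint sets containing only these two pairwise kinds, an empty result means a solution exists (place each merged group in its own partition); host constraints, as in the next examples, need additional checks.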
The second example includes three constraints. Constraint 1 states that operators A and B must be placed in the same partition; constraint 2 states that the partition containing operator A must be assigned to a host in the set {h1, h2}; and constraint 3 states that the partition containing operator B must be assigned to host h3. Because A and B share one partition, that partition would have to be assigned to a host that is both in {h1, h2} and equal to h3. No such host exists, so there is no partitioning and host assignment that satisfies all three constraints.
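The second example reduces to an empty intersection of allowed host sets within one partition. A hedged sketch of that check, with hypothetical function names: group operators that must share a partition into connected components, intersect the allowed host sets of each component, and report components whose intersection is empty. Operators with no host constraint are assumed to be allowed on every host.

```python
from collections import defaultdict


def components(operators, same_partition):
    """Connected components of the 'must share a partition' relation."""
    adj = defaultdict(set)
    for a, b in same_partition:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for op in operators:
        if op in seen:
            continue
        seen.add(op)
        stack, comp = [op], set()
        while stack:  # iterative depth-first search
            cur = stack.pop()
            comp.add(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def empty_host_sets(operators, same_partition, allowed_hosts, all_hosts):
    """Return components whose operators have no host in common."""
    bad = []
    for comp in components(operators, same_partition):
        feasible = set(all_hosts)
        for op in comp:
            # Assumption: an operator without a host constraint may use any host.
            feasible &= allowed_hosts.get(op, set(all_hosts))
        if not feasible:
            bad.append(sorted(comp))
    return bad


# Second example: A and B together, A in {h1, h2}, B on h3.
print(empty_host_sets(["A", "B"], [("A", "B")],
                      {"A": {"h1", "h2"}, "B": {"h3"}}, {"h1", "h2", "h3"}))  # [['A', 'B']]
```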
The third example includes four constraints. Constraint 1 states that no two operators in the set {A, B, C} may be placed in the same partition; constraints 2, 3, and 4 state that the partitions containing operators A, B, and C, respectively, must be assigned to a host in the set {h1, h2}. Because the three partitions containing operators A, B, and C need to be assigned to distinct hosts, but the set {h1, h2} contains only two hosts, there is no partitioning and host assignment that satisfies all four constraints.
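The third example is a pigeonhole argument: three partitions that need distinct hosts cannot fit on two hosts. The sketch below takes as given, following the text, that the separated partitions must land on distinct hosts, and searches for such an assignment by brute force over host permutations. The function name `distinct_host_assignment` is hypothetical; for more than a handful of operators, a bipartite matching algorithm would be the scalable way to answer the same question.

```python
from itertools import permutations


def distinct_host_assignment(ops, allowed_hosts):
    """Give each operator in `ops` its own host from its allowed set, or return None.

    Assumption (from the example): partitions that hold mutually separated
    operators must be assigned to distinct hosts.
    """
    candidate_hosts = sorted(set().union(*(allowed_hosts[op] for op in ops)))
    # permutations() yields nothing when there are fewer hosts than operators,
    # which is exactly the pigeonhole case.
    for hosts in permutations(candidate_hosts, len(ops)):
        if all(h in allowed_hosts[op] for op, h in zip(ops, hosts)):
            return dict(zip(ops, hosts))
    return None


# Third example: A, B, C pairwise separated, all restricted to {h1, h2}.
print(distinct_host_assignment(["A", "B", "C"],
                               {op: {"h1", "h2"} for op in "ABC"}))  # None
# Relaxing C's constraint to {h3} makes an assignment possible.
print(distinct_host_assignment(["A", "B", "C"],
                               {"A": {"h1", "h2"}, "B": {"h1", "h2"}, "C": {"h3"}}))
```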
In large operator graphs with many constraints, it is difficult to determine whether there is a partitioning of the operators that satisfies all of the constraints. Existing approaches cannot determine quickly and with high accuracy whether a specified set of constraints admits any solution.