In enterprise data analytics systems, customers typically have repeatable, complex, and interdependent business workflows that affect the operation of various control systems. Services are provided to these customers by building large clusters of nodes that run several tasks concurrently. Because different customers may have different requirements and data processing needs, a particular service level may be provided to a given customer in accordance with a formally negotiated service level agreement (SLA). The SLA typically specifies particular aspects of the service, such as availability, serviceability, performance, and operation. The SLA may also specify penalties for violations.
In some workflows, top-level nodes have strict deadlines that must be met, with different nodes typically having different deadlines. In addition, nodes may depend on common ancestors, so a delay at a given node may affect the remaining nodes and potentially cause the overall system SLA to be missed. However, this issue can only be partially controlled by improving service to nodes, because some control systems lack quality of service (QoS) control procedures for expediting tasks when delays are experienced.
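The effect of a delay at a common ancestor can be illustrated with a minimal sketch. The workflow below, its node names (`ingest`, `report_a`, `report_b`), the durations, and the deadlines are all hypothetical and chosen only for illustration; the sketch assumes each node starts as soon as its last parent finishes, which is a simplification and not a description of any particular control system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def finish_times(durations, parents):
    """Return the earliest finish time of every node in the workflow.

    A node starts when its latest parent finishes (assumed model),
    so its finish time is that start time plus its own duration.
    """
    finish = {}
    # static_order() yields every node after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        start = max((finish[p] for p in parents.get(node, ())), default=0)
        finish[node] = start + durations[node]
    return finish


def missed_deadlines(durations, parents, deadlines):
    """Return the sorted names of nodes that finish after their deadline."""
    finish = finish_times(durations, parents)
    return sorted(n for n, d in deadlines.items() if finish[n] > d)


# Hypothetical workflow: two top-level reports share one ancestor.
parents = {"ingest": [], "report_a": ["ingest"], "report_b": ["ingest"]}
durations = {"ingest": 3, "report_a": 2, "report_b": 4}
deadlines = {"report_a": 6, "report_b": 8}  # different deadline per node

# Nominal run: both top-level nodes meet their deadlines.
on_time = missed_deadlines(durations, parents, deadlines)

# A two-unit delay at the shared ancestor pushes both past their deadlines.
delayed = missed_deadlines({**durations, "ingest": 5}, parents, deadlines)
```

In this sketch a single delay at the shared ancestor is enough for every dependent top-level node to miss its deadline, which is the situation in which the overall SLA may be missed.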
There is therefore a need for an improved system and method for ensuring QoS in a compute workflow.