In real-time streaming workflows, data sources continuously generate data that is processed as it arrives. The workflow should satisfy quality of service (QoS) metrics, such as a target throughput or a time constraint for completing each segment of the streaming data. However, load surges at one or more stages of the workflow can create bottlenecks.
Previous methods resolved such bottlenecks by blindly adding more resources to the entire workflow. However, load changes may be dynamic in a real-time environment, and simply adding more resources may be inefficient or inaccurate.
For example, not all processes have a linear relationship between processing speed and the amount of resources needed to handle an additional surge in load. Thus, if the incoming data load doubles, the resources needed to preserve the original processing speed may not simply be twice the currently available amount. Rather, the workflow may require four times the currently available resources to handle the doubled incoming data, and so forth.
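The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not the method described here: it assumes a hypothetical power-law model in which required resources grow with the load ratio raised to an exponent. An exponent of 1 is the linear case, and an exponent of 2 reproduces the example where doubling the load needs four times the resources. The function name `required_resources`, the exponent values, and the instance counts are all illustrative assumptions.

```python
def required_resources(current_resources: float, load_ratio: float, exponent: float) -> float:
    """Estimate resources needed to keep the original processing speed.

    Hypothetical power-law model (an assumption for illustration):
    required = current_resources * load_ratio ** exponent.
    exponent=1.0 models a linear stage; exponent=2.0 models a stage where
    doubling the load requires four times the resources.
    """
    if current_resources <= 0 or load_ratio <= 0:
        raise ValueError("current_resources and load_ratio must be positive")
    return current_resources * load_ratio ** exponent


current = 8.0     # illustrative: 8 worker instances currently allocated
load_ratio = 2.0  # the incoming data load doubles

linear_estimate = required_resources(current, load_ratio, exponent=1.0)
quadratic_estimate = required_resources(current, load_ratio, exponent=2.0)

# Blindly doubling resources (the linear assumption) leaves a shortfall
# if the stage actually follows the quadratic model.
shortfall = quadratic_estimate - linear_estimate

print(f"linear model:    {linear_estimate:.0f} instances")
print(f"quadratic model: {quadratic_estimate:.0f} instances")
print(f"shortfall if scaled linearly: {shortfall:.0f} instances")
```

Under these assumptions, scaling linearly allocates 16 instances while the nonlinear stage needs 32, so the bottleneck persists even after resources are doubled.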