Embodiments presented herein generally relate to cloud computing, and more specifically to splitting workloads among processing elements executing in a cloud computing environment.
A cloud computing environment provides computing resources, such as services, processing resources, and storage capacity, from large pools of physical computing systems. For example, the cloud computing environment may spawn a number of virtual machine instances on demand for a given purpose, such as in a streams processing environment.
A streams processing environment may use the large amount of computing resources afforded by a cloud environment. In a streams processing environment, a distributed application receives large amounts (or “streams”) of input data, such as text messages, image data, and the like. Processing elements of the distributed application analyze the data through a series of operators, each operator serving a particular purpose. For example, one operator may filter certain data from a stream and pass the filtered data to another operator that evaluates the data for specified values. Further, based on the amount of incoming data, the distributed application may split a given operator among shared resources (e.g., other virtual machines) so that the resulting operators each perform a part of the original task. Splitting operators allows the distributed application to parallelize a given workload.
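The operator splitting described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular streams runtime: the names `filter_operator`, `evaluate_operator`, and `split_operator` are hypothetical, threads stand in for the other virtual machines, and a round-robin partition is assumed as the splitting strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_operator(tuples):
    """First operator: drop empty or whitespace-only messages from the stream."""
    return [t for t in tuples if t.strip()]


def evaluate_operator(tuples, keyword):
    """Second operator: keep only the messages that contain a specified value."""
    return [t for t in tuples if keyword in t]


def split_operator(operator, tuples, width, *args):
    """Run `operator` as `width` parallel copies, each on a share of the input.

    The input is partitioned round-robin (an assumption of this sketch), so
    each copy performs part of the original task. Results are concatenated
    partition by partition, so the original ordering is not preserved in general.
    """
    partitions = [tuples[i::width] for i in range(width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        results = list(pool.map(lambda part: operator(part, *args), partitions))
    return [t for part in results for t in part]


if __name__ == "__main__":
    stream = ["alert: disk full", "", "ok", "alert: cpu hot", "  ", "ok"]
    filtered = filter_operator(stream)
    # Split the evaluating operator two ways to parallelize the workload.
    matches = split_operator(evaluate_operator, filtered, 2, "alert")
    print(matches)
```

In this sketch the split width is fixed by the caller. In the setting described here, the width would instead depend on the amount of incoming data and on the resources currently available.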
In the cloud computing environment, resources can become available or unavailable at any time, e.g., due to the implementation of the cloud computing environment, contractual agreements between the owner of the cloud and the end user, and the like. As a result, to effectively manage workloads processed in the streams runtime environment, an administrator needs to be aware of changes to resources in the cloud environment.