When solving problems that involve “large” data sets, it may be difficult to acquire enough computing power to address the problem in the desired timeframe. As a result, a distributed processing job may often overload a number (or cluster) of the machines dedicated to computing the solution. In MapReduce-style computations (or the like), for example, a resource manager may be used to allocate machines at each of the Map and Reduce steps in an attempt to alleviate stress placed on a subset of the computing cluster. Streaming computing may present additional and/or alternative issues.
For example, because computation may occur continuously in every processing phase, it may be difficult to use conventional resource managers (e.g., those used for non-streaming computing) to allocate resources. For instance, a streaming computing process may never end, so a resource manager may see constant usage on every machine. Without knowledge of the stream processing topology, the resource manager may be unable to safely remove a machine without potentially introducing a break in the processing graph. Adding resources may present similar complications.
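The machine-removal problem described above may be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that the stream processing topology is a directed graph of named stages and that each stage is hosted on a known set of machines. The stage names, machine names, and function names below are hypothetical and are not taken from any particular resource manager or streaming framework.

```python
from typing import Dict, List, Set

# Hypothetical topology: each stage maps to the stages it feeds.
topology: Dict[str, List[str]] = {
    "ingest": ["parse"],
    "parse": ["aggregate"],
    "aggregate": ["sink"],
    "sink": [],
}

# Hypothetical placement: each stage maps to the machines hosting it.
placement: Dict[str, Set[str]] = {
    "ingest": {"m1", "m2"},
    "parse": {"m2", "m3"},
    "aggregate": {"m4"},
    "sink": {"m1", "m3"},
}


def stages_left_without_host(placement: Dict[str, Set[str]], machine: str) -> List[str]:
    """Return the stages whose only host is `machine`."""
    return sorted(stage for stage, hosts in placement.items() if hosts == {machine})


def cut_off_stages(
    topology: Dict[str, List[str]],
    placement: Dict[str, Set[str]],
    sources: List[str],
    machine: str,
) -> List[str]:
    """Return the stages that data can no longer reach if `machine` is removed."""
    lost = set(stages_left_without_host(placement, machine))
    reachable: Set[str] = set()
    # Depth-first walk from the sources, skipping stages that lost every host.
    stack = [s for s in sources if s not in lost]
    while stack:
        stage = stack.pop()
        if stage in reachable:
            continue
        reachable.add(stage)
        for downstream in topology.get(stage, []):
            if downstream not in lost and downstream not in reachable:
                stack.append(downstream)
    return sorted(set(topology) - reachable)


def is_safe_to_remove(
    topology: Dict[str, List[str]],
    placement: Dict[str, Set[str]],
    sources: List[str],
    machine: str,
) -> bool:
    """A removal is treated as safe only if no stage is cut off from the sources."""
    return not cut_off_stages(topology, placement, sources, machine)
```

In this hypothetical placement, every machine hosts at least one continuously running stage, so a usage-only view may show all four machines as busy. With the topology available, removing `m2` leaves every stage hosted, whereas removing `m4` leaves `aggregate` without a host and cuts `sink` off from the sources.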