The present invention is in the field of systems, methods, and computer program products for the automatic exploitation of data parallelism in streaming applications.
Stream processing is a programming paradigm that naturally exposes task and pipeline parallelism. Data parallelism, by contrast, does not arise on its own in a stream graph and requires intervention. In the streaming context, data parallelism involves splitting data streams and replicating operators. The parallelism obtained through replication can be better balanced than the parallelism inherent in a particular stream graph, and can be scaled more easily to the available resources. Such data parallelism allows operators to take advantage of additional cores and hosts that task and pipeline parallelism alone are unable to exploit.
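By way of illustration only, the splitting of a data stream and the replication of an operator can be sketched as follows. The sketch is not the claimed method. It assumes a stateless operator, a round-robin splitter, and a merge step that restores the original order by sequence number. The names split_round_robin, run_replica, parallelize, and square are hypothetical and chosen for this example. Threads stand in for the additional cores and hosts, and in CPython they may not yield a real speedup for compute-bound operators.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(stream, n_replicas):
    """Split one stream into n_replicas channels, tagging each tuple
    with its sequence number so the order can be restored later."""
    channels = [[] for _ in range(n_replicas)]
    for seq, item in enumerate(stream):
        channels[seq % n_replicas].append((seq, item))
    return channels


def run_replica(operator, channel):
    """One replica of the operator, applied to its own channel."""
    return [(seq, operator(item)) for seq, item in channel]


def parallelize(operator, stream, n_replicas):
    """Replicate a stateless operator n_replicas times (an assumption:
    a stateful operator would need a different splitting strategy)."""
    channels = split_round_robin(stream, n_replicas)
    with ThreadPoolExecutor(max_workers=n_replicas) as pool:
        results = list(pool.map(lambda ch: run_replica(operator, ch), channels))
    # Merge the replica outputs and restore the original tuple order.
    merged = [pair for replica_out in results for pair in replica_out]
    merged.sort(key=lambda pair: pair[0])
    return [value for _, value in merged]


def square(x):
    """Hypothetical stateless operator used for the example."""
    return x * x


output = parallelize(square, range(10), 3)
```

The number of replicas is a parameter, which reflects how replication can be scaled to the resources at hand independently of the shape of the stream graph.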