Distributed computing is becoming increasingly popular. Data stream processing systems, such as Apache Storm™, Flink, Spark Streaming, Samza, and S4, have been created to take advantage of distributed computing environments, such as cloud computing systems, and to increase processing capacity by “scaling out” across multiple machines. Such systems are used to perform real-time processing such as analytics, online machine learning, continuous computation, and other processor-intensive yet latency-sensitive computing tasks. However, when “scaling up” by running a data stream processing system on a modern multi-core processor, front-end stalls, particularly instruction cache misses and instruction queue full stalls, are major bottlenecks that lead to significantly slower execution times. Furthermore, costly memory accesses across Central Processing Unit (CPU) sockets limit the scalability of such data stream processing systems on multi-socket and/or multi-core processors.
Therefore, there is a need for an improved framework that addresses the above-mentioned challenges.