Modern electronic systems rely on rapid execution of programs and manipulation of data. The majority of distributed cluster computing is based on dataflow programming models. Hadoop™ and Spark™ are representative examples of platforms for distributed cluster computing. One feature of the dataflow programming model is that data and worker mappings are predefined, allowing the execution to be deterministic. However, this deterministic execution information is not available outside the frameworks. The resulting mismatch in parallel execution causes Hadoop and Spark to not fully utilize data locality in their caching systems, OS page caches, and hardware caches and prefetchers.
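By way of illustration only, the following minimal sketch shows why a predefined data and worker mapping makes execution deterministic: when the assignment is a pure function of the block index, the blocks each worker will read are known before execution begins. The round-robin rule, the function name assign_blocks, and the worker names are hypothetical and are not taken from Hadoop or Spark.

```python
from typing import Dict, List


def assign_blocks(num_blocks: int, workers: List[str]) -> Dict[str, List[int]]:
    """Map each data block to a worker with a fixed round-robin rule.

    Hypothetical rule for illustration; real frameworks use their own
    partitioning and scheduling logic.
    """
    plan: Dict[str, List[int]] = {worker: [] for worker in workers}
    for block_id in range(num_blocks):
        owner = workers[block_id % len(workers)]
        plan[owner].append(block_id)
    return plan


# The same inputs always yield the same plan, so the access order of each
# worker is known ahead of time and could be handed to a cache or prefetcher.
plan = assign_blocks(6, ["worker-0", "worker-1"])
for worker, blocks in plan.items():
    print(worker, blocks)
```

Because the plan is fixed in advance, any layer that receives it could in principle stage the listed blocks before they are requested. The difficulty described above is that this information normally stays inside the framework.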
For example, hardware prefetchers for CPU caches are usually based on a dump-truck technique, which can deliver the data too early, deliver it too late, or deliver the wrong data for execution. This can cause cache pollution, processor stall cycles, significant delays in the execution, and increased memory request activity that can disrupt all of the elements in a cluster computing environment. For the page caches, caching one-time read-only data pages will eventually overwrite other more important pages, such as OS system libraries or system I/O files, resulting in node-wide performance degradation. Lastly, the native caching systems of Hadoop and Spark, for example centralized cache management in HDFS and Tachyon caching, are not aware of the deterministic execution information and therefore lose opportunities for prefetching data into the cache structure.
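The page cache effect described above can be sketched with a small simulation. The example assumes a simple least-recently-used (LRU) replacement policy and hypothetical page names. Actual OS page caches use more elaborate policies, so this is an illustration of the pollution effect rather than a model of any particular kernel.

```python
from collections import OrderedDict
from typing import Iterable, Set


def run_lru(capacity: int, accesses: Iterable[str]) -> Set[str]:
    """Replay page accesses through a simple LRU cache and return the
    pages that remain resident afterwards."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # miss on full cache: evict LRU page
            cache[page] = None
    return set(cache)


# Hypothetical pages: two frequently needed library pages, followed by a
# one-time sequential scan of four read-only data pages.
hot_pages = ["libc", "libjvm"]
scan_pages = [f"data-{i}" for i in range(4)]

resident = run_lru(capacity=4, accesses=hot_pages + scan_pages)
print(sorted(resident))
```

In this run the one-time scan fills the entire cache and evicts both library pages, even though the scanned pages will not be read again.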
Thus, a need still remains for an electronic system with a data management mechanism to improve execution reliability and performance in clustered computing environments. In view of the ever-increasing commercial competitive pressures, along with growing consumer expectations and the diminishing opportunities for meaningful product differentiation in the marketplace, it is increasingly critical that answers be found to these problems. Additionally, the need to reduce costs, improve efficiencies and performance, and meet competitive pressures adds an even greater urgency to the critical necessity for finding answers to these problems.
Solutions to these problems have been long sought but prior developments have not taught or suggested any solutions and, thus, solutions to these problems have long eluded those skilled in the art.