The present disclosure generally relates to data analytics and more specifically relates to adapting hardware configurations in cluster computing to optimize data analytics applications.
The growth of smartphone use, social media applications, and search engines, to name a few, has contributed to an explosion in data. The volume of data created worldwide continues to grow exponentially. "Big Data" is a relatively new term describing enterprise-size data sets. Analyzing Big Data to distill meaningful information requires leading-edge hardware and software. Superscalar multiprocessors such as the POWER8® processor by International Business Machines Corporation (IBM®) are offered in 4-, 6-, 8-, 10-, and 12-core variants, with each core capable of simultaneously handling up to eight hardware threads. A notable feature of POWER8 is that it balances thread assignments.
At the hardware layer, architectural techniques such as Simultaneous Multithreading (SMT) and Prefetch are used to improve overall system performance. At the software layer, Apache Spark™ by the Apache Software Foundation is one of the most popular large-scale data analytics frameworks. However, Spark™ performance varies with the SMT and Prefetch settings in use, which raises the challenge of how to tune these hardware settings to optimize Spark performance.