Technical Field
Embodiments described herein relate to processing devices and, more particularly, to achieving balanced execution in a multi-node cluster using run-time detection of performance variation.
Description of the Related Art
Parallel computing is the simultaneous execution of the same application or workload using multiple processing elements (e.g., nodes in a multi-node cluster) in order to obtain results faster. A parallel workload can be divided into pieces that execute on many different nodes, with the partial results combined at the end to produce the final data processing result. In applications with multiple concurrently executing tasks, the tasks often complete at different times, leading to significant performance variation across a large-scale system. In the presence of dependencies such as synchronization barriers, the nodes that finish early waste power while waiting for the other nodes to finish their tasks. Accordingly, the overall progress of the application is limited by the slowest tasks in the system. Performance variation can be caused by process differences among multiple processors, operating system noise, resource contention, and/or other factors. High performance computing (HPC) applications are often tightly synchronized and massively parallel, so performance variation on even a small subset of the system can lead to large amounts of wasted power and lost performance. Managing performance variation across the many nodes of the system will be one of the many challenges in scaling parallel applications on future extreme-scale systems.
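The barrier effect described above can be illustrated with a small calculation. The following Python sketch is illustrative only and is not part of the described embodiments; the function name `barrier_stats` and the example task times are hypothetical. It assumes every node reaches a single synchronization barrier at the end of its task, so the barrier releases when the slowest task finishes, and each node's idle time is the gap between its own finish time and that release time.

```python
from typing import List, Sequence, Tuple


def barrier_stats(task_times: Sequence[float]) -> Tuple[float, List[float], float]:
    """Hypothetical helper: summarize idle time at a single barrier.

    Returns (barrier release time, per-node idle time, fraction of total
    node-time spent idle), assuming one task per node and one barrier.
    """
    if not task_times:
        raise ValueError("need at least one task time")
    # The barrier cannot release until the slowest task has finished.
    release = max(task_times)
    # Nodes that finished early sit idle (while still drawing power) until release.
    idle = [release - t for t in task_times]
    total_node_time = release * len(task_times)
    idle_fraction = sum(idle) / total_node_time if total_node_time > 0 else 0.0
    return release, idle, idle_fraction


if __name__ == "__main__":
    # Hypothetical finish times (in seconds) for four nodes.
    release, idle, frac = barrier_stats([9.0, 10.0, 12.0, 10.0])
    print(release, idle, round(frac, 4))
```

With these hypothetical times, one slow node (12 s) holds the other three at the barrier for a combined 7 node-seconds, roughly 15% of the total node-time, even though only one of the four nodes is slow.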