The present invention generally relates to parallel computing systems, and more particularly relates to monitoring system noises in parallel computing systems.
Monitoring system noises and understanding their impact on application performance are critical for the design and operation of parallel computers. The performance imbalance between concurrent threads is an important technical hurdle for improving the parallel scalability of massively parallel systems. In many parallel programs, the execution of all threads, or of each group of threads, is frequently synchronized (e.g., to send computation results from all threads to all threads). As the number of nodes and the number of cores per node continue to grow, a much larger number of threads collaborate on the same amount of computation (i.e., strong scaling). Consequently, the computation interval between synchronizations becomes shorter, so a delay of a given length in any one thread accounts for a larger fraction of each interval, which increases the impact of performance imbalance.
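The strong-scaling argument above can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the present disclosure: a fixed amount of work is divided evenly among the threads, all threads synchronize after each computation interval, and exactly one thread is delayed by a fixed amount of noise. The function names `sync_interval_us` and `relative_slowdown` and all numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative model only; names and numbers are assumptions, not from the source.


def sync_interval_us(total_work_us: float, num_threads: int) -> float:
    """Per-thread computation time between two synchronizations under strong scaling."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Fixed total work split evenly: more threads means a shorter interval.
    return total_work_us / num_threads


def relative_slowdown(total_work_us: float, num_threads: int, noise_us: float) -> float:
    """Fraction by which one interval grows when a single thread is delayed by noise_us.

    All threads wait at the synchronization point for the slowest thread, so the
    interval grows from T to T + noise_us, i.e. by a fraction noise_us / T.
    """
    return noise_us / sync_interval_us(total_work_us, num_threads)


if __name__ == "__main__":
    total_work_us = 1_000_000.0  # assumed: 1 s of work per synchronized step
    noise_us = 100.0             # assumed: one 100 us interruption on one thread
    for n in (10, 100, 1_000, 10_000):
        interval = sync_interval_us(total_work_us, n)
        slowdown = relative_slowdown(total_work_us, n, noise_us)
        print(f"threads={n:>6}  interval={interval:>10.1f} us  slowdown={slowdown:.1%}")
```

Under these assumed values, the same 100 microsecond delay lengthens the interval by 0.1% with 10 threads and by 100% with 10,000 threads, which is the effect the paragraph describes. Real system noise is neither fixed in length nor confined to a single thread, so the sketch shows only the direction of the trend.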