Modern computer systems, especially High Performance Computing (HPC) systems, incorporate ever more processors or processor cores that can be applied to solving complex problems. Utilizing many hundreds, thousands, or even millions of cores requires tools for determining or visualizing where processing resources are well utilized or poorly utilized. HPC systems rely on parallel programming, which dispatches processing across these many processors running many threads. It is also typically necessary to establish synchronization points for coordination and control of the data being processed by these threads.
Analyzing performance data from a large plurality of threads or processes typically becomes so complex that approaches used in the past for analyzing the performance of a single processor, process, or thread, or a small number of them, are no longer useful. Users of performance visualization or analysis tools need a way to reduce the number of processes whose data they must analyze in order to understand HPC application performance. It is therefore desirable to reduce the number of data sets from hundreds or even millions to a few.