Multithreaded cores have recently become available from multiple vendors and ship in many servers (e.g., hyper-threaded Xeon cores from Intel, POWER5 from IBM, Niagara from Sun). A multithreaded core executes multiple hardware threads concurrently (or in a tightly interleaved fashion) on a single processor core. Common operating systems (e.g., Linux, Windows, AIX, Solaris) typically represent each hardware thread as a CPU (or processor, or hardware thread): a hardware entity that can execute a software thread. The operating system (OS) is responsible for scheduling software threads for execution by cores and their hardware threads. The OS also monitors and reports on the utilization of hardware thread resources. A commonly used indication of utilization is CPU % or idle %, which is often measured and displayed independently for each hardware thread or each core, or aggregated for the entire system. For example, commands such as vmstat, mpstat, and top, and performance monitoring tools such as Windows Performance Monitor, may be used to view an indication of utilization. Current operating systems report the utilization of each hardware thread as though it were a separate CPU. For monitoring purposes, these OSs treat hardware threads that share a common core in the same way that they have treated single-threaded processor cores.
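As an illustration of the conventional reporting described above, the following minimal sketch derives CPU % for each hardware thread, and for the system as a whole, from busy and idle tick counters sampled at two points in time. The `TickSample` type, the function names, and the sample values are hypothetical and chosen only for illustration; they do not correspond to any particular OS interface.

```python
from dataclasses import dataclass


@dataclass
class TickSample:
    """Cumulative tick counters for one hardware thread (hypothetical layout)."""
    busy: int
    idle: int


def utilization_pct(before: TickSample, after: TickSample) -> float:
    """CPU % for one hardware thread over the interval between two samples."""
    busy = after.busy - before.busy
    idle = after.idle - before.idle
    total = busy + idle
    return 100.0 * busy / total if total else 0.0


def system_utilization_pct(before: list, after: list) -> float:
    """Aggregate CPU %, treating every hardware thread as an independent CPU."""
    busy = sum(a.busy - b.busy for b, a in zip(before, after))
    total = sum((a.busy - b.busy) + (a.idle - b.idle)
                for b, a in zip(before, after))
    return 100.0 * busy / total if total else 0.0


if __name__ == "__main__":
    # Two hardware threads sharing one core: the first is fully busy,
    # the second is fully idle during the sampling interval.
    t0 = [TickSample(busy=0, idle=0), TickSample(busy=0, idle=0)]
    t1 = [TickSample(busy=1000, idle=0), TickSample(busy=0, idle=1000)]
    for i, (b, a) in enumerate(zip(t0, t1)):
        print(f"hardware thread {i}: {utilization_pct(b, a):.1f}% busy")
    print(f"system: {system_utilization_pct(t0, t1):.1f}% busy")
```

In this sketch the aggregate is a plain average over hardware threads, so one busy thread on a two-thread core is reported as half of the capacity in use, regardless of how much of the core that thread actually consumes.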
While the use of multiple hardware threads tends to give cores that support them higher total throughput per core than they would achieve when running a single hardware thread, virtually all multithreaded cores can show significant performance interference between threads sharing the same core. This interference can make CPU utilization or idle % a significantly inaccurate indicator of system throughput. For some applications, such as those that perform many dependent memory accesses and take many cache misses, hardware threads may interleave almost perfectly on a core. For other applications, such as tight-loop register-only computations or bandwidth-saturating streaming, a single hardware thread can consume a significant portion of a core's execution resources, leaving little additional throughput to be gained by additional hardware threads sharing the core. This inconsistency introduces a new problem for system capacity and utilization monitoring: one of the main indicators used by system administrators and monitoring systems to track system utilization and available headroom can now be strongly misleading.
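The gap between reported utilization and achieved throughput can be made concrete with a small worked example. The sketch below assumes a single core with one busy hardware thread and takes, as a hypothetical input, the fraction of the core's peak multithreaded throughput that the single thread achieves on its own (`single_thread_share`). The function name and the two share values are illustrative assumptions, not measurements.

```python
def reported_vs_actual(threads_per_core: int, single_thread_share: float):
    """Compare conventional CPU % with throughput-based utilization.

    Assumes exactly one hardware thread on the core is busy.
    single_thread_share is the (assumed) fraction of the core's peak
    throughput that one thread achieves when running alone.
    Returns (reported CPU %, throughput-based utilization %).
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    if not 0.0 <= single_thread_share <= 1.0:
        raise ValueError("single_thread_share must be between 0 and 1")
    reported_pct = 100.0 / threads_per_core   # one busy thread out of N
    actual_pct = 100.0 * single_thread_share  # share of peak throughput used
    return reported_pct, actual_pct


if __name__ == "__main__":
    # Hypothetical workloads on a core with two hardware threads.
    cases = {
        "cache-miss bound (interleaves well)": 0.5,
        "tight register-only loop": 0.9,
    }
    for name, share in cases.items():
        reported, actual = reported_vs_actual(2, share)
        print(f"{name}: reported {reported:.0f}% busy, "
              f"about {actual:.0f}% of achievable throughput in use")
```

Under these assumed inputs both workloads are reported as 50% busy, yet the second leaves only about a tenth of the core's throughput as headroom rather than the half that the idle % suggests.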
Thus, it would be desirable to measure and report system and core utilization in a way that correlates more closely with achieved or achievable throughput.