The present invention relates to a computer system supporting multithreading (MT), and more specifically, to runtime capacity planning in a simultaneous multithreading (SMT) environment.
As processor speeds of computer systems have increased over the past decades, there has not been a proportional increase in the speed at which the memory of such computer systems can be accessed. Thus, the faster the processor's cycle time, the more pronounced the delay to resolve data located in memory. The effects of such delays have been mitigated by adding caches to the memory nest and, in recent processors, by SMT.
SMT allows various core resources of a processor to be shared by a plurality of instruction streams known as threads. Core resources can include instruction-execution units, caches, translation-lookaside buffers (TLBs), and the like, which may be collectively referred to as a core. A single thread whose instructions access data typically cannot fully utilize the core resources because of the latency to resolve data located in the memory nest. Multiple threads that access data while sharing core resources typically achieve higher core utilization and core instruction throughput, although each individual thread executes more slowly. In a superscalar processor SMT implementation, multiple threads may be simultaneously serviced by the core resources of one or more cores.
In contemporary hardware platforms, MT is typically implemented, through virtualization of the MT hardware, in a manner that is transparent to multiple operating systems (OSes) running different workloads. One advantage of transparent MT is that the OS does not require modification to utilize the MT hardware. With this design point, the MT hardware becomes responsible for balancing the delivery of high core instruction throughput (by increasing the number of executing threads per core) against high thread speed (by minimizing the number of executing threads per core). MT operation that is transparent to the OS can result in high variability in response time, capacity provisioning, capacity planning, and charge back. This variability can occur because each OS is unaware of whether its work units execute with exclusive use of a core or whether its tasks are executing as threads that share a core. For example, if the hardware runs a single MT thread per core when compute utilization is low and runs with high thread density when compute utilization is high, an OS has difficulty determining the capacity in use (and thus charge back) and the total remaining available capacity, as well as delivering a repeatable transaction response time.