1. Field
The present disclosure relates to simultaneous multithreading, in which software tasks execute simultaneously by being mapped to pre-existing hardware threads of a central processing unit (CPU). More particularly, the disclosure concerns the monitoring and control of a CPU's hardware multithreading mode to minimize thread resource conflicts.
2. Description of the Prior Art
By way of background, many modern CPUs can process the instructions of two or more software tasks (e.g., threads of execution) simultaneously. This is known as simultaneous multithreading, or SMT. SMT is supported by scheduling software threads that are managed by an operating system (OS), hypervisor or other thread scheduling entity to run on pre-existing hardware threads that are managed by the CPU. Hardware threads are independent instruction streams that execute in parallel while sharing resources within the CPU. Usually, the software that schedules software threads for execution on the CPU can set the number of hardware threads that are active at any given time. Each hardware thread can be exposed to the scheduling software as a logical CPU to which software threads can be assigned to run. Because the hardware threads are treated as logical CPUs, the scheduling software must perform hardware thread management housekeeping work, ranging from interrupt handling to organizing and assigning the software threads to run on the active hardware threads. The process of switching hardware threads also requires software involvement and can be slow (e.g., taking tens of milliseconds).
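By way of illustration only, the exposure of hardware threads as logical CPUs may be sketched as follows. The names LogicalCpu and enumerate_logical_cpus, and the contiguous numbering of logical CPUs within each core, are assumptions made for this example; actual numbering is platform specific.

```python
from typing import List, NamedTuple


class LogicalCpu(NamedTuple):
    logical_id: int  # index seen by the scheduling software (hypothetical numbering)
    core: int        # physical core that hosts the hardware thread
    hw_thread: int   # hardware thread slot within that core


def enumerate_logical_cpus(num_cores: int, active_threads_per_core: int) -> List[LogicalCpu]:
    """List the logical CPUs visible to the scheduler for a given SMT mode.

    Assumes every core runs the same number of active hardware threads and
    that logical CPUs are numbered contiguously within each core.
    """
    if num_cores < 1 or active_threads_per_core < 1:
        raise ValueError("need at least one core and one active hardware thread")
    cpus = []
    for core in range(num_cores):
        for hw_thread in range(active_threads_per_core):
            logical_id = core * active_threads_per_core + hw_thread
            cpus.append(LogicalCpu(logical_id, core, hw_thread))
    return cpus


if __name__ == "__main__":
    # Two cores with four active hardware threads each yield eight logical CPUs.
    for cpu in enumerate_logical_cpus(num_cores=2, active_threads_per_core=4):
        print(cpu)
```

In this sketch, changing active_threads_per_core changes the set of logical CPUs that the scheduling software must manage, which is one reason a change of SMT mode involves software housekeeping.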
While the general goal of SMT is to maximize the instruction execution throughput of all software threads through parallel execution in as many hardware threads as possible within a CPU core, the scheduling software cannot easily decide whether it is more efficient to schedule the software threads for serial execution or for simultaneous execution in parallel. The advantage of scheduling them simultaneously is that CPU hardware resources can be shared by all threads. Cache memory is one example. When software threads execute simultaneously as hardware threads, instructions that miss the CPU's onboard cache(s) can be overlapped, and thus the net latency for executing the instructions can be reduced. However, when the CPU cache(s) hold working sets for several software threads executing as hardware threads in SMT mode, each thread has a smaller effective cache available to it. If the software threads operate on a significant amount of data, the CPU cache(s) may not hold each thread's entire working set, and the CPU may spend time swapping data into and out of the cache(s). This can make SMT ineffective because the CPU may spend an inordinate amount of time moving data and managing cache operations. Similar resource conflicts may arise with respect to other CPU hardware resources, such as translation lookaside buffers (TLBs), functional execution units, etc.
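The cache trade-off described above may be illustrated with a deliberately simplified model. The sketch assumes that a shared cache is divided evenly among the active hardware threads, which real caches do not guarantee, and the function names effective_cache_per_thread and working_sets_fit are hypothetical.

```python
from typing import Sequence


def effective_cache_per_thread(cache_bytes: int, active_threads: int) -> float:
    """Approximate cache share per hardware thread, assuming an even split."""
    if active_threads < 1:
        raise ValueError("at least one active hardware thread is required")
    return cache_bytes / active_threads


def working_sets_fit(cache_bytes: int, working_sets: Sequence[int]) -> bool:
    """Return True if every thread's working set fits in its even cache share.

    One entry in working_sets corresponds to one software thread running on
    its own hardware thread. A False result suggests (under this toy model)
    that the threads would compete for cache when run simultaneously.
    """
    share = effective_cache_per_thread(cache_bytes, len(working_sets))
    return all(ws <= share for ws in working_sets)


if __name__ == "__main__":
    KIB = 1024
    cache = 512 * KIB  # illustrative cache size, not taken from any real CPU
    print(working_sets_fit(cache, [200 * KIB] * 2))
    print(working_sets_fit(cache, [200 * KIB] * 4))
```

With the illustrative figures used here, two threads with 200 KiB working sets each fit within their 256 KiB shares, whereas four such threads exceed their 128 KiB shares. The model ignores associativity, replacement policy and data shared between threads, so it indicates only the direction of the effect.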
Unfortunately, such resource conflicts may be hard to identify and address in software. In the case of cache conflicts, although scheduling software could attempt to monitor cache thrashing activity, the software would have difficulty determining whether conflicts are being caused by normal software thread operations or by hardware thread competition for cache resources. Evaluating the effectiveness of SMT operations to ensure maximum thread instruction execution throughput is thus difficult. Moreover, software workloads are typically dynamic in nature and may require rapid adjustment of SMT modes, yet switching between SMT modes is often too slow to accurately track dynamic resource conflict scenarios that arise in the CPU. The present disclosure addresses these concerns and provides a novel SMT technique that accurately tracks dynamic resource conflicts between software threads and automatically sets SMT modes to optimize thread instruction execution throughput.