To increase the performance of information processing systems, such as those that include microprocessors, both hardware and software techniques have been employed. On the hardware side, approaches to improving microprocessor performance have included increased clock speeds, pipelining, branch prediction, super-scalar execution, out-of-order execution, and caches. Many of these approaches have increased transistor count and, in some instances, have caused transistor count to grow faster than performance.
Rather than increasing performance strictly through additional transistors, other performance enhancements rely on software techniques. One software approach that has been employed to improve processor performance is known as “multithreading.” In software multithreading, an instruction stream may be divided into multiple instruction streams that can be executed in parallel. Alternatively, multiple independent software streams may be executed in parallel.
In one approach, known as time-slice multithreading or time-multiplex (“TMUX”) multithreading, a single processor switches between threads after a fixed period of time. In another approach, a single processor switches between threads upon occurrence of a trigger event, such as a long-latency cache miss. In this latter approach, known as switch-on-event multithreading (“SoEMT”), at most one thread is active at any given time.
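The difference between the two switching policies can be sketched as a toy single-processor scheduler. The function names `tmux_schedule` and `soemt_schedule`, the instruction lists, and the `"miss"` marker standing in for a long-latency cache miss are all hypothetical. The sketch ignores miss latency and simply moves a thread that yields to the back of the ready queue.

```python
from collections import deque

def tmux_schedule(threads, quantum):
    """Time-slice (TMUX): run each thread for at most `quantum` instructions,
    then switch to the next ready thread regardless of what happened."""
    ready = deque(threads.items())
    trace = []  # (thread, instruction) pairs in execution order
    while ready:
        name, instrs = ready.popleft()
        trace.extend((name, op) for op in instrs[:quantum])
        if instrs[quantum:]:
            ready.append((name, instrs[quantum:]))  # resume this thread later
    return trace

def soemt_schedule(threads):
    """Switch-on-event (SoEMT): keep running the current thread until it hits
    a trigger event (here the hypothetical "miss" marker), then switch."""
    ready = deque(threads.items())
    trace = []
    while ready:
        name, instrs = ready.popleft()
        i = 0
        while i < len(instrs):
            trace.append((name, instrs[i]))
            i += 1
            if instrs[i - 1] == "miss":
                break  # long-latency event: give up the processor
        if i < len(instrs):
            ready.append((name, instrs[i:]))
    return trace

threads = {"A": ["a1", "a2", "miss", "a3"], "B": ["b1", "b2", "b3"]}
# TMUX with quantum 2 interleaves A A B B A A B.
print(tmux_schedule(threads, quantum=2))
# SoEMT switches only at A's miss: A A A B B B A.
print(soemt_schedule(threads))
```

In both policies only one thread occupies the processor at a time; they differ only in what triggers the switch.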
Increasingly, multithreading is supported in hardware. In one approach, the processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system (multiple processors on a single chip package) or a symmetric multi-processor (“SMP”) system (multiple processors on multiple chips), may each execute one of multiple software threads concurrently. In another approach, referred to as simultaneous multithreading (“SMT”), a single physical processor core is made to appear as multiple logical processors to operating systems and user programs. With SMT, multiple software threads can be active and execute simultaneously on a single processor core: each logical processor maintains a complete set of the architecture state, while many other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. Instructions from multiple software threads thus execute concurrently, each thread on its own logical processor.
For a system that supports concurrent execution of software threads, such as an SMT, SMP, or CMP system, an operating system (“OS”) may control scheduling and execution of the software threads. Alternatively, some applications may directly schedule multiple threads for execution within a processing system. Such application-scheduled threads are generally invisible to the OS and are known as “user-level threads.”
User-level threads can be scheduled for execution by an application running on a processing resource that is managed by an OS. Alternatively, in a processing system with multiple processing resources, user-level threads may be scheduled to run on a processing resource that is not directly managed by the OS but is instead managed by a user-controllable software application, such that OS resources are not affected by the user-level threads. User-level threads not directly managed by the OS may be referred to as “OS-invisible” threads or “shreds,” whereas threads managed directly by the OS may be referred to as “OS-visible” threads. Shreds typically run within an OS-visible thread; that is, they typically form a subset of the threads within an OS-visible thread and use a subset of that thread's state context.
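The key property of shreds, that user-level code rather than the OS decides when they run, can be approximated with Python generators acting as cooperative user-level threads inside a single OS thread. The names `shred` and `run_shreds` are hypothetical, and the sketch does not model a processing resource that is not directly managed by the OS; it only shows that the interleaving is chosen by the application while the OS sees a single thread.

```python
import threading
from collections import deque

def shred(name, steps, log):
    """A user-level thread: does `steps` units of work, yielding between them."""
    for i in range(steps):
        # Record which OS thread actually executes this step.
        log.append((name, i, threading.get_ident()))
        yield  # hand control back to the user-level scheduler, not the OS

def run_shreds(shreds):
    """Round-robin user-level scheduler; the OS sees only the calling thread."""
    ready = deque(shreds)
    while ready:
        s = ready.popleft()
        try:
            next(s)
        except StopIteration:
            continue  # shred finished; drop it from the ready queue
        ready.append(s)

log = []
run_shreds([shred("x", 2, log), shred("y", 1, log)])
# Interleaving chosen by the application: x0, y0, x1, all on one OS thread.
print([(name, step) for name, step, _ in log])
```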
Unfortunately, user-level threads can cause the OS to be interrupted under various circumstances, such as when a user-level thread encounters a page fault, exception, interrupt, or system call. Furthermore, processing of user-level threads may be hindered when one or more of them wait on other user-level or OS-visible threads for access to processing resources, for example during a thread synchronization operation such as a blocking or spin-lock cycle.
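The cost of waiting in a spin-lock cycle can be sketched as follows. Python exposes no atomic test-and-set instruction, so the hypothetical `SpinLock` class uses a non-blocking `threading.Lock.acquire(blocking=False)` as a stand-in; the `worker` function and `wasted` counters are likewise illustrative. Every failed attempt is a polling iteration in which the waiting thread makes no progress.

```python
import threading

class SpinLock:
    """Busy-waiting lock: a waiter polls repeatedly instead of sleeping."""

    def __init__(self):
        # Non-blocking acquire stands in for an atomic test-and-set.
        self._flag = threading.Lock()

    def acquire(self) -> int:
        """Spin until the lock is taken; return the number of failed polls."""
        spins = 0
        while not self._flag.acquire(blocking=False):
            spins += 1  # wasted iteration: no useful work while waiting
        return spins

    def release(self) -> None:
        self._flag.release()

def worker(lock, iterations, shared, wasted, idx):
    for _ in range(iterations):
        wasted[idx] += lock.acquire()
        shared["count"] += 1  # critical section protected by the spin lock
        lock.release()

lock, shared = SpinLock(), {"count": 0}
wasted = [0] * 4  # one spin counter per thread, so no counter is shared
workers = [threading.Thread(target=worker, args=(lock, 1000, shared, wasted, i))
           for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(shared["count"], "increments;", sum(wasted), "wasted polls")
```

The number of wasted polls varies from run to run. A blocking lock would instead put the waiter to sleep until the lock is released, trading polling for a wait that typically involves the OS.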
OS interruptions by a user-level thread can be communicated in the form of proxy execution, in which the user-level thread interrupts the OS through the interface between the OS and the OS-visible thread to which the user-level thread corresponds. In proxy execution, the OS is not “aware” that the interruption originates from a user-level thread, because the OS-visible thread interrupts the OS on the user-level thread's behalf.
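Proxy execution can be modeled with a toy OS that records only the identity of its OS-visible caller. In this sketch, `ToyOS`, `shred`, and `host_thread` are hypothetical names: a shred yields its OS request instead of trapping, and its host thread performs the call and resumes the shred with the result. The OS log therefore attributes every request to the host thread.

```python
class ToyOS:
    """Stand-in for the OS: it sees only the OS-visible thread that calls it."""

    def __init__(self):
        self.calls = []  # (caller, request) pairs as observed by the OS

    def syscall(self, caller, request):
        self.calls.append((caller, request))
        return f"done:{request}"

def shred(name, requests, results):
    """User-level thread that needs OS services but cannot request them directly."""
    for req in requests:
        res = yield req  # hand the request to the host thread
        results.append((name, res))

def host_thread(host_name, shreds, os_):
    """OS-visible thread that interrupts the OS on its shreds' behalf."""
    for s in shreds:
        try:
            req = next(s)
            while True:
                res = os_.syscall(host_name, req)  # proxy call: caller is the host
                req = s.send(res)  # resume the shred with the result
        except StopIteration:
            pass  # shred finished

toy_os, results = ToyOS(), []
host_thread("T0", [shred("s1", ["read"], results),
                   shred("s2", ["write", "close"], results)], toy_os)
print(toy_os.calls)  # every request is attributed to "T0", never to s1 or s2
```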
Proxy execution and thread delay due to locking, for example, can degrade computer system performance, especially as the number of OS-visible threads and user-level threads increases. Proxy execution in particular can distract the OS from performing other tasks, further degrading system performance. Currently, there is no technique for user-level code to obtain information that could help it avoid, or at least reduce, the number of OS interruptions caused by proxy execution or thread locking.