Advances in technology have permitted logic designs to be packed with increased density into smaller areas of silicon as compared with past integrated circuit devices. Moreover, as process technologies have shrunk feature sizes and the sophistication of chip design has increased, chip configurations have evolved from the basic single, central processor model to include two or more processor cores. Additionally, multi-threaded processors are becoming increasingly common.
FIG. 7 shows an example of a microchip 700 including a plurality of processor cores. Each processor core, 0-N, comprises its own data cache 701, instruction cache 702, and ALU (arithmetic/logical unit) and FPU (floating point unit) execution units 703. Each processor core further has an architecture state 704. The architecture state includes the state of the instruction pointer, the state of all general registers, and other status information such as whether interrupts are enabled or disabled. In the context of execution threads, discussed in more detail below, the architecture state may thus be defined as all the information that is required to resume execution of an interrupted thread. A plurality of processor cores as shown in FIG. 7 typically share certain chip resources. For example, cores 0-N share cache 706, front side bus 707, and control logic 708 via multiplexer 705.
FIG. 8 shows an example of a microchip 800 including a plurality of execution threads, 0-N. Each thread corresponds to an architecture state. In contrast to the multi-core configuration of FIG. 7, the threads share a common data cache 801, instruction cache 802, and ALU and FPU execution units 803. The threads also share a front side bus 807. A thread identifier register 808 determines, via control inputs to multiplexer 805, which thread is currently executing.
To operating system software, chips 700 and 800 are functionally the same. That is, the operating system does not differentiate between a core and a thread; whether an operating system request for some processor service is being acted upon by a core or by a thread is transparent to the operating system.
Accordingly, as used herein, the term “logical processor” refers to either a processor core as shown in FIG. 7, or a thread as shown in FIG. 8.
The need to integrate the operations of multiple logical processors on a single silicon chip presents a range of challenges. One aspect of these challenges is the need to efficiently manage the shared resources, identified above, used by the respective types of logical processors. Shared resources are defined generally herein as those assets which are available to the different types of logical processors on the same silicon chip.
The term “shared” should be understood to mean that, while more than one logical processor may use the same resource, they generally should not do so concurrently. Thus, more than one logical processor may need a given resource at the same time, but only one logical processor can use it at a time, while the others must wait their turn. The demand among a plurality of logical processors for the same resources requires that control schemes be implemented for managing possible access conflicts.
In the prior art, such control schemes have typically entailed the use of “semaphores.” A semaphore, in the foregoing context, is a data field, usually in a register, that contains information signaling that a particular logical processor of a plurality of logical processors has exclusive use of a shared resource. A logical processor with exclusive use of a resource may be said to have a “lock” on the resource.
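By way of illustration only, the semaphore described above may be modeled in software as a single field recording which logical processor, if any, holds the lock. The following Python sketch is a simplified model under stated assumptions, not an actual hardware implementation: the names `Semaphore`, `owner`, `try_lock`, and `release` are hypothetical, and a real device would typically hold the field in a register updated by an atomic test-and-set operation.

```python
from typing import Optional


class Semaphore:
    """Illustrative model of a semaphore guarding one shared resource."""

    def __init__(self) -> None:
        # None means the resource is free; otherwise this holds the
        # identifier of the logical processor that has the lock.
        self.owner: Optional[int] = None

    def try_lock(self, proc_id: int) -> bool:
        """Grant the lock only if no logical processor currently holds it."""
        if self.owner is None:
            self.owner = proc_id
            return True
        return False

    def release(self, proc_id: int) -> bool:
        """Clear the lock, but only at the request of the locking processor."""
        if self.owner == proc_id:
            self.owner = None
            return True
        return False
```

In this model, a second logical processor calling `try_lock` while the field is set is refused and must retry later, which corresponds to waiting its turn for the resource.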
In one known semaphore mechanism, each shared resource has its own guarding semaphore. Such a mechanism suffers from the inherent disadvantage that, if a logical processor requires multiple shared resources, it must serially determine, via each guarding semaphore, the availability of each of the required resources to attempt to obtain a lock on all of the resources. The availability of the required resources may change while the logical processor is attempting to make this determination. If the logical processor cannot obtain a lock on all the resources it needs, it must release the locks on the resources that it was able to obtain. This can lead to a counterproductive condition known as “thrashing,” wherein a number of logical processors are repeatedly contending for resources, but no single logical processor is able to obtain all the resources it needs and therefore no forward progress is made.
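The thrashing condition can be illustrated with a small simulation. The Python sketch below assumes a hypothetical schedule in which two logical processors each need the same two resources but check the guarding semaphores in opposite order; the resource names `cache` and `bus` and the function names are chosen for illustration only. Under this assumed interleaving, each processor locks one resource, finds the other taken, and releases what it holds, so no round ever completes.

```python
def try_lock(semaphores, resource, proc_id):
    """Lock one resource if it is free; return True on success."""
    if semaphores[resource] is None:
        semaphores[resource] = proc_id
        return True
    return False


def release_all(semaphores, proc_id):
    """Release every lock currently held by proc_id."""
    for resource, owner in list(semaphores.items()):
        if owner == proc_id:
            semaphores[resource] = None


def contend(rounds):
    """Return the processors that obtained both resources in any round."""
    semaphores = {"cache": None, "bus": None}
    completed = []
    for _ in range(rounds):
        # Interleaved first step: each processor locks its first resource.
        got_0 = try_lock(semaphores, "cache", 0)
        got_1 = try_lock(semaphores, "bus", 1)
        # Second step: each finds its other resource already locked.
        got_0 = got_0 and try_lock(semaphores, "bus", 0)
        got_1 = got_1 and try_lock(semaphores, "cache", 1)
        if got_0:
            completed.append(0)
        if got_1:
            completed.append(1)
        # Each processor gives up whatever it holds before the next round.
        release_all(semaphores, 0)
        release_all(semaphores, 1)
    return completed
```

Calling `contend` with any number of rounds returns an empty list under this schedule: both processors stay busy acquiring and releasing locks, yet neither makes forward progress.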
In another known semaphore mechanism, a global semaphore is used which locks all available resources for the exclusive use of the logical processor that is able to obtain the lock. However, this approach is inefficient in that not all logical processors necessarily have overlapping resource needs. Therefore, it could be possible for a plurality of logical processors to use some resources in parallel. A global semaphore lock precludes this possibility.
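The cost of a global semaphore can be made concrete by counting time steps. The Python sketch below is a simplified model under stated assumptions: each request is a set of hypothetical resource names, every request occupies one time step, and the second function uses a simple greedy grouping only to show how much parallelism is lost. It is not a proposed locking scheme.

```python
def steps_with_global_semaphore(requests):
    """Each request locks every resource, so requests run one per time step."""
    return len(requests)


def steps_if_disjoint_requests_share(requests):
    """Count time steps when requests with no resource in common may overlap."""
    steps = []  # each entry is the set of resources in use during that step
    for needed in requests:
        for in_use in steps:
            if in_use.isdisjoint(needed):
                # No conflict with this step, so the request can join it.
                in_use.update(needed)
                break
        else:
            # Every existing step conflicts, so a new step is required.
            steps.append(set(needed))
    return len(steps)
```

For example, given the three requests `{"cache"}`, `{"bus"}`, and `{"cache", "bus"}`, the global semaphore serializes all three into three steps, whereas the first two requests have no resource in common and could have shared a single step.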
Typically, once a logical processor with a semaphore lock on a resource no longer needs the resource, it releases the lock so that other logical processors can use the resource. However, another undesirable condition that can occur in the use of semaphores is that a logical processor that has obtained a lock may “go bad”; i.e., suffer some kind of hardware or software failure that causes it to retain the lock for an indefinite period of time. This condition, sometimes referred to as “livelock,” can also arrest all forward progress among other logical processors. One aspect of known semaphore mechanisms that makes such a problem difficult to resolve is that only the locking logical processor can release the lock.
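The difficulty posed by a failed lock holder can be sketched as follows. This Python model is illustrative only and rests on assumptions: the semaphore is a dictionary with a hypothetical `owner` field, logical processor 0 is assumed to fail immediately after obtaining the lock, and the polling limit of 1000 attempts is arbitrary.

```python
def attempts_until_locked(semaphore, proc_id, max_attempts):
    """Poll the semaphore; return the attempt that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        if semaphore["owner"] is None:
            semaphore["owner"] = proc_id
            return attempt
    return None


def release(semaphore, proc_id):
    """Clear the semaphore, but only for the locking processor."""
    if semaphore["owner"] != proc_id:
        return False
    semaphore["owner"] = None
    return True


# Processor 0 obtains the lock and then fails without ever releasing it.
semaphore = {"owner": None}
attempts_until_locked(semaphore, 0, 1)

# Processor 1 polls repeatedly and never obtains the lock.
stalled = attempts_until_locked(semaphore, 1, 1000)

# Processor 1 cannot clear the lock either, because it is not the owner.
rejected = release(semaphore, 1)
```

Here `stalled` is `None` and `rejected` is `False`: however long processor 1 waits, the semaphore still names processor 0 as owner, and no other processor is permitted to clear it.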
An approach is needed to address the concerns noted in the foregoing.