The present invention generally relates to the field of multi-processor environments. More particularly, the present invention relates to the use of a hardware controller to handle spin lock requests.
A multi-processor system includes a main memory, a shared resource, and processors interconnected via a bus. In a multi-processor system, it is often desirable to lock a shared resource so that other processors cannot use it. A shared resource may be, for example, a portion of memory, an I/O device, or a register. To facilitate locking of a shared resource, a control structure comprising a lock bit is implemented in memory. A processor may lock the shared resource to prevent other processors from using it until the processor holding the lock unlocks the shared resource.
In typical systems, to acquire a lock the processor executes a test and set instruction on the lock bit of the control structure corresponding to the resource for which the lock is being sought. This is a special sequence that allows a processor to acquire ownership of the lock. To gain ownership of a lock, the processor reads the current contents of the lock bit and writes a new value to the lock bit. The read and write operations to the lock bit are viewed by the system as a single operation that is non-interruptible.
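By way of illustration only, the test and set sequence described above may be modeled in software as follows. This is a minimal sketch, not an implementation of any particular processor instruction: the class name LockBit, its methods, and the use of a Python threading.Lock to stand in for the non-interruptible hardware operation are all assumptions made for the example.

```python
import threading


class LockBit:
    """Software model of the lock bit in a memory control structure (hypothetical)."""

    def __init__(self):
        self._value = 0                  # 0 = unlocked, 1 = locked
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def test_and_set(self):
        """Read the current value and write 1 as a single indivisible step.

        Returns the value that was read: 0 means the caller acquired the lock,
        1 means the lock was already held by another processor.
        """
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        """Release the lock by writing 0 to the lock bit."""
        with self._atomic:
            self._value = 0


lock_bit = LockBit()
first = lock_bit.test_and_set()   # reads "unlocked": this caller owns the lock
second = lock_bit.test_and_set()  # reads "locked": this caller does not
```

In this sketch the first caller reads an unlocked status and the second reads a locked status, because the first call has already written the new value.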
If a processor reads an “unlocked” status from the lock bit during the test and set instruction, then that processor receives the lock for that resource. If the processor reads a “locked” status from the lock bit, then that processor does not receive the lock for that resource. If the processor does not receive the lock, the processor continually and repeatedly requests the lock using the test and set instruction described above. This is referred to as the processor “spinning on the lock,” because the processor repeatedly requests the lock until it receives the lock. When a processor spins on a lock, it occupies bandwidth on the processor bus, because each time a processor performs a test and set operation on the memory control structure that includes the lock bit, the processor has to acquire exclusive ownership of that control structure in memory. Thus, when multiple processors are all spinning on the same lock, they are continuously passing the ownership of the portion of memory that constitutes the control structure back and forth across the bus.
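The bus cost of spinning can be made concrete with a small, deterministic simulation. The sketch below is illustrative only and rests on simplifying assumptions: the waiting processors take turns in a fixed round-robin order, the lock stays held for the whole run, and every test and set by a processor that does not already own the control structure counts as one ownership transfer on the bus. The function name simulate_spinning and its parameters are hypothetical.

```python
def simulate_spinning(num_waiters, hold_rounds):
    """Count ownership transfers caused by processors spinning on a held lock.

    num_waiters: number of processors spinning on the same lock.
    hold_rounds: number of round-robin passes made while the lock stays held.
    """
    lock_bit = 1       # the lock is already held by some other processor
    line_owner = None  # processor that currently owns the control structure
    transfers = 0
    for _ in range(hold_rounds):
        for cpu in range(num_waiters):
            if line_owner != cpu:
                # Exclusive ownership must move to this processor before it
                # can perform its test and set.
                line_owner = cpu
                transfers += 1
            old = lock_bit  # test: still reads "locked"
            lock_bit = 1    # set: rewrites the same value
    return transfers


contended = simulate_spinning(num_waiters=4, hold_rounds=100)
uncontended = simulate_spinning(num_waiters=1, hold_rounds=100)
```

Under these assumptions a single spinning processor takes ownership once and then spins locally, whereas several spinning processors force a transfer on every attempt.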
Spin locks have several other disadvantages. First, the mechanisms used to acquire a lock are not always fair. When a lock is heavily contended, there is no guarantee that all processors contending for the lock will eventually acquire the lock. Second, the mechanism used to acquire a lock is not ordered. As such, processors that are spinning on a lock will not necessarily get the lock in the order in which they began to spin on the lock. Third, when a lock is being heavily contended, a significant amount of system bus bandwidth is consumed performing the cache coherency operations necessary to move the read-write copy of the cache line containing the lock control structure between the contending processors. Cache coherency refers to the process of controlling read-write access to a particular cache line of data. In a multi-processor environment, while several processors may each contain a “copy” of a particular section of memory, only one processor can have read-write access to that section of memory at one time. If one processor requests a read-write copy of data, all other processors (or caching agents) must give up their copies of the cache line. Controlling revisions to memory in this way requires communication over a system bus.
Yet another disadvantage relates to the handling of interrupts. Interrupts are typically hardware signals that stop a processor from performing one task and cause it to begin processing a different task. Interrupts typically have an associated priority level, so that in a conventional system a processor may be interrupted to handle a higher priority task. A processor typically records its current interrupt priority so that interrupts with lower priority may be redirected to another processor while the processor is servicing the higher priority task. While a processor is spinning on a lock, it could in principle handle some low priority tasks while waiting to receive the lock. In conventional systems, however, this is not possible.
The Microsoft Windows 2000 operating system addresses many of the deficiencies of normal spin locks by implementing queued spin locks. However, queued spin locks introduce other deficiencies. First, it takes more processor cycles to acquire and release a queued spin lock than a normal spin lock. Second, higher priority tasks (e.g., interrupts) should be masked while a processor waits to acquire a queued spin lock. Masking prevents the processor that has just been granted ownership of the lock from being preempted by an interrupt, which would cause additional spin lock delays on all the other processors waiting to acquire the same lock. However, masking higher priority interrupts can degrade system performance, because processors waiting deeper in the spin lock queue could otherwise service those higher priority tasks.
Therefore, there is a need for a system and method that reduces processor traffic for spin locks, allows a processor that is spinning on a lock to handle low priority tasks while waiting for the lock, and allows for fairer allocation of locks. The present invention satisfies this need.
The present invention is directed to a system and method in which hardware is added to a crossbar of a multiple processor (MP) system to reduce processor bus traffic caused by spin locks. The added hardware takes over responsibility for requesting locks to shared resources, relieving the processors of the MP system from this task, and thereby minimizing cache coherency operations and reducing processor bus traffic.
A multiple processor computer system according to the present invention comprises a plurality of processors, a main memory, and a crossbar structure. One or more system resources may be shared, including without limitation portions of main memory, I/O devices, and registers. For certain shared resources, a control structure may be provided in main memory for controlling a lock on that resource. According to the invention, methods and apparatus are employed in the crossbar structure to handle acquisition of locks to the shared resources, thereby relieving the processors of the system of this task. In one embodiment, the crossbar structure comprises, for each processor, a lock register. The processor writes to the lock register an address of the lock control structure associated with a particular shared resource when the processor desires to acquire the lock thereto. The crossbar, on behalf of the processor, performs memory operations (e.g., test and set operation) on the lock control structure at the address specified in the lock register in order to acquire the lock on behalf of the processor. Thus, responsibility for performing the test and set operation on a lock control structure is moved from the processors to the crossbar structure.
According to another feature of the present invention, a crossbar structure comprises, for each processor, an unlock register. The processor writes to the unlock register an address of the lock control structure associated with a particular shared resource when the processor desires to release the lock thereto. The crossbar, on behalf of the processor, performs memory operations (e.g., writing a zero) on the lock control structure at the address specified in the unlock register in order to release the lock on behalf of the processor.
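The division of labor between processor and crossbar described above may be illustrated with a small software model. The sketch is a simplified assumption-laden example, not the hardware itself: the class name Crossbar, the step method that models one retry cycle, the dictionary used as main memory, and the granted list used to report ownership back to a processor are all hypothetical.

```python
class Crossbar:
    """Toy model of a crossbar with per-processor lock and unlock registers."""

    def __init__(self, memory, num_processors):
        self.memory = memory                          # address -> lock bit (0 or 1)
        self.lock_register = [None] * num_processors  # pending lock requests
        self.granted = [None] * num_processors        # assumed grant indication

    def write_lock_register(self, cpu, address):
        """Processor asks the crossbar to acquire the lock at this address."""
        self.lock_register[cpu] = address

    def write_unlock_register(self, cpu, address):
        """Processor asks the crossbar to release the lock at this address."""
        self.memory[address] = 0  # release by writing a zero
        if self.granted[cpu] == address:
            self.granted[cpu] = None

    def step(self):
        """One crossbar cycle: retry test and set for every pending request."""
        for cpu, address in enumerate(self.lock_register):
            if address is None:
                continue
            old = self.memory[address]  # test ...
            self.memory[address] = 1    # ... and set
            if old == 0:                # lock was free: grant it to this cpu
                self.granted[cpu] = address
                self.lock_register[cpu] = None


memory = {0x1000: 0}
xbar = Crossbar(memory, num_processors=2)
xbar.write_lock_register(0, 0x1000)
xbar.write_lock_register(1, 0x1000)
xbar.step()                            # processor 0 is granted the lock
xbar.write_unlock_register(0, 0x1000)  # processor 0 releases it
xbar.step()                            # processor 1 is now granted the lock
```

In this model the processors issue only register writes; every test and set retry happens inside the crossbar model, which is the point of moving the responsibility off the processors.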
According to yet another feature of the present invention, current and future interrupt priority registers may be implemented to allow a processor to service a lower level interrupt while spinning on a lock and return to the proper interrupt level upon receiving the lock.
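One possible reading of the current and future interrupt priority registers is sketched below for illustration. The details are assumptions rather than the disclosed design: the class name PriorityRegisters, its method names, the convention that a larger number means a higher priority, and the rule that an interrupt is accepted only when its level exceeds the current level are all hypothetical.

```python
class PriorityRegisters:
    """Toy model of current and future interrupt priority registers."""

    def __init__(self, current):
        self.current = current  # level used to accept or redirect interrupts now
        self.future = None      # level to adopt once the lock is received

    def begin_wait(self, waiting_level, lock_level):
        """Start spinning: run at waiting_level, remember lock_level for later."""
        self.current = waiting_level
        self.future = lock_level

    def accepts(self, interrupt_level):
        """Assumed rule: only interrupts above the current level are accepted."""
        return interrupt_level > self.current

    def lock_granted(self):
        """Lock received: return to the interrupt level required for the lock."""
        self.current = self.future
        self.future = None


regs = PriorityRegisters(current=0)
regs.begin_wait(waiting_level=2, lock_level=10)
while_waiting = regs.accepts(5)  # a level 5 interrupt can be serviced
regs.lock_granted()
after_grant = regs.accepts(5)    # the same interrupt is now held off
```

The sketch shows the intended effect: an interrupt that would be blocked at the lock's priority level can still be serviced while the processor is only waiting for the lock.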
According to a further feature of the present invention, a queue may be implemented to allow processors to have more fair access to shared resources.
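As an illustration of how a queue can make lock allocation fairer and ordered, the following sketch grants a lock in first-come, first-served order. It is an assumed, simplified model rather than the disclosed queue: the class name QueuedLock and its methods are hypothetical.

```python
from collections import deque


class QueuedLock:
    """Toy model of a lock whose waiters are served in arrival order."""

    def __init__(self):
        self.owner = None       # processor currently holding the lock
        self.waiters = deque()  # processors waiting, oldest first

    def request(self, cpu):
        """Grant the lock if it is free, otherwise join the back of the queue."""
        if self.owner is None:
            self.owner = cpu
        else:
            self.waiters.append(cpu)

    def release(self):
        """Pass the lock to the longest-waiting processor, if any."""
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner


lock = QueuedLock()
for cpu in (3, 1, 2):
    lock.request(cpu)              # processor 3 gets the lock; 1 and 2 queue up
grant_order = [lock.owner]
while lock.release() is not None:
    grant_order.append(lock.owner)
```

Because waiters are served strictly in arrival order, every requesting processor eventually receives the lock, which addresses both the fairness and the ordering concerns raised for ordinary spin locks.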
Other features and advantages of the present invention will become evident hereinafter.