Field of the Invention
The present invention relates generally to multiple processor systems, and more particularly to a technique for avoiding deadlocks within a directory based cache memory system.
A multiprocessor system typically includes a main memory that is shared by a number of processors, each having its own cache system. A cache memory is generally a buffer between a host processor and main memory. The cache memory is a small, fast memory that is located close to the host processor that stores the most recently accessed instructions or data. The cache increases system performance by storing information which the host processor requires frequently. Storing information in the cache increases system performance by avoiding long delays associated with access to main memory. Each cache system may include multiple levels of caches (e.g., the P6 processor from Intel(copyright) has a two level cache memory system).
The multiprocessor system enables data to be shared by multiple processors simultaneously. For example, a multiprocessor system with two processors, A and B, allows both processors to store a copy of data, D, in their respective caches simultaneously. The multiprocessor system, however, introduces a problem of cache coherency. The cache coherency problem arises when processor A modifies its copy of D, while B simultaneously uses its copy of D. Once A modifies its copy of D, the copy of D held in B""s cache is no longer valid. This situation is incoherent because if processor B were to read D from its cache, the wrong value of D will be returned. Cache coherency can be achieved by ensuring that B cannot use its copy of D until that copy is made equal to the modified copy held in A""s cache.
One way to ensure cache coherency is with a directory protocol. The directory protocol ensures the coherency of all caches within the system by acting as a reference for the operations that the processor may perform on a particular cache line. The cache line represents the smallest unit of data transferred between the memory and the cache. Before being allowed to modify the cache line, the processor must have certain access rights to the cache line. This access may be of different types (e.g., access to read the cache line or access to modify the cache line), and is referred to generally as ownership. In a simple two processor system, for example, ownership of the cache line is granted through a messaging system, wherein processor A requests a particular level of ownership (e.g. read or write/modify) of the cache line presently owned by processor B. To grant ownership, processor B responds to processor A""s request message.
Coherency is concerned with the validity of a single cache line, not with the relationship between one cache line and another. For example, following the protocol suggested above, it is possible for a first processor (P1) to perform a write of a first value (V1) followed by a write of a second value (V2), and for a second processor (P2) to capture the old value of V1 and the new value of V2. This relationship between V1 and V2 is referred to as memory consistency.
Memory inconsistency has two causes. First, there can be a delay between P1 obtaining write ownership for V1 and P2 receiving a message to invalidate its copy of V1. This may occur when ownership of the data item is shared (or read-only). Writes are not effective with respect to a particular processor until that processor receives the invalidate message.
The second cause for memory inconsistency occurs when the memory is distributed. In this case, one memory service unit may be busier than another memory service unit. This results in memory not necessarily being accessed in the same order as it is requested. Furthermore, for performance reasons, processors perform requests for data ahead of time and hold the data until they are needed. This results in data items being obtained in an order that differs from the program order. For more detail concerning memory consistency see Hennessy and Patterson, Computer Architecture a Quantitative Approach, Second Edition, Morgan Kaufmann Publishers, 1996, pp. 708-721.
One technique used to establish memory consistency uses a xe2x80x9clockingxe2x80x9d operation. In this technique, a writing processor xe2x80x9clocksxe2x80x9d a shared data to prevent other processors from accessing the data, writes to the shared data, and then xe2x80x9cunlocksxe2x80x9d the shared data for use by the other processors. In a similar manner, a reading processor xe2x80x9clocksxe2x80x9d a shared data, reads the shared data, and then xe2x80x9cunlocksxe2x80x9d the shared data.
The locking operation thus described synchronizes the memory. To begin with, the locking operation functions as a start barrier. The processor executing the locking operation must wait until every other processor is finished with the data. Hardware mechanisms that detect whether writes have been performed ensure that reading processors have updated data and that writing processors do not over-write data that another processor is reading.
The locking operation also functions as an end barrier. A processor is not permitted to execute newer instructions until the locking operation has completed. This prevents reading processors from obtaining and holding old data before acquiring the lock.
A deadlock can occur when a processor performs a non-atomic operation on data that is simultaneously accessed by a second processor. An atomic operation is an operation in which a processor reads or writes to a memory location while preventing any another processor from reading or writing to that memory location. Non-atomic operations are operations in which the cache lines being operated upon by the processor can be read or written to by another processor before the operation completes. A non-atomic operation can occur, for example, when the data being operated on crosses a cache line boundary. That is, the data being modified is located in two different lines of the cache. For example, if data is stored in the cache such that two cache lines must be read to access the data, the operation crosses the cache line boundary. A processor accessing the data may perform an atomic operation on the first cache line, but the second cache line can still be accessed by another processor in the interim.
One way to solve the deadlock problem is to require all other processors to go idle once the processor starts the non-atomic operation. Systems typically accomplish this by generating a xe2x80x9clockxe2x80x9d signal which is transmitted to the other processors over a system bus. The xe2x80x9clockxe2x80x9d signal prevents any other processor from accessing the bus while the xe2x80x9clockxe2x80x9d is asserted. Implementing this in the multiprocessor system, however, can be problematic if the system contains more than one bus. Moreover, if the processor does not provide a mechanism for supplying the xe2x80x9clockxe2x80x9d signal, the above solution cannot be implemented.
What is needed is a system and method for performing non-atomic operations in a multiprocessor system that avoids the problem of deadlock.
Briefly stated, the present invention is directed to avoiding a deadlock in a multiprocessor system when a processor performs a non-atomic operation on data that can be simultaneously accessed by a second processor. A deadlock is avoided through a split lock operation comprising a series of messages designed to give a processor exclusive access to the memory so that no other processor may access the same data until after the non-atomic operation has completed. The messages used to avoid the deadlock include a split lock request, a lock message, a grant message, a gone idle message and a release idle message. By using the above messages, the method of the present invention accepts requests from multiple processors for exclusive access to memory, orders all of the requests, and awards exclusive access to the first processor to make a request.