1. Field of the Invention
The present invention relates generally to multiple processor systems, and more particularly to a technique for avoiding deadlocks within a directory based cache memory system.
2. Related Art
A multiprocessor system typically includes a main memory that is shared by a number of processors, each having its own cache system. A cache memory is generally a buffer between a host processor and main memory. The cache memory is a small, fast memory that is located close to the host processor that stores the most recently accessed instructions or data. The cache increases system performance by storing information which the host processor requires frequently. Storing information in the cache increases system performance by avoiding long delays associated with access to main memory. Each cache system may include multiple levels of caches (e.g., the P6 processor from Intel.RTM. has a two level cache memory system).
The multiprocessor system enables data to be shared by multiple processors simultaneously. For example, a multiprocessor system with two processors, A and B, allows both processors to store a copy of data, D, in their respective caches simultaneously. The multiprocessor system, however, introduces a problem of cache coherency. The cache coherency problem arises when processor A modifies its copy of D, while B simultaneously uses its copy of D. Once A modifies its copy of D, the copy of D held in B's cache is no longer valid. This situation is incoherent because if processor B were to read D from its cache, the wrong value of D will be returned. Cache coherency can be achieved by ensuring that B cannot use its copy of D until that copy is made equal to the modified copy held in A's cache.
One way to ensure cache coherency is with a directory protocol. The directory protocol ensures the coherency of all caches within the system by acting as a reference for the operations that the processor may perform on a particular cache line. The cache line represents the smallest unit of data transferred between the memory and the cache. Before being allowed to modify the cache line, the processor must have certain access rights to the cache line. This access may be of different types (e.g., access to read the cache line or access to modify the cache line), and is referred to generally as ownership. In a simple two processor system, for example, ownership of the cache line is granted through a messaging system, wherein processor A requests a particular level of ownership (e.g. read or write/modify) of the cache line presently owned by processor B. To grant ownership, processor B responds to processor A's request message.
Coherency is concerned with the validity of a single cache line, not with the relationship between one cache line and another. For example, following the protocol suggested above, it is possible for a first processor (P.sub.1) to perform a write of a first value (V.sub.1) followed by a write of a second value (V.sub.2), and for a second processor (P.sub.2) to capture the old value of V.sub.1 and the new value of V.sub.2. This relationship between V.sub.1 and V.sub.2 is referred to as memory consistency.
Memory inconsistency has two causes. First, there can be a delay between P.sub.1 obtaining write ownership for V.sub.1 and P.sub.2 receiving a message to invalidate its copy of V.sub.1. This may occur when ownership of the data item is shared (or read-only). Writes are not effective with respect to a particular processor until that processor receives the invalidate message.
The second cause for memory inconsistency occurs when the memory is distributed. In this case, one memory service unit may be busier than another memory service unit. This results in memory not necessarily being accessed in the same order as it is requested. Furthermore, for performance reasons, processors perform requests for data ahead of time and hold the data until they are needed. This results in data items being obtained in an order that differs from the program order. For more detail concerning memory consistency see Hennessy & Patterson, Computer Architecture a Quantitative Approach, Second Edition, Morgan Kaufmann Publishers, 1996, pp. 708-721.
One technique used to establish memory consistency uses a "locking" operation. In this technique, a writing processor "locks" a shared data to prevent other processors from accessing the data, writes to the shared data, and then "unlocks" the shared data for use by the other processors. In a similar manner, a reading processor "locks" a shared data, reads the shared data, and then "unlocks" the shared data.
The locking operation thus described synchronizes the memory. To begin with, the locking operation functions as a start barrier. The processor executing the locking operation must wait until every other processor is finished with the data. Hardware mechanisms that detect whether writes have been performed ensure that reading processors have updated data and that writing processors do not over-write data that another processor is reading.
The locking operation also functions as an end barrier. A processor is not permitted to execute newer instructions until the locking operation has completed. This prevents reading processors from obtaining and holding old data before acquiring the lock.
A deadlock can occur when a processor performs a non-atomic operation on data that is simultaneously accessed by a second processor. An atomic operation is an operation in which a processor reads or writes to a memory location while preventing any another processor from reading or writing to that memory location. Non-atomic operations are operations in which the cache lines being operated upon by the processor can be read or written to by another processor before the operation completes. A non-atomic operation can occur, for example, when the data being operated on crosses a cache line boundary. That is, the data being modified is located in two different lines of the cache. For example, if data is stored in the cache such that two cache lines must be read to access the data, the operation crosses the cache line boundary. A processor accessing the data may perform an atomic operation on the first cache line, but the second cache line can still be accessed by another processor in the interim.
One way to solve the deadlock problem is to require all other processors to go idle once the processor starts the non-atomic operation. Systems typically accomplish this by generating a "lock" signal which is transmitted to the other processors over a system bus. The "lock" signal prevents any other processor from accessing the bus while the "lock" is asserted. Implementing this in the multiprocessor system, however, can be problematic if the system contains more than one bus. Moreover, if the processor does not provide a mechanism for supplying the "lock" signal, the above solution cannot be implemented.
What is needed is a system and method for performing non-atomic operations in a multiprocessor system that avoids the problem of deadlock.