Not applicable.
1. Field of the Invention
The present invention generally relates to reducing latency and directory writes in a multi-processor system. More particularly, the invention relates to reducing latency in a directory-based, multi-processor system. Still more particularly, the invention relates to eliminating directory write operations whenever possible in a directory-based coherence protocol.
2. Background of the Invention
Computer systems typically include one or more processors, memory, and many other devices. Often, the contents of memory are made available by a memory controller to the various other devices in the system. As such, two or more devices (e.g., two processors in a multi-processor system) may attempt to access the same block of memory at substantially the same time. Although being able to provide access to the same block of data by multiple devices in the system is highly desirable from a performance standpoint, it does necessitate taking steps to maintain the “coherency” of each data block.
In a multi-processor computer system, or any system for that matter in which more than one device may request concurrent access to the same piece of data, it is important to keep track of each block of data to keep the data coherent, meaning that the system accurately tracks the status of each data block and prevents two processors from changing two different copies of the same data. If two processors are given copies of the same data block and are permitted to change their copy, then the system at that point would have two different versions of what was previously the same data. The coherency problem is akin to giving two different people the permission to edit two different copies of the same document. Once their editing is complete, two different versions of the same document are present, whereas only one copy of the document is desired. A coherency protocol is needed to prevent this type of situation from happening.
One approach to the coherency problem in a multi-processor computer system is to provide a “directory” for each data block. The directory thus comprises a plurality of entries, one entry for each data block unit. Each directory entry generally includes information that reflects the current state of the associated data block. Such information may include, for example, the identity of which processors have a shared copy of the block or which processor in the system has the exclusive ownership of the block. Exclusive ownership of a data block permits the exclusive owner to change the data. Any processor having a copy of the block, but not having the block exclusive, can examine the data but cannot change the data. A data block may be shared between two or more processors. As such, the directory entry for that block includes information identifying which processors have a shared copy of the block. In general, a directory-based coherency protocol solves the problems noted above.
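By way of illustration only, a directory entry of the kind described above might be modeled as in the following sketch. The names `BlockState`, `DirectoryEntry`, `grant_shared`, and `grant_exclusive` are hypothetical and do not come from the disclosure, and the sketch assumes that any invalidation of existing copies is handled elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class BlockState(Enum):
    UNCACHED = "uncached"    # no processor holds a copy
    SHARED = "shared"        # one or more read-only copies exist
    EXCLUSIVE = "exclusive"  # exactly one processor may change the data


@dataclass
class DirectoryEntry:
    """Hypothetical per-block directory entry (illustrative names only)."""
    state: BlockState = BlockState.UNCACHED
    sharers: Set[int] = field(default_factory=set)  # processors with a shared copy
    owner: Optional[int] = None                     # processor with exclusive ownership

    def grant_shared(self, cpu: int) -> None:
        # A shared copy cannot coexist with an exclusive owner.
        if self.state is BlockState.EXCLUSIVE:
            raise RuntimeError("exclusive owner must give up the block first")
        self.state = BlockState.SHARED
        self.sharers.add(cpu)

    def grant_exclusive(self, cpu: int) -> None:
        # Assumption: other copies were already invalidated by the protocol.
        if self.sharers - {cpu}:
            raise RuntimeError("other sharers must be invalidated first")
        if self.owner is not None and self.owner != cpu:
            raise RuntimeError("another processor already owns the block")
        self.state = BlockState.EXCLUSIVE
        self.sharers.clear()
        self.owner = cpu

    def may_read(self, cpu: int) -> bool:
        return cpu in self.sharers or self.owner == cpu

    def may_write(self, cpu: int) -> bool:
        # Only the exclusive owner may change the data.
        return self.state is BlockState.EXCLUSIVE and self.owner == cpu
```

In this sketch, two processors that each hold a shared copy can both read the block, but neither can write it until one of them obtains exclusive ownership.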
It is always desirable to enable computer systems to work faster and more efficiently. Anything that can be done to decrease latency in a computer generally makes the computer operate faster. Directory-based coherency computer systems are no exception; reducing the latency involved in such systems is desirable.
The problems noted above are solved in large part by a computer system that has a plurality of processors. Each processor preferably has its own cache memory. Each processor or group of processors may have a memory controller that interfaces to a main memory, such as DRAM-type memory. The main memories include a “directory” that maintains the directory coherence state of each memory block.
One or more of the processors may be members of a “local” group of processors, such as might be the case if multiple processors are fabricated on the same chip. As such, the system might have multiple local processor groupings. Processors outside a local group are referred to as “remote” processors with respect to that local group.
Whenever a remote processor performs a memory reference (e.g., read or write) for a particular block of memory, the processor that maintains the directory for that block normally updates the directory to reflect that the remote processor now has exclusive ownership of the block. In accordance with the preferred embodiment of the invention, however, memory references between processors within a local group do not result in a directory write. Instead, the local processor that initiated the memory request places or updates a copy of the requested data in its cache memory and also sets associated tag control bits to reflect the same or similar information as would have been written to the directory. In this way, it is not necessary to write the directory for the requested block because the requesting processor's cache has the same information.
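The distinction between remote and local memory references described above can be sketched as follows. This is a simplified model under stated assumptions, not the preferred embodiment itself: the names `HomeNode`, `CacheLine`, `request_exclusive`, and `directory_writes` are hypothetical, every request is treated as a request for exclusive ownership, and invalidation of earlier copies is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CacheLine:
    data: int
    exclusive: bool  # tag control bit mirroring what the directory would record


@dataclass
class HomeNode:
    """Hypothetical model of the node that maintains the directory for its blocks."""
    local_cpus: Set[int]                                     # members of the local group
    memory: Dict[int, int] = field(default_factory=dict)     # block address -> data
    directory: Dict[int, int] = field(default_factory=dict)  # block address -> remote owner
    caches: Dict[int, Dict[int, CacheLine]] = field(default_factory=dict)
    directory_writes: int = 0                                # counts directory updates

    def request_exclusive(self, cpu: int, block: int) -> int:
        data = self.memory.get(block, 0)
        # The requester always caches the block and sets its tag control bit.
        self.caches.setdefault(cpu, {})[block] = CacheLine(data, exclusive=True)
        if cpu not in self.local_cpus:
            # Remote requester: record the new owner in the directory.
            self.directory[block] = cpu
            self.directory_writes += 1
        # Local requester: the directory write is skipped because the tag bits
        # in the requester's cache already carry the ownership information.
        return data
```

With `local_cpus={0, 1}`, a request from processor 0 leaves `directory_writes` unchanged, whereas a request from a remote processor 5 increments it.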
If a subsequent request is received for that same block, the local processor that previously accessed the block examines its cache for the associated tag control bits. Using those bits, that processor determines that it currently has the block exclusive and provides the requested data to the new processor that is requesting the data. As such, the processor that maintains the directory for the block can ignore the directory entry.
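The handling of a subsequent request might be sketched as shown below. The function `serve_request` and the `CacheLine` type are hypothetical names introduced for illustration, and the fallback to main memory when no local cache holds the block exclusive is an assumption of this sketch (in a full protocol, the directory would be consulted on that path).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CacheLine:
    data: int
    exclusive: bool  # tag control bit set when the block was obtained


def serve_request(
    block: int,
    local_caches: Dict[int, Dict[int, CacheLine]],
    memory: Dict[int, int],
) -> Tuple[int, str]:
    """Return the block's data and a label naming where it was supplied from."""
    for cpu, cache in local_caches.items():
        line = cache.get(block)
        # The tag control bits alone show that this processor holds the block
        # exclusive, so the directory entry is never read.
        if line is not None and line.exclusive:
            return line.data, f"cache of cpu {cpu}"
    # Assumed fallback: no local processor holds the block exclusive.
    return memory[block], "memory"
```

For example, if processor 0 holds block `0x40` exclusive with a modified value, a later request for `0x40` is supplied from that processor's cache rather than from main memory.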
By eliminating directory writes whenever possible, there is a significant latency improvement because of the relatively high bandwidth, low latency nature of processor cache subsystems and the avoidance of directory writes to memory. These and other benefits will become apparent upon reviewing the following disclosure.