Modern computer systems frequently comprise a central processing unit and a memory hierarchy including a relatively large, but relatively slow main memory module and a relatively fast, but relatively small cache memory coupled between the central processing unit and the main memory module. The data and instructions being processed at any one time by the central processing unit are temporarily stored in the cache memory to take advantage of the high speed of operation of the cache memory to thereby increase the overall speed of operation of the computer system. The use of a cache memory is based upon the principles of temporal locality and spatial locality. More specifically, when a central processing unit is accessing data and instructions from a particular space within physical memory, it will most probably access the data and instructions from that space again (temporal locality) and, in addition, access data and instructions from contiguous space (spatial locality), for a certain period of time. Accordingly, data blocks, including the contiguous space of physical memory where data being utilized by the central processing unit resides, are placed in the cache memory to greatly decrease the time required to fetch data and instructions from those frequently referenced data blocks.
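By way of illustration only, the effect of placing whole data blocks in the cache can be sketched as follows. The function name `count_hits`, the block size of four addresses, and the unbounded cache capacity are assumptions made for this sketch and do not appear in the description above.

```python
def count_hits(addresses, block_size=4):
    """Count cache hits when every miss loads the whole containing block.

    Assumption for this sketch: the cache never evicts anything.
    """
    cached_blocks = set()
    hits = 0
    for addr in addresses:
        block = addr // block_size  # index of the block holding this address
        if block in cached_blocks:
            hits += 1               # data already in the fast cache
        else:
            cached_blocks.add(block)  # miss: fetch the entire block
    return hits


# Spatial locality: 16 contiguous addresses touch only 4 blocks.
sequential_hits = count_hits(range(16))
# Temporal locality: the same address is accessed repeatedly.
repeated_hits = count_hits([8, 8, 8, 8])
```

In this sketch only the first access to each block is a miss, so both contiguous accesses and repeated accesses are served mostly from the cache.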
A cache memory scheme may be either a write-through cache or a write-back cache. In a write-through cache, a central processing unit writes through to main memory whenever it writes to an address in cache memory. In a write-back cache, the central processing unit does not update the main memory at the time of writing to its cache memory but updates the memory at a later time. For example, when the central processing unit is changing the contents of its cache memory, it will send the latest copy of written-to data to the main memory before it refills the space within the cache occupied by the written-to data. In this manner, the speed of operation of the central processing unit is not slowed down by the time that would be required to update the main memory after each write operation. Instead, the main memory is updated at the completion of the operations relating to the data block contained in the cache memory.
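The difference between the two schemes may be sketched as follows. This is a minimal model under stated assumptions, not an implementation of any particular system: the class names `MainMemory`, `WriteThroughCache` and `WriteBackCache`, the `evict` method, and the per-address "dirty" set are all hypothetical and introduced only for illustration.

```python
class MainMemory:
    """Backing store that counts how many writes reach it."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, addr, value):
        self.data[addr] = value
        self.writes += 1

    def read(self, addr):
        return self.data.get(addr, 0)


class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory.write(addr, value)  # main memory is updated on every write


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses written to but not yet sent to memory

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)  # main memory is NOT updated here

    def evict(self, addr):
        # Before the space is refilled, the latest copy goes to main memory.
        if addr in self.dirty:
            self.memory.write(addr, self.lines[addr])
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def demo():
    wt_mem, wb_mem = MainMemory(), MainMemory()
    wt, wb = WriteThroughCache(wt_mem), WriteBackCache(wb_mem)
    for value in (1, 2, 3):
        wt.write(0x10, value)
        wb.write(0x10, value)
    stale = wb_mem.read(0x10)  # write-back memory has not seen any write yet
    wb.evict(0x10)
    return wt_mem.writes, wb_mem.writes, stale, wb_mem.read(0x10)
```

Under these assumptions, three writes to the same address cost three main memory updates in the write-through case and a single update in the write-back case, at the price of main memory holding a stale value until the eviction.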
Many computer systems operate on the basis of the concept of a single, simple copy of data. In a multi-processor system including several central processing units, each with its own write-back cache, incoherencies within the data arise when one of the central processing units writes to a data block in its cache memory. In other words, when a particular central processing unit writes to its cache, the main memory will not have a correct copy of the data until the central processing unit updates the main memory.
If a particular central processing unit requests a data block currently in the cache of another central processing unit of the multi-processor system and that data block has been written to by such other central processing unit on a write-back basis, as described above, a coherency scheme must be utilized to ensure that the latest correct copy of the data is sent to the requesting central processing unit. Typically, heretofore known multi-processor systems have implemented a so-called "snoopy" protocol in a shared bus configuration for the several central processing units of the system to assure that the latest copy of a data block is sent to a requesting central processing unit.
Pursuant to the snoopy protocol, all of the central processing units of the multi-processor system are coupled to the main memory through a single, shared bus. Each of the caches of the several central processing units and any other devices coupled to the shared bus "snoop" on (i.e. watch or monitor) all transactions with main memory by all of the other caches. Thus, each of the caches is aware of all data blocks transferred from main memory to the several other caches throughout the multi-processor system. Inasmuch as the caches are coupled to the main memory by a single, shared bus, it is necessary to implement an arbitration mechanism to grant access to the shared bus to one of possibly several devices requesting access at any particular time. The arbitration mechanism will effectively serialize transactions with the main memory and the snoopy protocol utilizes the serialization to impose a rule that only one cache at a time has permission to modify a data block.
After modification of the data block in the one cache, the main memory does not contain a valid copy of the data until it is updated by the cache having the written-to block, as described above. In accordance with the snoopy protocol, the copy of the written-to data block in the one cache is substituted for the main memory copy whenever another cache requests that data block prior to the update of the main memory.
An ownership model of the snoopy protocol includes the concept of "ownership" of a data block. A device must first request and obtain ownership of a data block in its cache before it can write to that data block. At most one device can own a data block at any one time and the owner always has the valid copy of that data block. Moreover, the owner must update the main memory before it relinquishes ownership of the block to assure coherency of the data throughout the multi-processor system.
By definition, ownership means the right to modify a data block. When no device of the system owns a data block, it is said to be owned by the main memory, and copies of the data block may be "shared" by any of the devices of the system. A shared mode means the device has read-only access to a shared copy of a data block residing in its cache. Since the main memory owns the data block in the shared mode, all shared copies exactly match the main memory and are, hence, correct copies. Once any one device other than main memory obtains ownership of a data block, no other device may share the block, and all copies of the data block which are shared are invalidated at the time ownership is obtained by the one device.
It is implicit above that a single request is made and acted upon at a time on the shared bus. Hence all bus requests are ordered and all caches and the memory see them in the same order. It is also implicit above that the memory does not respond to a request if some cache owns the data; instead the owning cache supplies the data. This is typically done by having each cache search itself for the data with each bus request. If a cache finds that it owns the data, it suppresses memory (which could otherwise respond), typically with a signal wire for this purpose, and applies the data to the bus itself.
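The ownership model on a shared bus may be sketched as follows. The names `SnoopyCache` and `SharedBus`, the two-state ("shared"/"owned") line encoding, and the choice that an owner writes the block back and keeps a shared copy when another cache reads it are assumptions made for this sketch only. Each method call stands for one serialized bus transaction.

```python
class SnoopyCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # addr -> [state, value]; state is "shared" or "owned"


class SharedBus:
    """Single shared bus: each method call is one serialized transaction."""

    def __init__(self, caches):
        self.memory = {}
        self.caches = caches
        self.log = []

    def _fetch(self, requester, addr):
        # Every other cache searches itself for the requested block.
        for cache in self.caches:
            if cache is requester:
                continue
            line = cache.lines.get(addr)
            if line and line[0] == "owned":
                # The owner suppresses memory, updates it, and supplies the data.
                self.memory[addr] = line[1]
                line[0] = "shared"  # assumption: owner keeps a shared copy
                self.log.append(f"{cache.name} supplies {addr:#x}")
                return line[1]
        self.log.append(f"memory supplies {addr:#x}")
        return self.memory.get(addr, 0)

    def read(self, requester, addr):
        line = requester.lines.get(addr)
        if line:
            return line[1]  # hit on a shared or owned copy
        value = self._fetch(requester, addr)
        requester.lines[addr] = ["shared", value]
        return value

    def write(self, requester, addr, value):
        line = requester.lines.get(addr)
        if line is None or line[0] != "owned":
            if line is None:
                # Forces any current owner to update memory and relinquish.
                self._fetch(requester, addr)
            for cache in self.caches:
                if cache is not requester:
                    cache.lines.pop(addr, None)  # invalidate other copies
            requester.lines[addr] = ["owned", value]
        else:
            line[1] = value  # already the owner: write locally


def demo():
    a, b = SnoopyCache("A"), SnoopyCache("B")
    bus = SharedBus([a, b])
    bus.memory[0x40] = 7
    bus.read(a, 0x40)
    bus.read(b, 0x40)
    bus.write(a, 0x40, 99)     # A obtains ownership; B's copy is invalidated
    stale = bus.memory[0x40]   # memory is not yet updated
    value = bus.read(b, 0x40)  # owner A supplies the data instead of memory
    return stale, value, bus.memory[0x40], bus.log[-1]
```

In this sketch the requesting cache receives the latest value from the owning cache even though main memory is stale at the time of the request.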
In a variation of the ownership model, the main memory includes a directory of all main memory lines to ensure that data coherency is maintained throughout the multi-processor system. The directory contains an entry for each data block in the main memory and each entry comprises a bit mask of k+1 bits. The number k equals the number of caches in the system with each one of the k bits of the mask corresponding to one of the caches. The extra bit of the k+1 bits provides the ownership status of the corresponding data block. Thus, if the (k+1) bit is on, then one and only one of the remaining bits can also be on since only one cache at a time can have write privileges to a data item.
The main memory utilizes the directory to provide a centralized cache coherency system. The main memory queries the directory for each read or write request that it receives from the various central processing units in the multi-processor system to determine the current state of the requested data item, i.e. owned or shared, depending upon the state of the (k+1) bit, and the location of copies of the data item, as indicated by the remaining bits of the bit mask.
The information obtained from the directory enables the main memory to enforce the coherency scheme. For example, if a data block is not owned and a read-only copy is present in several of the caches, the (k+1) bit will be off and the bits corresponding to the caches that have a copy of the data block will each be on. If another central processing unit, which does not have a copy of the data block, wants to write to the data block, it sends a request to the main memory for the data block with write privileges. The main memory will query the directory to determine that the data block is not presently owned and that copies reside in several of the caches. The main memory will send an invalidate signal to all of the caches where a copy of the data block resides, as indicated by the bits of the bit mask, clear those bits, and then set the (k+1) bit to indicate the owned state.
The main memory further sets the bit corresponding to the requesting cache. The main memory can perform similar operations upon each read or write request to determine the state and location of any data block and to reset the bit mask and send control signals as required to maintain data coherency.
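The directory bookkeeping described above may be sketched as follows. The class name `Directory`, its method names, the use of an integer as the bit mask with bit k as the ownership bit, and the handling of a read request for an owned block (the ownership bit is cleared and the former owner remains a sharer) are assumptions made for this sketch; the description above specifies only the write-request case in detail.

```python
class Directory:
    """One (k+1)-bit mask per data block; bit k is the ownership bit."""

    def __init__(self, k):
        self.k = k          # number of caches in the system
        self.entries = {}   # block -> integer bit mask

    def _mask(self, block):
        return self.entries.get(block, 0)

    def holders(self, block):
        """Cache indices whose bit is on for this block."""
        mask = self._mask(block)
        return [i for i in range(self.k) if (mask >> i) & 1]

    def is_owned(self, block):
        return bool((self._mask(block) >> self.k) & 1)

    def request_write(self, block, cache_id):
        """Return the caches to invalidate, then record cache_id as owner."""
        invalidate = [i for i in self.holders(block) if i != cache_id]
        # Ownership bit on, and exactly one cache bit on.
        self.entries[block] = (1 << self.k) | (1 << cache_id)
        return invalidate

    def request_read(self, block, cache_id):
        """Return the current owner to recall data from, or None."""
        mask = self._mask(block)
        owner = None
        if self.is_owned(block):
            owner = self.holders(block)[0]
            if owner == cache_id:
                return None          # requester already owns the block
            mask &= ~(1 << self.k)   # assumption: block reverts to shared
        self.entries[block] = mask | (1 << cache_id)
        return owner


directory = Directory(k=4)
directory.request_read(0x80, 0)                # cache 0 takes a shared copy
directory.request_read(0x80, 2)                # cache 2 takes a shared copy
to_invalidate = directory.request_write(0x80, 3)
mask_after_write = format(directory.entries[0x80], "05b")
```

Here the mask is printed with the ownership bit leftmost, so a mask with the ownership bit and the bit for cache 3 on reads as five binary digits with the two leftmost digits set.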
The above-described coherency protocols provide a highly effective scheme for maintaining data coherency throughout a multi-processor system including devices having write-back caches. However, a major drawback of the scheme is that only one device at a time may access main memory due to the single shared bus coupling all of the devices to the main memory and the necessity of serializing all transactions with the main memory. Thus, the maximum speed of operation theoretically possible in a system having parallel central processing units is diminished in practical applications since only one central processing unit at a time can complete a transaction with the main memory through the shared bus. Moreover, each device coupled to the shared bus must devote certain resources to an active and continuous monitoring of the shared bus in accordance with the snoopy protocol.