1. Field of the Invention
The present invention relates to microprocessor cache subsystems in computer systems, and more specifically to methods for guaranteeing snoop access in a multiprocessor write-back cache environment with minimal effect on system speed.
2. Description of the Prior Art
The personal computer industry is a vibrant and growing field that continues to evolve as new innovations occur. The driving force behind this innovation has been the increasing demand for faster and more powerful computers. Historically, computer systems have developed as uniprocessor, sequential machines which can execute one instruction at a time. However, performance limits are being reached in single processor computer systems, and therefore a major area of research in computer system architecture is multiprocessing. Multiprocessing involves a computer system which includes multiple processors that work in parallel on different problems or different parts of the same problem. The incorporation of multiple processors in a computer system introduces many design problems that are not encountered in single processor architectures. One difficulty that has been encountered in multiprocessing architectures is the maintenance of cache coherency when each processor includes its own local cache. Therefore, one area of research in multiprocessor architectures has been methods and techniques to maintain cache coherency between multiple caches in a multiprocessor architecture.
Cache memory was developed in order to bridge the gap between fast processor cycle times and slow memory access times. A cache is a small amount of very fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. A microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the memory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place, and the memory request is forwarded to the system and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
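The read hit and read miss behavior described above can be illustrated with a minimal sketch. It assumes a direct-mapped cache holding one word per line; the class name `DirectMappedCache`, its fields, and the modulo-based indexing are hypothetical choices made for illustration and are not taken from any particular cache design.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one line per set, one word per line (assumed)."""

    def __init__(self, num_sets, main_memory):
        self.num_sets = num_sets
        self.memory = main_memory        # indexable by address
        self.tags = [None] * num_sets    # None marks a set that holds no data
        self.data = [None] * num_sets
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_sets  # which set the address maps to
        tag = address // self.num_sets   # remaining upper part of the address
        if self.tags[index] == tag:
            # Cache read hit: data is returned without going to main memory.
            self.hits += 1
            return self.data[index]
        # Cache read miss: fetch from main memory and keep a copy in the
        # cache, since the data is likely to be requested again.
        self.misses += 1
        value = self.memory[address]
        self.tags[index] = tag
        self.data[index] = value
        return value
```

In this sketch, two addresses that map to the same set evict one another, so a repeated read of the first address after a read of the second is again a miss.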
An efficient cache yields a high "hit rate", which is the percentage of all memory accesses that result in cache hits. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on a relatively infrequent miss are averaged over a large number of zero wait state cache hit accesses, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of its respective microprocessor, cache hits are serviced locally without requiring use of the memory bus, also referred to as the host bus. In the description that follows, the host bus is the bus shared by the microprocessors and the random access memory in the computer system. Each of the various processors can operate out of its local cache when it does not have control of the host bus, thereby increasing the efficiency of the computer system. In systems without microprocessor caches, each of the processors generally must remain idle while it does not have control of the host bus. This reduces the overall efficiency of the computer system because the processors cannot do any useful work at this time. However, if each of the processors includes a cache placed on its local bus, each processor can retrieve the necessary code and data from its cache to perform useful work while other processors or devices have control of the host bus, thereby increasing system efficiency. Thus, processors operating out of their local cache in a multiprocessing environment have a much lower "bus utilization." This reduces system bus bandwidth used by each of the processors, making more bandwidth available for other processors and bus masters.
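The averaging effect described above can be made concrete with a short calculation. The sketch assumes that every hit costs zero wait states and that every miss costs the same fixed number of wait states; the function name `average_wait_states` and the figures used in the usage note are hypothetical.

```python
def average_wait_states(hit_rate, miss_penalty):
    """Average wait states per access, assuming hits cost zero wait states
    and every miss costs `miss_penalty` wait states (a simplification)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * miss_penalty
```

For example, under these assumptions a 95 percent hit rate with a miss penalty of four wait states averages roughly 0.2 wait states per access.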
Cache management is generally performed by a device referred to as a cache controller. A principal cache management responsibility in multiprocessor systems is the preservation of cache coherency. The type of cache management policy used to maintain cache coherency in a multiprocessing system generally depends on the architecture used. One type of architecture commonly used in multiprocessing systems is referred to as a bus-based scheme. In a bus-based scheme, system communication takes place through a shared bus, and this allows each cache to monitor other caches' requests by watching or snooping the bus. Each processor has a cache which monitors activity on the bus and in its own processor and decides which blocks of data to keep and which to discard in order to reduce bus traffic. Requests by a processor to modify a memory location that is stored in more than one cache require bus communication in order for each copy of the corresponding line to be marked invalid or updated to reflect the new value.
Various types of cache coherency protocols can be employed to maintain cache coherency in a multiprocessor system. One type of cache coherency protocol that is commonly used is referred to as a write-through scheme. In a write-through scheme, all cache writes or updates are simultaneously written into the cache and to main memory. Other caches on the bus must monitor bus transactions and invalidate any matching entries when the memory block is written through to main memory. Another type of cache coherency protocol is referred to as a write-back scheme. In a write-back scheme, a cache location is updated with the new data on a processor write hit, and main memory is generally only updated when the updated data block must be exchanged with a new data block.
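The difference between the two schemes can be seen by counting main memory writes. The following is a minimal sketch that assumes every write is a write hit and ignores other caches entirely; the class names, the `dirty` set, and the `evict` method are hypothetical names introduced for illustration.

```python
class WriteThroughCache:
    """Every write goes to the cache and to main memory at the same time."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value     # written through immediately
        self.memory_writes += 1


class WriteBackCache:
    """Writes update only the cache; memory is updated on replacement."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()               # addresses newer than main memory
        self.memory_writes = 0

    def write(self, address, value):
        # Assumes the block is already cached (a write hit).
        self.lines[address] = value
        self.dirty.add(address)

    def evict(self, address):
        # Main memory is updated only when the updated block is replaced.
        if address in self.dirty:
            self.memory[address] = self.lines[address]
            self.memory_writes += 1
            self.dirty.discard(address)
        self.lines.pop(address, None)
```

Three writes to the same address cost three memory writes in the write-through sketch, but only one in the write-back sketch, performed when the block is evicted.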
Multiprocessor cache systems which employ a write-back scheme generally utilize some type of ownership protocol to maintain cache coherency. In this scheme, any copy of data in a cache must be identical to (or actually be) the owner of that location's data. The owner of a location's data is generally defined as the respective location having the most recent version of the data residing in the respective memory location. Ownership is generally acquired through special read and write operations defined in an ownership protocol. One example of an ownership protocol is referred to as the Berkeley ownership protocol.
The Berkeley protocol is discussed briefly below to aid in understanding the various snooping and broadcasting requirements in a multiprocessor write-back cache protocol. The Berkeley protocol was designed for shared bus multiprocessor systems to minimize the bus utilization required to maintain cache coherency without additional memory system or bus design. All that is generally required to support the Berkeley protocol are extra signals to support special communication among the caches.
The cache controller includes a directory that holds an associated entry for each data entry or set in the cache. In multiprocessor architectures, this entry generally includes two components: a tag and a number of tag state bits for each of the respective lines in each cache set. The tag acts as a main memory page number, and it holds the upper address bits of the particular page in main memory from which the copy of data residing in the respective set of the cache originated. The tag state bits determine the status of the data in the respective set of the cache. In the Berkeley protocol, the possible states of a cache entry are: invalid, unowned, exclusively owned, or shared owned.
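A directory entry of this kind can be sketched as follows. The sketch assumes one line per set and hypothetical field widths passed as `offset_bits` and `index_bits`; the names `LineState`, `DirectoryEntry`, `split_address`, and `is_hit` are illustrative only. The four states listed above fit in two tag state bits.

```python
from dataclasses import dataclass
from enum import Enum


class LineState(Enum):
    """The four Berkeley protocol states; two state bits are enough."""
    INVALID = 0
    UNOWNED = 1
    EXCLUSIVELY_OWNED = 2
    SHARED_OWNED = 3


@dataclass
class DirectoryEntry:
    tag: int = 0                         # upper address bits of the cached block
    state: LineState = LineState.INVALID


def split_address(address, offset_bits, index_bits):
    """Split an address into (tag, set index); field widths are assumed."""
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index


def is_hit(directory, address, offset_bits, index_bits):
    """A hit requires a matching tag and a state other than invalid."""
    tag, index = split_address(address, offset_bits, index_bits)
    entry = directory[index]
    return entry.state is not LineState.INVALID and entry.tag == tag
```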
In the Berkeley protocol, copies of a memory block or line can reside in more than one cache, but only one cache can own a line, and the owner is the only cache allowed to update it. Owning a block also obligates the owner to provide the data to other requesting caches and to update main memory when the line is replaced in the cache. If the state of a block is exclusively owned, the owning cache holds the only cached copy of the block, which is updated locally without informing the other caches. If the state of a block is shared owned, other caches may have copies and must be informed about updates to the block. If the state of a block is unowned, several caches may have copies of the block, which cannot be written locally without acquiring ownership first. The invalid state indicates that the cache entry does not contain useful data.
The bus operations used in the Berkeley protocol are as follows:
1. Read-shared. This is a conventional read that gives a cache an unowned copy of a block from a cache owner or from main memory.
2. Write. This is a conventional write that causes main memory to be updated and all cached copies to be invalidated. It can be issued only by I/O devices and other bus users without caches.
3. Read for ownership. This is like a conventional read except that the cache doing the read becomes the exclusive owner while matching entries in other caches are invalidated.
4. Write for invalidation. This operation updates a block in a cache, invalidates other cached copies, but does not update main memory. This is done later when the owned updated block is replaced from its cache.
5. Write without invalidation. This operation is used for flushing owned blocks to memory so that main memory is updated, but any other cached copies are kept valid.
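The five bus operations listed above can be summarized by two properties stated in their descriptions: whether main memory is updated, and whether other cached copies are invalidated. The table below is a sketch only; the names `BusOp`, `Effect`, and `EFFECTS` are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class BusOp(Enum):
    READ_SHARED = 1
    WRITE = 2
    READ_FOR_OWNERSHIP = 3
    WRITE_FOR_INVALIDATION = 4
    WRITE_WITHOUT_INVALIDATION = 5


class Effect(NamedTuple):
    updates_main_memory: bool
    invalidates_other_copies: bool


# Effects as described for each operation in the list above.
EFFECTS = {
    BusOp.READ_SHARED: Effect(False, False),
    BusOp.WRITE: Effect(True, True),
    BusOp.READ_FOR_OWNERSHIP: Effect(False, True),
    BusOp.WRITE_FOR_INVALIDATION: Effect(False, True),
    BusOp.WRITE_WITHOUT_INVALIDATION: Effect(True, False),
}
```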
The Berkeley protocol reduces bus traffic when modified data is shared by having the cache that owns the block provide the data on external read requests and by postponing the memory update until the block is actually replaced. The Berkeley protocol is implemented by the cache controller. The cache controller is primarily responsible for its own processor's use of the cache, but it also assists in maintaining cache coherency by updating the state of a cache block whenever it obtains or relinquishes ownership. The cache controller also includes a snooping mechanism which is responsible for monitoring the host bus and responding to the requests of other processors.
The action of the cache controller depends on the type of data access request from its processor, whether the data is in the cache, and in the case of a cache hit, on the state of the cache entry. In processor reads, there may be a cache hit or a cache miss. If there is a hit, the required data is provided to the processor. If there is a miss, the controller selects a cache entry to be replaced, flushing its data back to memory with a write-without-invalidation cycle if the replaced entry is owned. It then issues a read-shared cycle for the desired block and declares its state to be unowned.
In processor writes, when there is a cache hit and if the entry is exclusively owned, the processor writes to it without broadcasting the write on the bus. If the entry is shared owned or unowned, the cache controller sends a write-for-invalidation signal to the snooping mechanisms of other cache controllers before it modifies the block so that the other caches can invalidate their matching entries. When a cache write miss occurs and a block must be chosen for replacement, if the chosen block is owned, it is written to memory using a write-without-invalidation cycle. The requested block is then read with a read-for-ownership operation and updated, and its state becomes exclusively owned.
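The processor-side actions for reads and writes described above can be sketched as a single decision function. The function name `processor_access`, the string constants for bus cycles, and the convention that an invalid state means a miss are hypothetical. One point is an assumption rather than something stated above: after a write-for-invalidation on a write hit, the sketch marks the block exclusively owned, on the reasoning that all other copies have been invalidated.

```python
from enum import Enum


class LineState(Enum):
    INVALID = 0
    UNOWNED = 1
    EXCLUSIVELY_OWNED = 2
    SHARED_OWNED = 3


OWNED = (LineState.EXCLUSIVELY_OWNED, LineState.SHARED_OWNED)


def processor_access(is_write, block_state, victim_state=LineState.INVALID):
    """Return (bus_cycles, new_state) for a local processor access.

    block_state is the requested block's state in this cache (INVALID
    means a miss); victim_state is the state of the entry chosen for
    replacement on a miss.
    """
    cycles = []
    if block_state is not LineState.INVALID:        # cache hit
        if not is_write:
            return cycles, block_state              # read hit: no bus cycle
        if block_state is not LineState.EXCLUSIVELY_OWNED:
            cycles.append("write-for-invalidation")
        # Assumption: the writer ends up as the exclusive owner.
        return cycles, LineState.EXCLUSIVELY_OWNED
    # Cache miss: flush the replaced entry first if it is owned.
    if victim_state in OWNED:
        cycles.append("write-without-invalidation")
    if is_write:
        cycles.append("read-for-ownership")
        return cycles, LineState.EXCLUSIVELY_OWNED
    cycles.append("read-shared")
    return cycles, LineState.UNOWNED
```

For instance, a read miss that replaces a shared owned entry produces a write-without-invalidation cycle followed by a read-shared cycle, and leaves the new block unowned.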
The snooping mechanism in a cache controller monitors the bus for accesses (reads and writes) from other processors. In external read requests, it accesses its cache memory to supply owned blocks and, in writes, it invalidates blocks in its cache memory written by another processor. If the read request by the other processor is a read-shared operation, it changes the entry's state to shared owned, and if the read request is a read-for-ownership operation, the state is changed to invalid.
The actions of the snooping mechanism depend on the type of system bus request, whether the request results in a hit or miss in its cache, and the state of the entry in its cache. If the bus request detected by the snooping mechanism is a read, it first determines whether the block is in its own cache. If not, no action is necessary. If there is a hit, the sequence of actions depends on the type of read (read-shared or read-for-ownership) and the state of the block hit (exclusively owned, shared owned, or unowned). A hit on a block marked invalid is treated as a miss.
If the block is owned, the snooping mechanism must inhibit memory from responding to the bus request and instead provide the data to the requesting processor or device. For a block that is exclusively owned, the snooping mechanism must first obtain sole use of the cache memory before responding to prevent the local processor from attempting to simultaneously update the entry. If the bus request is a read-for-ownership, then the snooping mechanism must invalidate its copy. If the bus cycle is a read-shared cycle, the snooping mechanism changes the block's state to shared owned if it was previously exclusively owned. If the bus request detected by the snooping mechanism is a write-for-invalidation cycle and if there is a cache hit, the snooping mechanism must invalidate the copy in its cache. If the write is a write-without-invalidation cycle, then another processor is flushing its cache, and no action is required. In addition to snooping processor cycles, the snooping mechanism must also, for cache coherency reasons, monitor the host bus when an I/O device situated on a separate I/O bus is performing cycles on the host bus.
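The snooping actions described above can be sketched as a function from the observed bus cycle and the local entry state to a new state, plus a flag indicating whether the snooping cache must inhibit memory and supply the data itself. The names `snoop`, `BusOp`, and `LineState` are hypothetical, and the sketch models only state changes, not the arbitration for sole use of the cache memory.

```python
from enum import Enum


class LineState(Enum):
    INVALID = 0
    UNOWNED = 1
    EXCLUSIVELY_OWNED = 2
    SHARED_OWNED = 3


class BusOp(Enum):
    READ_SHARED = 1
    WRITE = 2
    READ_FOR_OWNERSHIP = 3
    WRITE_FOR_INVALIDATION = 4
    WRITE_WITHOUT_INVALIDATION = 5


def snoop(bus_op, state):
    """Return (new_state, supplies_data) for a snooped bus cycle.

    `state` is the local state of the block addressed on the bus;
    INVALID covers both a true miss and a hit on an invalid entry.
    """
    if state is LineState.INVALID:
        return state, False                       # miss: no action
    owned = state in (LineState.EXCLUSIVELY_OWNED, LineState.SHARED_OWNED)
    if bus_op is BusOp.READ_SHARED:
        # The owner supplies the data and becomes (or stays) shared owned.
        return (LineState.SHARED_OWNED if owned else state), owned
    if bus_op is BusOp.READ_FOR_OWNERSHIP:
        # The owner supplies the data; every matching copy is invalidated.
        return LineState.INVALID, owned
    if bus_op in (BusOp.WRITE, BusOp.WRITE_FOR_INVALIDATION):
        return LineState.INVALID, False
    # Write without invalidation: another cache is flushing; no action.
    return state, False
```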
As discussed above, in a multiprocessor architecture a cache controller generally must be able to service its local processor while also snooping the host bus. This requires that a cache system in a multiprocessor system have some type of dual ported scheme. In one dual ported scheme, the cache system can service accesses from both the local processor and the host bus, but only one access to the cache system can be made at a time. This scheme is desirable because it prevents the local processor and the snooping mechanism in the cache controller from both updating a cache entry at the same time, thereby preventing cache coherency problems from occurring.
The fact that only one access to the cache system can be made at a time may result in problems if the cache system is servicing its local processor and a snoop access is requested. Problems may also arise in a write-back cache environment using this scheme if both a local processor access and a host bus snoop request occur at the same time. For cache coherency reasons, it is important that all snoop accesses on the host bus be serviced immediately by the cache system so that no snoop accesses are lost. However, it is possible that several consecutive zero-wait-state writes could occur on the host bus while the cache is servicing its local processor, possibly causing the cache to miss one of the writes. The cache system may not gain access to the host bus in time to snoop the access. If the local processor allows for processor cycle aborts, then the cache generally can simply abort the processor cycle and immediately service the snoop request. However, many popular microprocessors do not allow for processor cycle aborts, one such notable processor being the Intel Corporation 80386 microprocessor. If the local processor does not allow for processor cycle aborts, then it is possible that the cache system would not gain access to the host bus in time to snoop the bus cycle. Therefore, a method and apparatus are needed to allow a cache to have some control over the cycles of another bus master or cache system running on the host bus in order to guarantee that the cache system has access to all host bus cycles for snooping purposes.