1. Field of the Invention
The present invention relates to microprocessor cache subsystems in computer systems, and more specifically to an arbitration scheme for a dual-ported cache used in a multiprocessor, multiple cache environment.
2. Description of the Prior Art
The personal computer industry is a vibrant and growing field that continues to evolve as new innovations occur. The driving force behind this innovation has been the increasing demand for faster and more powerful computers. Historically, computer systems have developed as uniprocessor, sequential machines which can execute one instruction at a time. However, performance limits are being reached in single processor computer systems, and therefore a major area of research in computer system architecture is multiprocessing. Multiprocessing involves a computer system which includes multiple processors that work in parallel on different problems or different parts of the same problem. The incorporation of multiple processors in a computer system introduces many design problems that are not encountered in single processor architectures. One difficulty that has been encountered in multiprocessing architectures is the maintenance of cache coherency when each processor includes its own local cache. Therefore, one area of research in multiprocessor architectures has been methods and techniques to maintain cache coherency between multiple caches in a multiprocessor architecture.
Cache memory was developed in order to bridge the gap between fast processor cycle times and slow memory access times. A cache is a small amount of very fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. A microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the memory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place, and the memory request is forwarded to the system and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
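The read hit and read miss behavior described above can be sketched as follows. This is a minimal illustration only: the class name `SimpleCache`, the dictionary-based memory, and the example addresses are assumptions made for the sketch and do not describe any particular cache hardware.

```python
class SimpleCache:
    """Toy read cache in front of a slower main memory (both plain dicts)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> data (hypothetical model)
        self.lines = {}                 # cached copies: address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache read hit: the data is returned without a main memory access.
            self.hits += 1
            return self.lines[address]
        # Cache read miss: fetch from main memory and keep a copy, because
        # the same data is statistically likely to be requested again.
        self.misses += 1
        data = self.main_memory[address]
        self.lines[address] = data
        return data


memory = {0x100: "code", 0x104: "data"}  # hypothetical contents
cache = SimpleCache(memory)
first = cache.read(0x100)   # miss, filled from main memory
second = cache.read(0x100)  # hit, served from the cache
```

In this sketch the second read of the same address is serviced from the cache, which corresponds to the zero wait state case described above.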
An efficient cache yields a high "hit rate", which is the percentage of cache hits that occur during all memory accesses. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on a relatively infrequent miss are averaged over a large number of zero wait state cache hit accesses, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of its respective microprocessor, cache hits are serviced locally without requiring use of the host bus. In this manner, each of the various processors can operate out of its local cache when it does not have control of the host bus, thereby increasing the efficiency of the computer system. In systems without microprocessor caches, each of the processors generally must remain idle while it does not have control of the host bus. This reduces the overall efficiency of the computer system because the processors cannot do any useful work at this time. However, if each of the processors includes a cache placed on its local bus, it can retrieve the necessary code and data from its cache to perform useful work while other processors or devices have control of the host bus, thereby increasing system efficiency. Therefore, processors operating out of their local cache in a multiprocessor environment have a much lower "bus utilization." This reduces the system bus bandwidth used by each of the processors, making more bandwidth available for other processors and bus masters.
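The averaging effect of a high hit rate can be illustrated with a short calculation. The sketch below assumes that a hit costs zero wait states and that every miss costs the same fixed number of wait states; the function name and the example figures (a 95% hit rate and a 4 wait state miss penalty) are hypothetical and are not taken from any measured system.

```python
def average_wait_states(hit_rate, miss_penalty):
    """Average wait states per access, assuming hits cost zero wait states."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Only the fraction of accesses that miss incurs the penalty.
    return (1.0 - hit_rate) * miss_penalty


# Hypothetical figures chosen only to show the arithmetic.
avg = average_wait_states(hit_rate=0.95, miss_penalty=4)
```

Under these assumed figures the average is roughly 0.2 wait states per access, which is the sense in which the relatively infrequent miss is averaged over many zero wait state hits.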
Cache management is generally performed by a device referred to as a cache controller. A principal cache management responsibility in multiprocessor systems is the preservation of cache coherency. The type of cache management policy used to maintain cache coherency in a multiprocessing system generally depends on the architecture used. One type of architecture commonly used in multiprocessing systems is referred to as a bus-based scheme. In a bus-based scheme, system communication takes place through a shared bus, and this allows each cache to monitor other caches' requests by watching or snooping the bus. Each processor has a cache which monitors activity on the bus and in its own processor and decides which blocks of data to keep and which to discard in order to reduce bus traffic. A request by a processor to modify a memory location that is stored in more than one cache requires bus communication in order for each copy of the corresponding line to be marked invalid or updated to reflect the new value.
Various types of cache coherency protocols can be employed to maintain cache coherency in a multiprocessor system. One type of cache coherency protocol that is commonly used is referred to as a write-through scheme. In a write-through scheme, all cache writes or updates are simultaneously written into the cache and to main memory. Other caches on the bus must monitor bus transactions and invalidate any matching entries when the memory block is written through to main memory. In another protocol known as a write-back scheme, a cache location is updated with the new data on a processor write hit, and main memory is generally only updated when the updated data block must be exchanged with a new data block.
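The difference between the two write policies can be sketched as follows. This is a simplified illustration under stated assumptions: the class names are hypothetical, memory is modeled as a dictionary, every write is treated as a write hit, and the snooping performed by other caches is omitted.

```python
class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory  # address -> data (hypothetical model)
        self.lines = {}

    def write(self, address, data):
        # The cache and main memory are updated together on every write.
        self.lines[address] = data
        self.memory[address] = data


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses whose cached copy is newer than memory

    def write(self, address, data):
        # Only the cache is updated; main memory now holds stale data.
        self.lines[address] = data
        self.dirty.add(address)

    def evict(self, address):
        # Main memory is updated only when the block is replaced.
        data = self.lines.pop(address)
        if address in self.dirty:
            self.memory[address] = data
            self.dirty.discard(address)


wt_memory = {0x10: 0}
wt = WriteThroughCache(wt_memory)
wt.write(0x10, 7)        # main memory sees the new value at once

wb_memory = {0x10: 0}
wb = WriteBackCache(wb_memory)
wb.write(0x10, 7)        # main memory still holds the old value
stale = wb_memory[0x10]
wb.evict(0x10)           # replacement writes the block back
```

The interval during which `wb_memory` holds a stale value is the reason write-back systems need some means of tracking which cache holds the most recent copy.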
Multiprocessor cache systems which employ a write-back scheme generally utilize some type of ownership protocol to maintain cache coherency. In this scheme, any copy of data in a cache must be identical to (or actually be) the owner of that location's data. The owner of a location's data is generally defined as the memory unit having the most recent version of the data residing in a respective memory location. Ownership is generally acquired through special read and write operations defined in an ownership protocol.
The cache controller includes a directory that holds an associated entry for each data entry or set in the cache. In multiprocessor architectures, this entry generally includes two components: a tag and a number of tag state bits for each of the respective lines in each cache set. The tag acts as a main memory page number, and it holds the upper address bits of the particular page in main memory from which the copy of data residing in the respective set of the cache originated. The tag state bits determine the status of the data in the respective set of the cache, i.e., whether the data is invalid, owned, shared with another cache, etc.
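A directory entry of this kind can be sketched as follows. The sketch assumes a direct-mapped organization with one line per set, 16-byte lines, and 256 sets; these sizes, the three-state encoding, and all names are hypothetical choices made for illustration and are not specified in the text above.

```python
from enum import Enum

LINE_BITS = 4    # hypothetical: 16-byte lines
INDEX_BITS = 8   # hypothetical: 256 sets


class LineState(Enum):
    INVALID = 0
    SHARED = 1
    OWNED = 2


def split_address(address):
    """Split an address into (tag, set index, byte offset)."""
    offset = address & ((1 << LINE_BITS) - 1)
    index = (address >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (LINE_BITS + INDEX_BITS)  # upper address bits
    return tag, index, offset


class TagDirectory:
    def __init__(self):
        # One (tag, state) entry per set; every entry starts out invalid.
        self.entries = [(0, LineState.INVALID)] * (1 << INDEX_BITS)

    def install(self, address, state):
        tag, index, _ = split_address(address)
        self.entries[index] = (tag, state)

    def lookup(self, address):
        """Return the line state on a tag match, or None on a miss."""
        tag, index, _ = split_address(address)
        stored_tag, state = self.entries[index]
        if state is not LineState.INVALID and stored_tag == tag:
            return state
        return None


directory = TagDirectory()
directory.install(0x12345, LineState.OWNED)
parts = split_address(0x12345)
```

Here the stored tag is compared against the upper address bits of each access, and the state bits qualify whether a matching tag counts as a hit.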
In addition to responding to accesses from its local processor for data, each cache controller in a multiprocessor system generally includes a snooping mechanism which monitors the bus during memory accesses (reads and writes) by other processors and devices. The cache controller compares the addresses on the bus during these cycles with the tag values stored in its tag RAMs to determine if the data being accessed resides in the cache. If the accessed data resides in the cache, then a snoop hit has occurred. If a snoop read hit occurs to an owned location during an external read request, the cache controller inhibits or aborts the current memory cycle and accesses its cache memory to supply the owned data. If a snoop write hit occurs during an external write, the cache controller generally invalidates its cached copy of the data written by the other processor.
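The snoop responses described above can be summarized in a short sketch. The function name, the string-valued states, and the returned action labels are assumptions made for illustration; in particular, the sketch leaves the state of an owned line unchanged after it supplies data, since the resulting state depends on the ownership protocol in use.

```python
def snoop(states, data, address, is_write):
    """Decide how a cache responds to a bus cycle from another device.

    states: address -> "invalid", "shared", or "owned" (hypothetical encoding)
    data:   address -> cached data
    Returns an (action, supplied_data) pair.
    """
    state = states.get(address, "invalid")
    if state == "invalid":
        return ("miss", None)            # no matching valid entry: snoop miss
    if is_write:
        states[address] = "invalid"      # snoop write hit: invalidate the copy
        return ("invalidate", None)
    if state == "owned":
        # Snoop read hit to an owned location: the memory cycle is inhibited
        # and this cache supplies the data.
        return ("supply", data[address])
    return ("no_action", None)           # snoop read hit to a shared line


states = {0x20: "owned", 0x30: "shared"}
data = {0x20: "newest", 0x30: "copy"}
read_owned = snoop(states, data, 0x20, is_write=False)
read_shared = snoop(states, data, 0x30, is_write=False)
write_shared = snoop(states, data, 0x30, is_write=True)
read_absent = snoop(states, data, 0x40, is_write=False)
```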
In common multiprocessor designs, several different buses are present in the computer system. A local processor bus is associated with each processor and its cache. The processors and related caches are then located on a host bus, which preferably includes the main memory in the computer system. The host bus is coupled by a controller and various buffers and latches to an input/output (I/O) bus used for connecting various peripheral devices or bus masters. Examples of I/O buses are the Industry Standard Architecture (ISA) bus, based on the IBM Corp. PC/AT, and the Extended Industry Standard Architecture (EISA) bus.
In general, a local processor access and a host bus snoop access each require one clock cycle to access the tag RAMs to determine if a hit or miss has occurred. This cycle is generally referred to as a tag compare cycle. Many accesses also generally require a cycle for a tag update, referred to as a tag modify cycle, to alter the state bits in the tag.
As previously discussed, in a multiprocessor architecture, a cache controller generally must be able to service its local processor while also snooping the host bus. This requires that a cache system in a multiprocessor system have some type of dual ported scheme. In one dual ported scheme, the cache system can service accesses from both the local processor and the host bus, but only one access to the cache system can be made at a time. This scheme is desirable because it prevents the local processor and the snooping mechanism in the cache controller from both updating a cache entry at the same time, thereby preventing cache coherency problems from occurring.
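The one-access-at-a-time property of such a dual ported scheme can be sketched as follows. This is only an illustration of mutual exclusion between the two requesters; the class name `SharedTagPort` and its methods are hypothetical, and the sketch deliberately contains no policy for deciding which requester should win.

```python
class SharedTagPort:
    """Grants the cache system to at most one requester at a time."""

    def __init__(self):
        self.owner = None  # None, "processor", or "snoop"

    def request(self, requester):
        if self.owner is None:
            self.owner = requester
            return True
        return False  # busy: the requester must wait

    def release(self, requester):
        # Only the current owner can free the port.
        if self.owner == requester:
            self.owner = None


port = SharedTagPort()
processor_granted = port.request("processor")  # port is free, so granted
snoop_granted = port.request("snoop")          # denied while processor owns it
port.release("processor")
snoop_retry = port.request("snoop")            # granted once the port is free
```

Because the two requesters never hold the port together, they cannot both update a cache entry at the same time. The denied snoop request in the middle of the example shows that a requester may have to wait.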
Problems may arise using this type of dual ported scheme if either the local processor or the host bus is locked out of access to the cache system by the other. This problem is especially crucial in multiple processor systems where the cache controller must be guaranteed snoop access to host bus cycles for cache coherency reasons. For example, if the cache is servicing its local processor when a snoop request occurs, the cache may not complete its processor access in time to snoop the host bus. In systems where the local processor allows for processor cycle aborts, this is generally not a problem since the cache controller can abort the current processor cycle and immediately respond to the snoop access. However, many processors, including the Intel Corporation (Intel) 80386 microprocessor, do not allow processor cycle aborts. In some systems it may be possible for a respective cache system to insert wait states in a processor bus or host bus cycle to delay the cycle until the cache has had a chance to snoop the access. However, if the bus cycle was originated by a device on a separate I/O bus which cannot have its cycles delayed, then problems may arise if the snoop access is not serviced immediately.
Therefore, this type of dual ported cache architecture requires some type of arbitration scheme to guarantee that the local processor and the host bus have equal access to the cache system. In addition, the arbitration scheme must guarantee that all host bus snoop requests receive access to the cache system for cache coherency reasons. An arbitration scheme is thus desired which allows a cache system to efficiently service its local processor while also guaranteeing access to all snoop requests on the host bus to maintain cache coherency.