1. Field of the Invention
The present invention relates to microprocessor cache subsystems in computer systems, and more specifically to a method for incorporating least recently used and cache write policy information into the tag memories of a cache system.
2. Description of the Prior Art
The personal computer industry is a vibrant and growing field that continues to evolve as new innovations occur. The driving force behind this innovation has been the increasing demand for faster and more powerful computers. A major bottleneck in personal computer speed has historically been the speed with which data can be accessed from memory, referred to as the memory access time. Because its processor cycle times are relatively fast, the microprocessor has generally been forced to insert wait states during memory accesses to accommodate the relatively slow memory access times. Therefore, improving memory access times has been one of the major areas of research in enhancing computer performance.
In order to bridge the gap between fast processor cycle times and slow memory access times, cache memory was developed. A cache is a small amount of very fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. The microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the memory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place, and the memory request is forwarded to the system and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
An efficient cache yields a high "hit rate", which is the percentage of all memory accesses that are cache hits. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on relatively infrequent misses are averaged over a large number of zero wait state cache hits, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of the microprocessor, cache hits are serviced locally without requiring use of the host bus. Therefore, a processor operating out of its local cache has a much lower "bus utilization." This reduces the host bus bandwidth used by the processor, making more bandwidth available for other processors or bus masters.
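As a rough illustration of this averaging effect, the sketch below computes the expected number of wait states per access from a hit rate and a miss penalty. The function name and the figures used (a 95% hit rate and 4 wait states per miss) are illustrative assumptions, not values taken from any particular system.

```python
def average_wait_states(hit_rate: float, miss_penalty: float, hit_cost: float = 0.0) -> float:
    """Expected wait states per memory access.

    hit_rate     -- fraction of accesses serviced by the cache (0.0 to 1.0)
    miss_penalty -- wait states incurred when an access misses the cache
    hit_cost     -- wait states on a hit (zero for a zero wait state cache)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_penalty


# Illustrative figures only: 95% hit rate, 4 wait states per miss.
print(f"{average_wait_states(0.95, 4):.2f}")  # prints 0.20
```

Even with a four wait state miss penalty, the average in this example stays at a fraction of one wait state per access.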
An important consideration in cache performance is the organization of the cache and the cache management policies that are employed in the cache. A cache can generally be organized into either a direct-mapped or set-associative configuration. In a direct-mapped organization, the physical address space of the computer is conceptually divided into a number of equal pages, with the page size equaling the size of the cache. The cache is divided into a number of sets, with each set having a certain number of lines. Each page in main memory has a number of lines equal to the number of lines in the cache, and each line from a respective page in main memory corresponds to the similarly located line in the cache. An important characteristic of a direct-mapped cache is that each memory line from a page in main memory, referred to as a page offset, can only reside in the equivalently located line or page offset in the cache. Due to this restriction, the cache need only refer to a certain number of the upper address bits of a memory address, referred to as a tag, to determine whether a copy of the data from the respective memory address resides in the cache, because the lower-order address bits, which form the page offset, already determine which cache line must be examined.
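To make the tag and page offset split concrete, the following minimal sketch models a direct-mapped cache in which an address is divided into a tag (the page number) and a page offset, the latter further split into a line index and a byte offset within the line. The 64 KB cache size, the 16-byte line size, and all names are assumptions chosen for illustration.

```python
from __future__ import annotations

CACHE_SIZE = 64 * 1024  # assumed cache size (= conceptual page size) in bytes
LINE_SIZE = 16          # assumed line size in bytes
NUM_LINES = CACHE_SIZE // LINE_SIZE


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, line_index, byte_offset) for a direct-mapped cache."""
    byte_offset = addr % LINE_SIZE
    line_index = (addr // LINE_SIZE) % NUM_LINES  # line position within the page
    tag = addr // CACHE_SIZE                      # page number = upper address bits
    return tag, line_index, byte_offset


class DirectMappedCache:
    def __init__(self) -> None:
        # One tag slot per cache line; None means the line holds no valid data.
        self.tags: list[int | None] = [None] * NUM_LINES

    def lookup(self, addr: int) -> bool:
        tag, index, _ = split_address(addr)
        # Only one line can hold this address, so a single tag compare suffices.
        return self.tags[index] == tag

    def fill(self, addr: int) -> None:
        tag, index, _ = split_address(addr)
        self.tags[index] = tag  # whatever line occupied this index is displaced
```

Two addresses exactly `CACHE_SIZE` bytes apart share a line index but differ in tag, so in this model they continually displace each other, which is the restriction the set-associative organization relaxes.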
Whereas a direct-mapped cache is organized as one bank of memory that is equivalent in size to a conceptual page in main memory, a set-associative cache includes a number of banks, or ways, of memory that are each equivalent in size to a conceptual page in main memory. Accordingly, a page offset in main memory can be mapped to a number of locations in the cache equal to the number of ways in the cache. For example, in a 4-way set associative cache, a line or page offset from main memory can reside in the equivalent page offset location in any of the four ways of the cache.
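The sketch below extends the same idea to a 4-way set-associative cache: the page offset selects a set, and the tag is compared against all four ways of that set. The way size, line size, and names are illustrative assumptions; which way receives a fill is left to the caller, since that choice belongs to the replacement algorithm.

```python
from __future__ import annotations

WAYS = 4
WAY_SIZE = 16 * 1024  # assumed size of each way (= one conceptual page)
LINE_SIZE = 16        # assumed line size in bytes
SETS = WAY_SIZE // LINE_SIZE


def set_and_tag(addr: int) -> tuple[int, int]:
    """Return (set_index, tag); the set index comes from the page offset."""
    return (addr // LINE_SIZE) % SETS, addr // WAY_SIZE


class SetAssociativeCache:
    def __init__(self) -> None:
        # tags[s][w] is the tag held in way w of set s (None = invalid).
        self.tags: list[list[int | None]] = [[None] * WAYS for _ in range(SETS)]

    def lookup(self, addr: int) -> int | None:
        """Return the way that hits, or None on a miss."""
        s, tag = set_and_tag(addr)
        for way in range(WAYS):  # hardware compares all ways in parallel
            if self.tags[s][way] == tag:
                return way
        return None

    def fill(self, addr: int, way: int) -> None:
        """Place the line for addr into the given way of its set."""
        s, tag = set_and_tag(addr)
        self.tags[s][way] = tag
```

Four addresses with the same page offset in different pages can now be resident at once, one per way.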
A set-associative cache generally includes a replacement algorithm that determines the bank, or way, into which data is filled when a read miss occurs. Many set-associative caches use some form of least recently used (LRU) algorithm that places new data in the way that was least recently accessed. This is because, statistically, the way most recently used or accessed to provide data to the processor is the one most likely to be needed again in the future. Therefore, the LRU algorithm generally ensures that the block which is replaced is the one least likely to have its data requested by the processor. An LRU algorithm is generally maintained by keeping LRU bit information associated with each set that points to the way least recently used.
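A minimal sketch of true LRU bookkeeping for a single set is shown below, assuming the state is kept as an ordered list of way numbers. The class and method names are hypothetical; a hardware implementation would encode the same ordering in a small number of bits per set.

```python
class LRUSet:
    """True LRU state for one set of an N-way set-associative cache."""

    def __init__(self, ways: int = 4) -> None:
        # order[0] is the most recently used way, order[-1] the least recently used.
        self.order = list(range(ways))

    def touch(self, way: int) -> None:
        """Record an access (read hit, write hit, or fill) to `way`."""
        self.order.remove(way)
        self.order.insert(0, way)

    def victim(self) -> int:
        """Way to fill on a read miss: the least recently used one."""
        return self.order[-1]


s = LRUSet(4)
for w in (0, 1, 2):
    s.touch(w)
print(s.victim())  # prints 3: way 3 has not been touched since the initial state
```

Because four ways can be ordered in 4! = 24 ways, exact LRU needs at least 5 bits per set, which is one reason the cheaper approximations described next are attractive.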
Another replacement algorithm that can be used is referred to as a least recently modified (LRM) algorithm. In an LRM scheme, the LRU information is not updated on every cache hit; it is updated only when a cache line is replaced or a write hit occurs. Other replacement techniques, referred to as pseudo-LRU techniques, have also been developed. One example of a pseudo-LRU technique is used in the internal cache of the Intel Corporation i486 microprocessor, which has a 4-way set-associative architecture. In this method, the four ways are grouped into two pairs of two ways. Three bits are provided to determine, first, which of the two pairs was least recently used and, second, which of the two ways in the least recently used pair was least recently used. This is a pseudo-LRU technique because three bits cannot record the complete access order of the four ways, so read hits to a particular way do not properly reshuffle the full replacement order.
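The three-bit scheme just described can be sketched as a small binary tree: one bit chooses a pair, and one bit per pair chooses a way within it. The specific bit assignment below is an illustrative assumption and should not be read as Intel's documented encoding.

```python
class TreePLRU4:
    """3-bit tree pseudo-LRU for one 4-way set (illustrative bit assignment).

    Ways 0-1 form pair A and ways 2-3 form pair B.
    b0: pair to replace next (0 -> pair A, 1 -> pair B)
    b1: way to replace next within pair A (0 -> way 0, 1 -> way 1)
    b2: way to replace next within pair B (0 -> way 2, 1 -> way 3)
    """

    def __init__(self) -> None:
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way: int) -> None:
        """Point each bit on the path to `way` away from that way."""
        if not 0 <= way < 4:
            raise ValueError("way must be 0-3")
        if way < 2:
            self.b0 = 1         # pair B is now the less recently used pair
            self.b1 = 1 - way   # the other way in pair A is next within the pair
        else:
            self.b0 = 0         # pair A is now the less recently used pair
            self.b2 = 3 - way   # the other way in pair B is next within the pair

    def victim(self) -> int:
        if self.b0 == 0:
            return self.b1      # way 0 or way 1
        return 2 + self.b2      # way 2 or way 3


p = TreePLRU4()
for w in (0, 2, 3, 1):
    p.touch(w)
print(p.victim())  # prints 2, although true LRU would replace way 0
```

The example access sequence shows the approximation at work: way 0 is the true least recently used way, but because its pair was touched last, the tree selects way 2 instead.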
Cache management is generally performed by a device referred to as a cache controller. The cache controller includes a directory that holds an associated entry for each set in the cache. This entry generally has two components: a tag and a number of tag state bits. The tag acts as a main memory page number, and it holds the upper address bits of the particular page in main memory from which the copy of data residing in the respective set of the cache originated. The tag state bits determine the status of data in the respective set in the cache. In single processor schemes, the tag state bits generally comprise one or more valid bits which indicate whether the data associated with the respective tag entry is valid. In multiprocessor systems, the tag state bits generally indicate the status of data with respect to the other caches, as is explained further below.
The directories for each of the sets in the cache are generally stored in random access memory referred to as tag RAMs. Most conventional cache controllers also include a separate RAM for the LRU information to keep track of which way has been least recently used for each of the respective sets. However, the use of a separate RAM for LRU information is generally wasteful and unnecessary if extra bits can be made available in the tag RAMs. Therefore, a method is desired to allow LRU information to be stored in the tag RAMs, obviating the need for an additional RAM dedicated to LRU information.
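The sketch below shows one way a tag RAM word could carry a tag, tag state bits, and LRU bits side by side, so that a single tag RAM read returns both the hit/miss information and the replacement information. The 16-bit word and the 12/2/2 field widths are assumptions made for illustration only; they are not taken from the invention's actual layout.

```python
from __future__ import annotations

TAG_BITS = 12    # assumed tag width
STATE_BITS = 2   # assumed tag state field width
LRU_BITS = 2     # assumed spare bits used for LRU information
WORD_BITS = TAG_BITS + STATE_BITS + LRU_BITS  # 16-bit tag RAM word in this sketch


def pack_tag_word(tag: int, state: int, lru: int) -> int:
    """Combine the tag, state, and LRU fields into one tag RAM word."""
    for value, width, name in ((tag, TAG_BITS, "tag"),
                               (state, STATE_BITS, "state"),
                               (lru, LRU_BITS, "lru")):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (tag << (STATE_BITS + LRU_BITS)) | (state << LRU_BITS) | lru


def unpack_tag_word(word: int) -> tuple[int, int, int]:
    """Split a tag RAM word back into (tag, state, lru)."""
    lru = word & ((1 << LRU_BITS) - 1)
    state = (word >> LRU_BITS) & ((1 << STATE_BITS) - 1)
    tag = word >> (STATE_BITS + LRU_BITS)
    return tag, state, lru


word = pack_tag_word(0xABC, 2, 1)
print(hex(word))  # prints 0xabc9
```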
Multiprocessing is a major area of research in computer system architecture. Multiprocessing involves a computer system which includes multiple processors that work at the same time on different problems. In most multiple processor systems, each processor includes its own local cache. In this manner, each processor can operate out of its cache when it does not have control of the bus, thereby increasing system efficiency. However, one difficulty that has been encountered in multiprocessor architectures is the maintenance of cache coherency when each processor includes its own local cache.
As previously mentioned, cache management is performed by a device referred to as a cache controller. A principal cache management responsibility in multiprocessor systems is the preservation of cache coherency. The type of cache management policy used to maintain cache coherency in a multiprocessing system generally depends on the architecture used. One type of architecture commonly used in multiprocessing systems is referred to as a bus-based scheme. In a bus-based scheme, system communication takes place through a shared bus, and this allows each cache to monitor other requests by watching or snooping the bus. Each processor has a cache which monitors activity on the bus and in its own processor and decides which blocks of data to keep and which to discard in order to reduce bus traffic. Requests by a processor to modify a memory location that is stored in more than one cache require bus communication in order for each copy of the corresponding line to be marked invalid or updated to reflect the new value.
Various types of cache coherency protocols can be employed to maintain cache coherency in a multiprocessor system. One type of cache coherency protocol that is commonly used is referred to as a write-through scheme. In a write-through scheme, all cache writes or updates are simultaneously written into the cache and to main memory. Other caches on the bus must monitor bus transactions and invalidate any matching entries when the memory block is written through to main memory. In a write-back scheme, a cache location is updated with the new data on a processor write hit, and main memory is generally only updated when the updated data block must be exchanged with a new data block or the updated block is requested by another processor.
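The difference between the two write policies on a processor write hit can be sketched as follows. Main memory is modeled as a plain dictionary, and write misses are deliberately not modeled; the class and method names are hypothetical.

```python
from __future__ import annotations


class Cache:
    """Toy cache contrasting write-through and write-back behavior on write hits."""

    def __init__(self, memory: dict[int, int], write_back: bool) -> None:
        self.memory = memory               # stands in for main memory
        self.lines: dict[int, int] = {}    # cached address -> data
        self.dirty: set[int] = set()       # lines modified but not yet written back
        self.write_back = write_back

    def read(self, addr: int) -> int:
        if addr not in self.lines:         # read miss: fill the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write_hit(self, addr: int, value: int) -> None:
        if addr not in self.lines:
            raise KeyError("write misses are not modeled in this sketch")
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)           # main memory is now stale
        else:
            self.memory[addr] = value      # write-through: update memory as well

    def evict(self, addr: int) -> None:
        if addr in self.dirty:             # write modified data back before discarding
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)
```

With `write_back=False`, memory always matches the cache after a write hit; with `write_back=True`, memory is updated only when the line is evicted, which is why another device reading that location in the meantime must get the data from the cache.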
Multiprocessor cache systems which employ a write-back scheme generally utilize some type of ownership protocol to maintain cache coherency. In this scheme, any copy of data in a cache must be identical to the owner's copy of that location's data. The owner of a location's data is generally defined as the cache or memory that holds the most recent version of the data residing in the respective memory location. Ownership is generally acquired through special read and write operations defined in the ownership protocol.
A cache that owns a data entry assumes responsibility for the data's validity in the entire system. The cache that owns a particular data entry is responsible for ensuring that main memory is properly updated, if necessary, and that ownership is passed to another cache when appropriate. The owning cache is also responsible for providing the correct copy of data to other processors or devices which request the data during a read cycle. When a cache owns a data entry and snoops a read cycle generated by another device or processor requesting the data, the cache must inhibit the system memory from providing the data and provide the correct copy of data to the requesting device.
As previously mentioned, the cache controller includes a directory that holds an associated tag entry for each data entry or set in the cache. Each tag entry is comprised of a tag and a number of tag state bits. The tag state bits determine the status of the data in the respective set of the cache, i.e., whether the data is invalid, owned, or shared with another cache. The cache controller also includes a snooping mechanism which monitors or snoops the host bus when other processors or devices are using the host bus. The cache controller maintains cache coherency by updating the tag state bits for a data entry whenever the cache obtains or relinquishes ownership and in certain instances when the snooping mechanism detects a read or write cycle for data stored in the cache.
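A highly simplified sketch of how tag state bits might respond to local writes and snooped cycles is given below, using only the three states named above (invalid, shared, owned). The transitions chosen here, including keeping ownership after supplying data on a snooped read, are assumptions for illustration; real ownership protocols differ in these details.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    INVALID = 0
    SHARED = 1  # valid copy; the owner is memory or another cache
    OWNED = 2   # this cache owns the data and may hold the only current copy


class SnoopedLine:
    """Tag state and data for one cache line (hypothetical, simplified model)."""

    def __init__(self) -> None:
        self.state = State.INVALID
        self.data: int | None = None

    def fill(self, value: int) -> None:
        # Read miss serviced from the bus: the cache holds a copy but not ownership.
        self.data = value
        self.state = State.SHARED

    def processor_write_hit(self, value: int) -> None:
        # Obtaining ownership (e.g. invalidating other copies) is assumed to succeed;
        # the bus transaction itself is not modeled.
        self.data = value
        self.state = State.OWNED

    def snoop_read(self) -> tuple[bool, int | None]:
        """Another master reads this line. Returns (inhibit_memory, data_from_cache)."""
        if self.state is State.OWNED:
            # Memory may hold stale data: inhibit it and supply the cache's copy.
            # In this sketch the cache keeps ownership afterwards.
            return True, self.data
        return False, None  # memory (or the actual owner) supplies the data

    def snoop_write(self) -> None:
        # Another master has written the location, so this copy is now stale.
        self.state = State.INVALID
        self.data = None
```

The `inhibit_memory` flag returned by `snoop_read` is the decision that, as discussed next, a bus without inhibit support cannot act on.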
Most cache systems can cache the majority of computer system addresses. However, a portion of the address space is generally deemed non-cacheable for various reasons. For example, the input/output (I/O) address space is generally designated as non-cacheable to prevent I/O addresses from being placed in the cache. This is because data stored in I/O addresses is subject to frequent change or may control physical devices. In addition, in most computer systems the I/O bus is separate from the host or processor bus, and therefore data stored in I/O addresses is subject to change without any activity occurring on the host bus. Thus, data stored in these I/O addresses can change without being detected by the cache controllers situated on the host bus. In addition, video memory located on the I/O bus is generally designated as non-cacheable due to its dual-ported nature.
In many computer systems, there is generally some system memory situated on an I/O bus which can be designated as cacheable. For example, if the system read-only memory (ROM) is situated on the I/O bus, then this memory can be designated as cacheable. However, it should be noted that if system ROM is designated as cacheable, this memory must also be write-protected in the cache to prevent the processor from inadvertently changing ROM data stored in the cache. In addition, if the I/O bus includes slots for expansion memory, then memory placed in these slots can be designated as cacheable. However, a problem arises in a multiprocessor write-back cache architecture where system memory situated on the I/O bus is designated as cacheable if the I/O bus cannot recognize inhibit cycles. As previously mentioned, in a write-back cache architecture, if a cache controller snoops a read hit to an owned or modified location, the cache must inhibit the current memory cycle by the memory controller or memory device and provide the data to the requesting device. However, if the I/O bus does not recognize inhibit cycles, the cache which owns the data will not be able to inhibit the system memory on the I/O bus from returning incorrect or "dirty" data to the requesting device, resulting in cache coherency problems. In addition, the system memory on the I/O bus and the cache controller may both attempt to return data to the requesting device, resulting in bus contention problems.
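The cacheability decisions described above amount to an address decode, which the sketch below models with a small region table. The table uses the conventional PC-compatible video memory (A0000h to BFFFFh) and system ROM (F0000h to FFFFFh) ranges purely as a hypothetical example, and treats every other memory address as ordinary cacheable memory; I/O port space is outside this memory map and is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int                     # first address in the region
    end: int                       # last address in the region (inclusive)
    cacheable: bool
    write_protected: bool = False  # cached copy must never be modified (e.g. ROM)


# Hypothetical memory map using conventional PC-compatible ranges.
MEMORY_MAP = [
    Region(0x000A0000, 0x000BFFFF, cacheable=False),                       # video memory
    Region(0x000F0000, 0x000FFFFF, cacheable=True, write_protected=True),  # system ROM
]


def lookup_region(addr: int) -> Region:
    for region in MEMORY_MAP:
        if region.start <= addr <= region.end:
            return region
    # Assumed default: any other memory address is ordinary cacheable memory.
    return Region(addr, addr, cacheable=True)


def may_write_cached_copy(addr: int) -> bool:
    """True if a processor write may update a cached copy of addr."""
    region = lookup_region(addr)
    return region.cacheable and not region.write_protected
```

Note that this decode only answers whether an address may be cached; it says nothing about whether the bus holding that memory can honor an inhibit cycle, which is the separate problem raised above.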
Therefore, in multiprocessor, write-back cache architectures which include an I/O bus that does not recognize inhibit cycles, it is generally not possible to cache the system memory situated on the I/O bus. However, if the system memory situated on the I/O bus could be designated as write-through in each of the caches, then no problems would result, since there would never be a need for an inhibit cycle for this memory. No inhibit cycles would be needed if a write-through protocol were used, since the system memory, not the cache, would always be the owner of the data it contains. Therefore, a method is needed which enables a cache controller to designate memory address spaces as write-through or write-back in a computer system. In addition, the write policy information must be readily accessible to the cache controller so that determining the write policy does not degrade system performance.
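One way to keep the write policy determination off the critical path is to decode it once, when a line is filled, and record the result as a bit in the line's tag entry; later write hits then read the policy from the tag entry that was just compared. The sketch below assumes a hypothetical address range for memory on an I/O bus without inhibit support and hypothetical field names; it illustrates the general idea rather than the invention's specific mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical range of memory located on an I/O bus that cannot recognize inhibit cycles.
IO_BUS_MEMORY = range(0x00C00000, 0x01000000)


def write_through_required(addr: int) -> bool:
    # Memory behind a bus without inhibit support must remain the owner of its
    # data, so lines from it are cached write-through only.
    return addr in IO_BUS_MEMORY


@dataclass
class TagEntry:
    tag: int
    valid: bool = False
    write_through: bool = False  # write policy bit captured at line-fill time


def fill_line(entry: TagEntry, addr: int, tag: int) -> None:
    entry.tag = tag
    entry.valid = True
    # The address decode happens once, during the fill, not on every write.
    entry.write_through = write_through_required(addr)


def write_hit_updates_memory(entry: TagEntry) -> bool:
    # On a write hit the policy comes from the tag entry itself, so no
    # separate address decode is needed; write-through lines are never owned
    # by the cache and therefore never require an inhibit cycle.
    return entry.write_through
```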
Background on the Extended Industry Standard Architecture (EISA) is deemed appropriate. EISA is a superset of the Industry Standard Architecture (ISA), a bus architecture introduced in the International Business Machines (IBM) PC AT personal computer. The EISA bus is built around an EISA chip set which includes an EISA bus controller (EBC) chip, among others. The EBC acts as an interface between a host or processor bus and the EISA (I/O) bus. Preferably the 82358 from Intel Corporation is used as the EBC. The 82358 does not include an inhibit input, and therefore the EISA bus does not recognize inhibit cycles.