1. Field of the Invention
The present invention generally relates to cache memories, and more particularly to an asynchronous input/output cache memory having reduced latency in both the system frequency domain and the input/output (I/O) frequency domain.
2. Discussion of the Related Art
The driving force behind computer system innovation has been the demand for faster and more powerful computers. A major bottleneck in computer speed has historically been the speed with which data can be accessed from memory, referred to as the memory access time. The microprocessor, with its relatively fast processor cycle times, has generally been delayed by the use of wait states during memory accesses to account for the relatively slow memory access times. Therefore, improvement in memory access times has been one of the major areas of research in enhancing computer performance.
In order to bridge the gap between fast processor cycle times and slow memory access times, cache memory was developed. A cache memory is a small amount of very fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. The microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the memory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place. In a cache read miss, the memory request is forwarded to the system, and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
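The read path just described (serve a hit from the cache; on a miss, forward the request to main memory and also copy the returned data into the cache) can be sketched in a few lines. This is a minimal illustrative model only, assuming an unbounded, fully associative store keyed by address. The class and attribute names are hypothetical, and there is no line size, replacement policy, or timing.

```python
class ReadCache:
    """Toy model of cache read hits and misses (hypothetical names)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> data
        self.lines = {}                 # cached copies: address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1              # cache read hit: served without wait states
            return self.lines[address]
        self.misses += 1                # cache read miss: go to main memory
        data = self.main_memory[address]
        self.lines[address] = data      # also fill the cache, since reuse is likely
        return data


memory = {0x100: "A", 0x104: "B"}
cache = ReadCache(memory)
cache.read(0x100)  # miss: fetched from memory and cached
cache.read(0x100)  # hit
cache.read(0x104)  # miss
print(cache.hits, cache.misses)  # 1 2
```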
An efficient cache yields a high “hit rate,” which is the percentage of all memory accesses that are cache hits. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on a relatively infrequent miss are averaged over a large number of zero wait state cache hit accesses, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of the microprocessor, cache hits are serviced locally without requiring use of the system bus. Therefore, a processor operating out of its local cache has a much lower “bus utilization.” This reduces the system bus bandwidth used by the processor, making more bandwidth available for other devices, such as intelligent bus masters, which can independently gain access to the bus.
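The averaging argument can be made concrete. With hit rate h and a miss penalty of w wait states (zero wait states on a hit), the expected cost per access is (1 − h)·w. The sketch below assumes a constant miss penalty of 10 wait states, a figure chosen purely for illustration.

```python
def average_wait_states(hit_rate, miss_penalty, hit_wait_states=0):
    """Expected wait states per access.

    hit_rate: fraction of accesses that hit (0.0 to 1.0)
    miss_penalty: wait states incurred on a miss (assumed constant)
    hit_wait_states: wait states on a hit (zero for the cache described above)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_wait_states + (1.0 - hit_rate) * miss_penalty


for h in (0.80, 0.95, 0.99):
    avg = average_wait_states(h, miss_penalty=10)
    print(f"hit rate {h:.2f}: {avg:.2f} wait states/access")
# hit rate 0.80: 2.00 wait states/access
# hit rate 0.95: 0.50 wait states/access
# hit rate 0.99: 0.10 wait states/access
```

Raising the hit rate from 95% to 99% cuts the average cost by a factor of five, which is why hit rate is the headline measure of cache efficiency.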
Although processor caches are perhaps the best known, other caches are known and used as well. For example, I/O caches are known for buffering and caching data between a system bus and an I/O bus. As will be further described below, certain system components, like a microprocessor and memory, are synchronized off a different clock than I/O transactions. When passing data between two differing frequency domains, it is usually desirable, if not necessary, to buffer the data in some way. One way that this is done is by passing the data through an I/O cache.
Whether it be a processor cache, an I/O cache, or some other type of cache memory, important considerations in cache performance are the organization of the cache and the cache management policies that are employed in the cache. A cache can generally be organized into either a direct-mapped or set-associative configuration. In a direct-mapped organization, the physical address space of the computer is conceptually divided into a number of equal pages, with the page size equal to the size of the cache. The cache is partitioned into a number of sets, with each set having a certain number of lines. The line size is generally a plurality of bytes or words. Each of the conceptual pages in main memory has a number of lines equivalent to the number of lines in the cache, and each line from a respective page in main memory corresponds to a similarly located line in the cache. An important characteristic of a direct-mapped cache is that each memory line from a conceptual page in main memory, referred to as a page offset, can only reside in the equivalently located line or page offset in the cache. Due to this restriction, the cache need only refer to a certain number of the upper address bits of a memory address, referred to as a tag, to determine whether a copy of the data from the respective memory address resides in the cache, because the lower-order address bits are predetermined by the page offset of the memory address.
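A small sketch shows the address decomposition that makes the direct-mapped tag check work. It assumes a hypothetical 4 KiB cache (so each conceptual page is 4 KiB) with 16-byte lines. The low bits select the byte within a line, the next bits select the line (the page offset), and only the remaining upper bits are stored and compared as the tag. The names `split_address` and `DirectMappedTags` are illustrative, not taken from any particular design.

```python
CACHE_SIZE = 4096  # bytes; also the size of each conceptual page (assumed)
LINE_SIZE = 16     # bytes per line (assumed)
NUM_LINES = CACHE_SIZE // LINE_SIZE

OFFSET_BITS = (LINE_SIZE - 1).bit_length()  # 4
INDEX_BITS = (NUM_LINES - 1).bit_length()   # 8


def split_address(addr):
    """Return (tag, line_index, byte_in_line) for a direct-mapped lookup."""
    byte_in_line = addr & (LINE_SIZE - 1)
    line_index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)  # the page offset line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)              # main memory page number
    return tag, line_index, byte_in_line


class DirectMappedTags:
    """Tag directory only: one stored tag per cache line (None = empty)."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def lookup(self, addr):
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag  # only the upper bits need comparing

    def fill(self, addr):
        tag, index, _ = split_address(addr)
        self.tags[index] = tag  # evicts whatever page previously used this line


print([hex(f) for f in split_address(0x12345)])  # ['0x12', '0x34', '0x5']
tags = DirectMappedTags()
tags.fill(0x00010)
print(tags.lookup(0x00010))  # True
tags.fill(0x01010)           # same line index (1), different page (tag 1)
print(tags.lookup(0x00010))  # False: displaced by the conflicting address
```

The last lines illustrate the cost of the direct-mapped restriction: two addresses with the same page offset cannot be cached at the same time.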
Cache management is generally performed by a device referred to as a cache controller. One cache management duty performed by the cache controller is the handling of processor writes to memory. The manner in which write operations are handled determines whether a cache is designated as “write-through” or “write-back.” When the processor initiates a write to main memory, the cache is first checked to determine if a copy of the data from this location resides in the cache. If a processor write hit occurs in a write-back cache design, then the cache location is updated with the new data, and main memory is only updated later if this data is requested by another device, such as a bus master. Alternatively, the cache maintains the correct or “clean” copy of data thereafter, and the main memory is only updated when a flush operation occurs. In a write-through cache, the main memory location is generally updated in conjunction with the cache location on a processor write hit. If a processor write miss occurs in a write-through cache, the cache controller may either ignore the write miss or may perform a “write-allocate,” whereby the cache controller allocates a new line in the cache in addition to passing the data to the main memory. In a write-back cache design, the cache controller generally allocates a new line in the cache when a processor write miss occurs. This generally involves reading the remaining entries from main memory to fill the line in addition to allocating the new write data.
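The two write policies can be contrasted with a word-granular model. This is a sketch under simplifying assumptions: each address stands in for a whole line, so the line fill from main memory on a write-back miss is omitted. Main memory is updated for write-back only when `flush` is called, which corresponds to the flush-based variant described above. All names are hypothetical.

```python
class WriteCache:
    """Toy write-through / write-back cache (one word per 'line', assumed)."""

    def __init__(self, memory, write_back, write_allocate=False):
        self.memory = memory                  # dict: address -> value (main memory)
        self.write_back = write_back
        self.write_allocate = write_allocate  # only consulted for write-through
        self.lines = {}                       # cached copies: address -> value
        self.dirty = set()                    # addresses newer in cache than memory

    def write(self, addr, value):
        hit = addr in self.lines
        if self.write_back:
            # Hit or miss, the write lands in the cache (allocating on a miss);
            # main memory is brought up to date later.
            self.lines[addr] = value
            self.dirty.add(addr)
        else:
            self.memory[addr] = value         # write-through: memory always updated
            if hit or self.write_allocate:
                self.lines[addr] = value      # update the hit line or allocate one

    def flush(self):
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


wb = WriteCache({0: 1}, write_back=True)
wb.write(0, 5)
print(wb.memory[0])  # 1: main memory is stale until the flush
wb.flush()
print(wb.memory[0])  # 5

wt = WriteCache({0: 1}, write_back=False)
wt.write(0, 5)
print(wt.memory[0], 0 in wt.lines)  # 5 False (write miss ignored, no write-allocate)
```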
The cache controller includes a directory that holds an associated entry for each set in the cache. In a write-through cache, this entry generally has three components: a tag, a tag valid bit, and a number of line valid bits equaling the number of lines in each cache set. The tag acts as a main memory page number, and it holds the upper address bits of the particular page in main memory from which the copy of data residing in the respective set of the cache originated. The status of the tag valid bit determines whether the data in the respective set of the cache is considered valid or invalid. If the tag valid bit is clear, then the entire set is considered invalid. If the tag valid bit is true, then an individual line within the set is considered valid or invalid depending on the status of its respective line valid bit. In a write-back cache, the entries in the cache directory are generally comprised of a tag and a number of tag state bits for each of the lines in each set. As before, the tag comprises the upper address bits of the particular page in main memory from which the copy originated. The tag state bits determine the status of the data for each respective line, i.e., whether the data is invalid, modified (owned), or clean.
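The two directory entry formats can be written down as simple records. This sketch follows the fields named above: tag, tag valid bit, and per-line valid bits for write-through, versus a tag plus per-line state for write-back. The class names and the three-value `LineState` encoding are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class LineState(Enum):
    INVALID = 0
    CLEAN = 1
    MODIFIED = 2  # "owned": the cache holds the only up-to-date copy


@dataclass
class WriteThroughEntry:
    tag: int                # upper address bits (main memory page number)
    tag_valid: bool         # clear => the entire set is invalid
    line_valid: List[bool]  # one valid bit per line in the set

    def line_is_valid(self, line: int) -> bool:
        return self.tag_valid and self.line_valid[line]


@dataclass
class WriteBackEntry:
    tag: int
    line_state: List[LineState]  # one state per line in the set

    def line_is_valid(self, line: int) -> bool:
        return self.line_state[line] is not LineState.INVALID

    def line_is_modified(self, line: int) -> bool:
        return self.line_state[line] is LineState.MODIFIED


wt = WriteThroughEntry(tag=0x12, tag_valid=False, line_valid=[True, True, False, True])
print(wt.line_is_valid(0))                        # False: tag valid bit is clear
wt.tag_valid = True
print(wt.line_is_valid(0), wt.line_is_valid(2))   # True False
```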
A principal cache management policy is the preservation of cache coherency. Cache coherency refers to the requirement that any copy of data in a cache must be identical to (or actually be) the owner of that location's data. The owner of a location's data is generally defined as the respective location having the most recent or the correct version of data. The owner of data is generally either an unmodified location in main memory, or a modified location in a write-back cache.
In computer systems where independent bus masters can access memory, there is a possibility that a bus master, such as a direct memory access controller, network or disk interface card, or video graphics card, might alter the contents of a main memory location that is duplicated in the cache. When this occurs, the cache is said to hold “stale,” “dirty” or invalid data. Also, when the processor executes a cache write hit operation to a write-back cache, the cache receives the new data, but main memory is not updated until a later time, if at all. In this instance, the cache contains a “clean” or correct version of the data and is said to own the location, and main memory holds invalid or “dirty” data. Problems would arise if the processor were allowed to access dirty data from the cache, or if a bus master were allowed to access dirty data from main memory. Therefore, in order to maintain cache coherency, i.e., in order to prevent a device such as a processor or bus master from inadvertently receiving incorrect or dirty data, it is necessary for the cache controller to monitor the system bus for bus master accesses to main memory when the processor does not control the system bus. This method of monitoring the bus is referred to as snooping.
In a write-back cache design, the cache controller must monitor the system bus during memory reads by a bus master because of the possibility that the cache may own the location, i.e., the cache may contain the only correct copy of data for this location, referred to as modified data. This is referred to as read snooping. On a read snoop hit where the cache contains modified data, the cache controller generally provides the respective data to main memory, and the requesting bus master generally reads this data en route from the cache controller to main memory, this operation being referred to as snarfing. Alternatively, the cache controller provides the respective data directly to the bus master and not to main memory. In this alternative scheme, the main memory would perpetually contain erroneous or “dirty” data until a cache flush occurred.
In both write-back and write-through cache designs, the cache controller must also monitor the system bus during bus master writes to memory because the bus master may write to or alter a memory location having data that resides in the cache. This is referred to as write snooping. On a write snoop hit to a write-through cache, the cache entry is generally marked invalid in the cache directory by the cache controller, signifying that this entry is no longer correct. In a write-back cache, the cache is updated along with main memory, and the tag state bits are set to indicate that the respective cache location now includes a clean copy of the data. Alternatively, a write-back cache may invalidate the entire line on a snoop write hit. Therefore, in a write-back cache design, the cache controller must snoop both bus master reads and writes to main memory. In a write-through cache design, the cache controller need only snoop bus master writes to main memory.
The process of snooping generally entails that the cache controller latch the system bus address and perform a cache look-up in the tag directory corresponding to the page offset location where the memory access occurred to see if a copy of data from the main memory location being accessed also resides in the cache. If a copy of the data from this location does reside in the cache, then the cache controller takes the appropriate action depending on whether a write-back or write-through cache design has been implemented, or whether a read or write snoop hit has occurred. This prevents incompatible data from being stored in main memory and the cache, thereby preserving cache coherency.
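The snoop actions described in the preceding paragraphs can be collected into one function. This is a minimal sketch, assuming a word-granular cache stored as a dict of `(state, value)` pairs. It models the snarfing variant of a read snoop hit (modified data is written back to memory while the master reads it) and the update-and-mark-clean variant of a write snoop hit in a write-back cache. Marking the line clean after it has supplied its data is an inference on our part, since memory and cache then agree. The alternatives mentioned above (supplying data only to the master, or invalidating on a write-back snoop hit) are not modeled. All names are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    CLEAN = "clean"
    MODIFIED = "modified"


def snoop(cache, memory, addr, is_write, write_back, data=None):
    """Apply snoop actions for one bus master cycle (illustrative model).

    cache: dict addr -> (State, value); memory: dict addr -> value.
    Returns the value a bus master read observes, or None for a write.
    """
    state, value = cache.get(addr, (State.INVALID, None))  # tag look-up
    hit = state is not State.INVALID
    if not is_write:
        if write_back and state is State.MODIFIED:
            memory[addr] = value                   # cache supplies its owned data;
            cache[addr] = (State.CLEAN, value)     # the master snarfs it en route
            return value
        return memory[addr]                        # memory already holds correct data
    memory[addr] = data                            # the master's write reaches memory
    if hit:
        if write_back:
            cache[addr] = (State.CLEAN, data)      # update the copy, mark it clean
        else:
            cache[addr] = (State.INVALID, None)    # write-through: invalidate the entry
    return None


cache = {0x40: (State.MODIFIED, 7)}
memory = {0x40: 1}
print(snoop(cache, memory, 0x40, is_write=False, write_back=True))  # 7
print(memory[0x40], cache[0x40][0].value)                           # 7 clean

wt_cache = {0x80: (State.CLEAN, 3)}
wt_memory = {0x80: 3}
snoop(wt_cache, wt_memory, 0x80, is_write=True, write_back=False, data=9)
print(wt_memory[0x80], wt_cache[0x80][0].value)                     # 9 invalid
```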
Another problem that occurs where cache systems are utilized is that, when the respective processor is not in control of the system bus, the cache must be able to both service local requests from the processor and snoop the system bus for memory accesses by other devices. Latency problems can arise where the processor is operating out of the cache and a snooping operation is required due to a pending bus master memory access cycle on the system bus. If the cache is busy servicing a processor access while a bus master memory access is occurring on the bus, the processor access may not complete before the respective bus master cycle completes. If this occurs, the cache will miss a snoop cycle, thus resulting in potential erroneous data in the cache and possible erroneous operation. This condition is exacerbated when logic external to the cache controller controls cache snoop accesses to the system bus.
As previously mentioned, an I/O cache may be utilized to buffer data that is communicated between a system frequency domain and an I/O frequency domain. To illustrate the manner in which such I/O caches have been implemented in the past, reference is made to FIGS. 1A and 1B. Generally, the I/O cache 2 is designed to reside completely within either the system frequency domain (FIG. 1A) or the I/O frequency domain (FIG. 1B). Devices such as system memory 4 and a CPU 6 are synchronized to a system clock operating at a first (system) frequency, while I/O devices and communications across an I/O bus 10 operate at a second (I/O) frequency. Each clock frequency defines a frequency “domain” for the respective devices.
As is known, when data or signals are passed across a frequency domain boundary 16, a delay or latency is encountered. Normally, this latency is on the order of a couple of clock cycles. By way of example, when a system is constructed as illustrated in FIG. 1A, latency delays are encountered when data from the system memory 4 is fetched by devices on the I/O bus. If a block of data is requested, the latency delays are repeatedly encountered, as byte after byte of data is retrieved from the cache 2 to the I/O bus 10. Alternatively, when a system is constructed as illustrated in FIG. 1B, latency delays are encountered on the system data bus 8 side of the system in connection with the reads and writes performed under the snoopy cache coherency protocol.
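A rough cycle count shows why the repeated crossing in the FIG. 1A arrangement adds up. The model below is a simplification under stated assumptions: every byte is an independent handshake across boundary 16, the boundary costs a fixed 2 cycles (taking "a couple of clock cycles" literally), and the block is 32 bytes. Real designs may pipeline transfers, so treat the numbers as illustrative only.

```python
def block_read_cycles(num_transfers, sync_latency, transfer_cycles=1):
    """Cycles to move a block when every transfer crosses a clock boundary.

    Assumes each transfer is an independent handshake, so the synchronization
    latency is paid once per transfer instead of once per block.
    """
    return num_transfers * (sync_latency + transfer_cycles)


# Hypothetical figures: 32 one-byte transfers, 2-cycle boundary latency.
crossing = block_read_cycles(32, sync_latency=2)
local = block_read_cycles(32, sync_latency=0)
print(crossing, local, crossing - local)  # 96 32 64
```

Under these assumptions, two-thirds of the transfer time is spent on boundary latency rather than on moving data.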
Accordingly, it is desired to provide an I/O cache that minimizes the latency on both the system side of the cache (i.e., latency associated with snoopy cache coherency) and the I/O side of the cache (i.e., latency associated with I/O reads).