1. Field of the Invention
This invention relates to cache memory, and more particularly to an improved architecture and method for snooping accesses to cacheable memory initiated by an alternate bus master.
2. Description of the Relevant Art
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a relatively slow system memory and a central processing unit (CPU) to improve effective memory transfer rates and accordingly improve system performance. The name refers to the fact that the small cache memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache is usually implemented by semiconductor memory devices, such as static RAMs, having speeds that are comparable to the speed of the processor, while the system memory utilizes less costly, lower-speed devices, such as dynamic RAMs. The cache concept anticipates the likely reuse by the microprocessor of selected data in system memory by storing a copy of the selected data in the cache memory.
A cache memory typically includes a plurality of memory sections, wherein each memory section stores a block or a "line" of two or more words of data. A 256 Kbyte memory array cache, for example, could be divided into 8192 lines where each line consists of 32 8-bit bytes. Each line has associated with it an address tag.
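The arithmetic of the 256 Kbyte example can be illustrated with a minimal sketch that splits an address into an address tag, a line index, and a byte offset. The sketch assumes a direct-mapped organization, which the text above does not specify, and the name split_address is hypothetical.

```python
CACHE_BYTES = 256 * 1024                 # 256 Kbyte cache from the example
LINE_BYTES = 32                          # 32 bytes per line
NUM_LINES = CACHE_BYTES // LINE_BYTES    # 8192 lines

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits select a byte within a line
INDEX_BITS = NUM_LINES.bit_length() - 1     # 13 bits select one of 8192 lines


def split_address(addr):
    """Return (tag, index, offset) for addr.

    Assumption: a direct-mapped cache, so each address maps to exactly one line.
    """
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under these assumptions, the tag is the portion of the address that remains after the 13 index bits and 5 offset bits are removed, and it is this value that would be stored alongside each line for comparison.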
When a read request originates in the processor for a new word (or a new doubleword or a new byte), whether it be data or instruction, an address tag comparison is made to determine whether a copy of the requested word resides in a line of the cache memory. If present, the data is used directly from the cache. This event is referred to as a cache read "hit". If not present, a line containing the requested word is retrieved from system memory and stored in the cache memory. The requested word is simultaneously supplied to the processor. This event is referred to as a cache read "miss".
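The read hit and read miss events described above may be sketched as follows. This is an illustrative model only: it assumes a direct-mapped cache addressed in words, an 8-word line, and a Python list standing in for system memory. The class name ReadCache and its fields are hypothetical.

```python
LINE_WORDS = 8  # words per line; an illustrative choice, not taken from the text


class ReadCache:
    """Toy direct-mapped cache that services reads only."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory               # list of words standing in for system memory
        self.tags = [None] * num_lines     # None marks a line that holds no data
        self.lines = [None] * num_lines

    def read(self, word_addr):
        """Return (word, hit) for the requested word address."""
        line_addr = word_addr // LINE_WORDS
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        hit = self.tags[index] == tag      # address tag comparison
        if not hit:
            # Read miss: fetch the whole line containing the word from memory.
            base = line_addr * LINE_WORDS
            self.lines[index] = self.memory[base:base + LINE_WORDS]
            self.tags[index] = tag
        return self.lines[index][word_addr % LINE_WORDS], hit
```

A first read of a word reports a miss and fills the line; a subsequent read of a neighboring word in the same line reports a hit.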
In addition to using a cache memory to retrieve data, the processor may also write data directly to the cache memory instead of to the system memory. When the processor desires to write data to memory, an address tag comparison is made to determine whether the line into which data is to be written resides in the cache memory. If the line is present in the cache memory, the data is written directly into the line. This event is referred to as a cache write "hit". As will be explained in greater detail below, in many systems a data "dirty bit" for the line is then set. The dirty bit indicates that data stored within the line has been modified. Before a line containing dirty or modified data is deleted from the cache memory or overwritten, the modified data must be written into system memory.
If the line into which data is to be written does not exist in the cache memory, the line is either fetched into the cache memory from system memory to allow the data to be written into the cache, or the data is written directly into the system memory. This event is referred to as a cache write "miss". A line which is overwritten or copied out of the cache memory when new data is stored in the cache memory is referred to as a victim block or a victim line.
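The write hit, write miss, dirty bit, and victim line behavior described above may be sketched as follows. The sketch assumes the first of the two write miss options (the line is fetched into the cache), a direct-mapped organization, and a 4-word line. The class name WriteBackCache and its fields are hypothetical.

```python
LINE_WORDS = 4  # words per line; an illustrative choice


class WriteBackCache:
    """Toy direct-mapped cache that fetches a line on a write miss."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory               # list of words standing in for system memory
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.lines = [None] * num_lines

    def _fill(self, line_addr):
        index = line_addr % self.num_lines
        # The line currently at this index is the victim line. If it is dirty,
        # its modified data must reach system memory before it is overwritten.
        if self.tags[index] is not None and self.dirty[index]:
            victim_base = (self.tags[index] * self.num_lines + index) * LINE_WORDS
            self.memory[victim_base:victim_base + LINE_WORDS] = self.lines[index]
        base = line_addr * LINE_WORDS
        self.lines[index] = self.memory[base:base + LINE_WORDS]
        self.tags[index] = line_addr // self.num_lines
        self.dirty[index] = False

    def write(self, word_addr, value):
        """Write value into the cache and return True on a write hit."""
        line_addr = word_addr // LINE_WORDS
        index = line_addr % self.num_lines
        hit = self.tags[index] == line_addr // self.num_lines
        if not hit:
            self._fill(line_addr)          # write miss: fetch the line first
        self.lines[index][word_addr % LINE_WORDS] = value
        self.dirty[index] = True           # line now differs from system memory
        return hit
```

In this model, system memory is updated only when a dirty victim line is replaced, so a written value is not visible in memory until a later miss evicts its line.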
Cache memories can be optimized according to a number of different techniques. One aspect that affects system performance and design complexity is the handling of writes initiated by the processor or by an alternate bus master. As explained previously, because two copies of a particular piece of data or instruction code can exist, one in system memory and a duplicate copy in the cache memory, writes to either the system memory or the cache memory can result in an incoherence between the two storage units. For example, consider the case in which the same data is initially stored at a predetermined address in both the cache memory and the system memory. If the processor subsequently initiates a write cycle to store a new data item at the predetermined address, a cache write "hit" occurs and the processor proceeds to write the new data into the cache memory at the predetermined address. Since the data is modified in the cache memory but not in system memory, the cache memory and system memory become incoherent. Similarly, in systems with an alternate bus master, write cycles to system memory by the alternate bus master modify data in system memory but not in the cache memory. Again, the cache memory and system memory become incoherent.
An incoherence between the cache memory and system memory during processor writes can be prevented or handled by implementing one of several commonly employed techniques. In a first technique, a "write-through" cache guarantees consistency between the cache memory and system memory by writing the same data to both the cache memory and system memory. The contents of the cache memory and system memory are always identical, and thus the two storage systems are always coherent. In a second technique, a "write-back" cache handles processor writes by writing only to the cache memory and setting a "dirty" bit to indicate cache entries which have been altered by the processor. When "dirty" or altered cache entries are later replaced during a "cache replacement" cycle, the modified data is written back into system memory.
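The difference between the two techniques can be illustrated with a deliberately small sketch in which dictionaries keyed by address stand in for the cache memory and the system memory, and each address is treated as its own line. All function names are hypothetical.

```python
def write_through(cache, memory, addr, value):
    """Write-through: update the cache and system memory together."""
    cache[addr] = value
    memory[addr] = value


def write_back(cache, dirty, addr, value):
    """Write-back: update only the cache and mark the entry dirty."""
    cache[addr] = value
    dirty.add(addr)


def replace_entry(cache, dirty, memory, addr):
    """Cache replacement: copy a dirty entry to system memory, then drop it."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]
```

With write_through the two stores never differ. With write_back they differ from the moment of the write until replace_entry runs, which is the window that the snooping techniques discussed next must cover.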
An incoherence between the cache memory and system memory during a write operation by an alternate bus master is handled somewhat differently. For a system that employs write-back caching, one of a variety of bus monitoring or "snooping" techniques may be implemented to determine whether certain lines of data within the cache memory should be invalidated or written back to system memory when the alternate bus master attempts to write data to system memory. In one such technique, as specified by the popular "MESI" protocol, when an alternate bus master attempts to write data to a system memory address, a cache controller determines whether a line of data corresponding to the system memory address is contained within the cache memory. If a corresponding line is not contained by the cache memory, no additional action is taken by the cache controller, and the write cycle initiated by the alternate bus master is allowed to complete. If, on the other hand, a corresponding line of data is contained in the cache memory, the cache controller determines whether that line of data is dirty or clean. If the line is clean, the line is marked invalid by the cache controller and the transfer of data from the alternate bus master into system memory is allowed to complete. The line of data must be marked invalid since the modified (and thus the most up-to-date) data is now contained only within the system memory (following completion of the write cycle by the alternate bus master). If the line of data is instead dirty, a snoop write-back cycle is initiated by the cache controller which causes the alternate bus master to "back-off" and release mastership of the system bus. The cache controller then causes the entire line of dirty data within the cache memory to be written back into system memory. The snoop write-back cycle may be accomplished by executing a burst write cycle to system memory.
As is well known to those of skill in the art, during the data phase of a burst cycle, a new word (or doubleword) may be written to the system memory for each of several successive clock cycles without intervening address phases. The fastest burst cycle (no wait states) requires two clock cycles for the first word (one clock for the address, one clock for the corresponding word), with subsequent words transferred to sequential addresses on every subsequent clock cycle.
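The clock count of a zero-wait-state burst follows directly from this description: one address clock plus one data clock per word. The sketch below works this out; the function names are hypothetical, and the non-burst figure assumes that every word needs its own address clock, which the text implies but does not state.

```python
def burst_clocks(num_words):
    """Clocks for a zero-wait-state burst: 1 address clock + 1 clock per word."""
    return 1 + num_words


def non_burst_clocks(num_words):
    """Clocks if every word needs its own address phase (an assumption)."""
    return 2 * num_words
```

For example, writing back a 32-byte line as eight 4-byte doublewords would take 9 clocks as a burst, compared with 16 clocks as eight separate cycles under the stated assumption.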
After the snoop write-back cycle completes, the alternate bus master re-obtains mastership of the system bus, and the write cycle by the alternate bus master is again executed. At this point, the new data is allowed to be written into the system memory. It is noted that the snoop write-back cycle ensures that data coherency is maintained even if the writing of data from the alternate bus master does not involve an entire cache line.
An incoherence between the cache memory and the system memory during a read operation by an alternate bus master is treated similarly. When an alternate bus master attempts to read data from system memory, the cache controller determines whether a corresponding line of data is contained within the cache memory. If a corresponding line is contained by the cache memory, and if the corresponding line is dirty, a snoop write-back cycle is initiated by the cache controller which causes the alternate bus master to back-off and release mastership of the system bus. The cache controller then causes the entire line of dirty data within the cache memory to be written back into system memory. After the write-back cycle completes, the alternate bus master re-obtains mastership of the system bus, and the read cycle by the alternate bus master is re-initiated. At this point, the data within the system memory is allowed to be read.
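The snoop decisions described above for writes and reads by an alternate bus master may be summarized in a short sketch. It models a single cache line and returns the sequence of actions taken. Two points are assumptions rather than statements from the text: the line is treated as clean once the snoop write-back completes, and the retried write is then handled by the clean-line rule (invalidate). All names are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    ABSENT = "absent"   # no corresponding line in the cache
    CLEAN = "clean"
    DIRTY = "dirty"


def snoop(state, is_write):
    """Return (actions, new_state) for one alternate bus master access."""
    actions = []
    if state is LineState.DIRTY:
        # Snoop write-back: back the master off, write the whole line back,
        # then let the master retry its cycle.
        actions += ["back_off_master", "write_back_line", "retry_access"]
        state = LineState.CLEAN          # assumption: line is clean after write-back
    if state is LineState.CLEAN and is_write:
        # System memory is about to hold newer data than the cache line.
        actions.append("invalidate_line")
        state = LineState.ABSENT
    actions.append("complete_access")
    return actions, state
```

For example, snoop(LineState.DIRTY, is_write=False) yields the back-off, write-back, and retry steps followed by completion of the read, and leaves the line clean in the cache under the stated assumption.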
FIGS. 1 and 2 depict two common topologies used to incorporate a cache memory into a typical microprocessor based computer. In the "look-aside" cache implementation shown in FIG. 1, central processing unit (CPU) 12 is connected to cache controller 14, system memory 16, and bus bridge 18 over system bus 20. This configuration minimizes CPU-to-system memory access time when a cache miss occurs because the address arrives at cache controller 14 and system memory 16 simultaneously on the common bus. A performance penalty is imposed by the look-aside configuration, however, because CPU 12 must obtain mastership of system bus 20 to perform cache accesses. During the time that CPU 12 has mastership of system bus 20, system memory 16 is unavailable to alternate bus master 24. Thus, bus master 24 cannot access system memory 16 while CPU 12 is performing cache accesses.
FIG. 2 shows the "look-through" cache topology. In the look-through cache system, the CPU-to-cache path is isolated from system bus 20 so that system memory 16 remains accessible by bus bridge 18 and bus master 24 during cache accesses by CPU 12. This performance gain is countered by the penalty resulting from the longer CPU-to-system memory path. With cache controller 14 placed between CPU 12 and system memory 16, more time is required to propagate a memory address from CPU 12 to system memory 16 when a cache miss occurs.
The conventional look-aside and look-through cache memory architectures can result in inefficient utilization of system bus 20 because bus bridge 18 is required to pass all cacheable memory accesses initiated on local bus 22 to cache controller 14 over system bus 20. This requirement prevents CPU 12 from accessing system bus 20 whenever cache controller 14 is snooping an access to cacheable memory initiated by alternate bus master 24. Stated another way, when alternate bus master 24 initiates a memory access, bus bridge 18 acquires mastership of system bus 20 and places the memory address on the bus where the address can be snooped by cache controller 14. CPU 12 is prevented from performing operations requiring mastership of system bus 20 while bus bridge 18 has mastership of the bus. The CPU's inability to execute operations requiring system bus 20 during this time can slow the overall operation of the system.
A second component of performance degradation resulting from conventional cache architecture arises when a snooped access hits a dirty line in the cache memory and therefore requires a snoop write-back cycle. When this condition occurs, the system must back alternate bus master 24 (or, more precisely, bus bridge 18, which is connected to alternate bus master 24) off system bus 20 before the cache subsystem can initiate a write-back. The back-off cycle can slow system performance by increasing the time required to complete the write-back cycle.
A third component of degradation caused by the conventional cache architecture is the propagation delay across bus bridge 18. When a system memory access originates from alternate bus master 24 located on local bus 22, the address must propagate through the bridge logic before it arrives on system bus 20 where it can be snooped by cache controller 14. This propagation delay can lengthen local bus cycles and degrade system performance.
Finally, the conventional cache architecture greatly restricts the ability to place cacheable memory on local bus 22. Suppose, for example, that a second system memory is connected to the local bus and that an alternate bus master is trying to access an address within the second system memory. It is desirable that CPU accesses to the second system memory be cacheable. It is also desirable that alternate bus masters residing on the local bus be able to access the second system memory without acquiring mastership of the system bus. With the look-aside and look-through topologies, however, both of these desirable goals cannot typically be simultaneously accommodated. If the second system memory is made cacheable, accesses to the second system memory by an alternate bus master will necessarily require mastership of the system bus so that a coherency check to the cache can be performed. Any benefit derived from isolating the second system memory from the system bus is essentially lost if all accesses to the second system memory require mastership of the system bus.