1. Field of the Invention
This invention relates to cache-based computer architectures and, more particularly, to write allocation and data prefetch techniques employed within cache-based computer systems. The invention also relates to memory control techniques for computer systems.
2. Description of the Relevant Art
Cache-based computer architectures are typically associated with specialized bus transfer mechanisms to support efficient utilization of the cache memory. A cache memory is a high-speed memory unit interposed in the memory hierarchy between the slower system memory and the microprocessor to improve effective memory transfer rates and, accordingly, improve system performance. The name refers to the fact that the small memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache is usually implemented by semiconductor memory devices having speeds that are compatible with the speed of the processor, while the system memory utilizes a less costly, lower-speed technology. The cache concept anticipates the likely reuse by the microprocessor of selected data in system memory by storing a copy of the selected data in cache memory.
A cache memory includes a plurality of memory sections, wherein each section stores a block or a "line" of one or more words. Each block has associated with it an address tag that uniquely identifies which block of system memory it is a copy of. When a request originates in the processor for a new word, whether it be data or instruction, an address tag comparison is made to see if a copy of the requested word resides in a block of the cache memory. If present, the data is used directly from the cache. If the desired word is not in the cache memory, the block containing it is retrieved from the system memory, stored in the cache memory and supplied to the processor. Since the cache is of limited size, space must often be allocated within the cache to accommodate the new information. A replacement algorithm based on the history of use, such as least recently used (LRU), is typically implemented to identify the block least likely to be needed, which is then overwritten by the new block.
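The lookup and replacement steps described above can be sketched as a small Python model. This is a minimal illustration, not the controller described in this document. The class name `SetAssociativeCache`, the set, way and line-size parameters, and the choice of least-recently-used (LRU) as the history-based replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Illustrative read-only cache model (hypothetical, not from the source text)."""

    def __init__(self, num_sets: int, ways: int, line_words: int, memory: list[int]):
        self.num_sets = num_sets
        self.ways = ways
        self.line_words = line_words
        self.memory = memory  # stand-in for system memory, indexed by word address
        # One OrderedDict per set maps tag -> cached words; insertion order tracks
        # recency of use, so the first entry is the least recently used line.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def _split(self, addr: int) -> tuple[int, int, int]:
        """Split a word address into (tag, set index, word offset)."""
        line_addr, offset = divmod(addr, self.line_words)
        tag, index = divmod(line_addr, self.num_sets)
        return tag, index, offset

    def read(self, addr: int) -> int:
        tag, index, offset = self._split(addr)
        lines = self.sets[index]
        if tag in lines:                       # tag comparison succeeds: cache hit
            self.hits += 1
            lines.move_to_end(tag)             # record the use for LRU ordering
            return lines[tag][offset]
        self.misses += 1
        if len(lines) == self.ways:            # no free way: evict the LRU victim
            lines.popitem(last=False)
        base = (tag * self.num_sets + index) * self.line_words
        lines[tag] = self.memory[base:base + self.line_words]  # fetch the whole line
        return lines[tag][offset]


memory = list(range(64))
cache = SetAssociativeCache(num_sets=2, ways=2, line_words=4, memory=memory)
for addr in (0, 8, 0, 16, 0, 8):   # lines 0, 2 and 4 all map to set 0
    cache.read(addr)
print(cache.hits, cache.misses)     # 2 4
```

The second read of address 0 hits. The read of address 16 then forces the least recently used line (the one holding address 8) out of set 0, so the final read of address 8 misses again.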
In addition to using a cache memory to retrieve data, the processor may also write data directly to the cache memory instead of to the system memory. When the processor desires to write data to memory, a cache memory controller makes an address tag comparison to see if the data block into which data is to be written resides in the cache memory. If the data block is present in the cache memory, the data is written directly into the block. This event is referred to as a cache write "hit". As will be explained in greater detail below, in many systems a data "dirty bit" for the data block is then set. The dirty bit indicates that data stored within the data block is dirty (i.e., has been modified), and thus, before the data block is deleted from the cache memory or overwritten, the modified data must be written into system memory.
If the data block into which data is to be written does not exist in the cache memory, the data block is either fetched into the cache memory from system memory or the data is written directly into the system memory. This event is referred to as a cache write "miss". A data block which is overwritten or copied out of the cache memory when new data is stored in the cache memory is referred to as a victim block or a victim line.
Cache memories can be optimized according to a number of different features. One important feature which affects cache performance and design complexity is the handling of writes by the processor or an alternate bus master. As explained above, because two copies of a particular piece of data or instruction code can exist, one in system memory and a duplicate copy in the cache, writes to either system memory or the cache memory can result in an incoherence between the two storage units. For example, suppose specific data is stored at a predetermined address in both the cache memory and system memory. If a cache write hit to the predetermined address occurs, the processor proceeds to write the new data into the cache memory at the predetermined address. Since the data is modified in the cache memory but not in system memory, the cache memory and system memory become incoherent. Similarly, in systems with an alternate bus master, direct memory access (DMA) write cycles to system memory by the alternate bus master modify data in system memory but not in the cache memory. Again, the cache memory and system memory become incoherent.
An incoherence between the cache memory and system memory during processor writes can be handled by implementing one of several commonly employed techniques. In a first technique, a "write-through" cache guarantees consistency between the cache memory and system memory by writing to both the cache memory and system memory during processor writes. The contents of the cache memory and system memory are always identical, and so the two storage systems are always coherent. In a second technique, a "write-back" cache handles processor writes by writing only to the cache memory and setting a "dirty" bit to indicate cache entries which have been altered by the processor. When "dirty" or altered cache entries are later replaced during a "cache replacement" cycle, the modified data is written back into system memory.
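The two write policies can be contrasted with a short Python sketch. It models only what the paragraph describes. On a write hit, the write-through cache updates both copies. The write-back cache updates only the cached copy, sets a dirty bit, and copies the line back when it is replaced. The `Cache` and `Line` classes and the four-word line size are illustrative assumptions. So is the decision to send write misses straight to system memory, which is one of the two miss options mentioned earlier.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4  # words per cache line (illustrative assumption)


@dataclass
class Line:
    data: list[int]
    dirty: bool = False   # set when the cached copy differs from system memory


@dataclass
class Cache:
    memory: list[int]     # stand-in for system memory, indexed by word address
    write_back: bool      # True: write-back policy; False: write-through policy
    lines: dict[int, Line] = field(default_factory=dict)  # line address -> line

    def load(self, line_addr: int) -> None:
        """Copy one line from system memory into the cache."""
        base = line_addr * LINE_WORDS
        self.lines[line_addr] = Line(self.memory[base:base + LINE_WORDS])

    def write(self, addr: int, value: int) -> None:
        line_addr, offset = divmod(addr, LINE_WORDS)
        line = self.lines.get(line_addr)
        if line is None:
            # Write miss: in this sketch the word goes straight to system memory
            # (no allocation); fetching the line first is the other option.
            self.memory[addr] = value
            return
        line.data[offset] = value         # write hit: update the cached copy
        if self.write_back:
            line.dirty = True             # memory is stale until the line is replaced
        else:
            self.memory[addr] = value     # write-through: both copies stay identical

    def evict(self, line_addr: int) -> None:
        """Cache replacement: a dirty victim is copied back before it is dropped."""
        line = self.lines.pop(line_addr)
        if line.dirty:
            base = line_addr * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = line.data


memory = [0] * 16
wb = Cache(memory, write_back=True)
wb.load(0)
wb.write(1, 99)
print(memory[1], wb.lines[0].dirty)   # 0 True: system memory is stale
wb.evict(0)
print(memory[1])                      # 99: written back during replacement
```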
Depending upon which cache architecture is implemented, incoherency between the cache memory and system memory during a DMA read operation can be handled with bus watching or "snooping" techniques, with instructions executed by the operating system, or with combinations thereof. In a "write-through" cache, no special techniques are required during the DMA read operation. In a "write-back" cache, bus snooping can be employed to check the contents of the cache for altered data and to source data from the cache to the requesting bus master when appropriate to maintain coherency. When the cache memory is sourcing data to the requesting bus master, system memory is prohibited from supplying data to the requesting bus master. Alternatively, the operating system can execute an instruction to write "dirty" data from the cache memory into system memory prior to the DMA read operation. All "dirty" data is written out to system memory, thereby ensuring consistency between the cache memory and system memory.
Similarly, during a DMA write operation, incoherency between the cache memory and system memory can be handled with bus "snooping" or monitoring, with instructions executed by the operating system, or with combinations thereof. In both "write-through" and "write-back" caches, bus snooping invalidates cache entries which become "stale" or inconsistent with system memory following the DMA write operation. Alternatively, cache PUSH and INVALIDATE instructions can be executed by the operating system prior to the DMA write operation to write "dirty" or altered data out to system memory and to invalidate the contents of the entire cache. Since only a single copy of data exists in system memory following the instructions, the DMA write to system memory will not present the problem of possibly "stale" data in the cache.
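The snooping behaviour described for DMA reads and writes can be sketched for a write-back cache as follows. This is a hypothetical Python model. The class and method names are invented for illustration, and lines are allocated on any processor write for simplicity. A DMA write may cover only part of a line, so the sketch first flushes a dirty line to memory before invalidating it. This keeps other modified words from being lost, a detail the text above does not spell out and which is an assumption here.

```python
LINE_WORDS = 4  # words per cache line (illustrative assumption)


class SnoopingWriteBackCache:
    """Hypothetical write-back cache that snoops DMA cycles from another bus master."""

    def __init__(self, memory: list[int]):
        self.memory = memory                    # stand-in for system memory
        self.lines: dict[int, list[int]] = {}   # line address -> cached words
        self.dirty: set[int] = set()            # line addresses holding altered data

    def cpu_write(self, addr: int, value: int) -> None:
        """Processor write; the line is allocated on a miss for simplicity."""
        line_addr, offset = divmod(addr, LINE_WORDS)
        if line_addr not in self.lines:
            base = line_addr * LINE_WORDS
            self.lines[line_addr] = self.memory[base:base + LINE_WORDS]
        self.lines[line_addr][offset] = value
        self.dirty.add(line_addr)               # write-back: memory not updated

    def dma_read(self, addr: int) -> int:
        line_addr, offset = divmod(addr, LINE_WORDS)
        if line_addr in self.dirty:
            # Snoop hit on altered data: the cache sources the word and system
            # memory is prevented from supplying it.
            return self.lines[line_addr][offset]
        return self.memory[addr]

    def dma_write(self, addr: int, value: int) -> None:
        line_addr = addr // LINE_WORDS
        if line_addr in self.dirty:
            # Assumption of this sketch: flush the dirty line first so that a
            # partial-line DMA write does not discard other altered words.
            base = line_addr * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = self.lines[line_addr]
            self.dirty.discard(line_addr)
        self.memory[addr] = value
        self.lines.pop(line_addr, None)         # invalidate the now-stale copy


memory = [0] * 16
cache = SnoopingWriteBackCache(memory)
cache.cpu_write(5, 42)
print(memory[5], cache.dma_read(5))              # 0 42
cache.dma_write(4, 7)
print(memory[4], memory[5], 1 in cache.lines)    # 7 42 False
```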
Another important feature that affects cache performance and design complexity is data prefetching. Data prefetch techniques are used to enhance the probability of cache hit occurrences and are typically executed when a miss is encountered on an access to cache memory. Such prefetch techniques involve the transfer of a block or line of several words from system memory into the cache memory even though the immediate need is for only one word. If the required word is part of a stream of sequential instructions, it is likely that subsequent instructions will be retrieved with the required first word, making repeated access to system memory unnecessary.
When data prefetching is used in conjunction with a read operation, the system memory is controlled to provide a block of several words in address sequence at a high data rate. The requested word residing within the block is provided to the microprocessor while the entire block of words is stored in the cache memory. The transfer of the block of words is typically accomplished by burst memory access. During the data phase of a burst memory access cycle, a new word is provided to the system bus from system memory for several successive clock cycles without intervening address phases. The fastest burst cycle (no wait state) requires two clock cycles for the first word (one clock for the address, one clock for the corresponding word), with subsequent words returned from sequential addresses on every subsequent clock cycle. For systems based on the particularly popular model 80486 microprocessor, a total of four "doublewords" are transferred for a given burst cycle.
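The clock counts quoted above can be checked with a few lines of arithmetic. The sketch assumes zero wait states throughout. For comparison, it also assumes that a non-burst transfer costs the same two clocks (address plus data) for every word. The function names are hypothetical.

```python
DOUBLEWORD_BYTES = 4   # a doubleword is 32 bits
BURST_LENGTH = 4       # doublewords per 80486 burst, as stated above


def burst_clocks(words: int) -> int:
    """Zero-wait-state burst: 2 clocks for the first word, then 1 clock per word."""
    if words < 1:
        raise ValueError("a burst moves at least one word")
    return 2 + (words - 1)


def separate_cycle_clocks(words: int) -> int:
    """Assumed baseline: each word needs its own address clock and data clock."""
    return 2 * words


print(burst_clocks(BURST_LENGTH))            # 5
print(separate_cycle_clocks(BURST_LENGTH))   # 8
print(BURST_LENGTH * DOUBLEWORD_BYTES)       # 16 bytes per burst
```

Under these assumptions, the burst (often written 2-1-1-1) moves a 16-byte line in 5 clocks instead of 8.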
Prefetch techniques are similar when used in conjunction with write operations. FIG. 1 is a timing diagram that illustrates various signals associated with a prefetch technique that may be employed within a computer system based on the model 80486 microprocessor. The timing diagram of FIG. 1 represents a time duration encompassed by bus states 100-110.
The initial write request by the microprocessor occurs during bus state 101 when the address strobe signal CADS is asserted and the cycle definition control bus signals CC[3:0] are driven to indicate that the present cycle is a write cycle. If it is determined that the write operation is a cache miss cycle, address strobe signal MADS is asserted on the system bus by a cache controller during bus state 102. The addressing signal MA is also driven on the system bus, as well as cycle definition control signals MC[3:0]. At bus state 104, the write data MD is driven on the system bus and is stored within the system memory. At the same time, signals MBRDY and MBLAST are asserted by a memory controller to indicate that the write data has been accepted and that the cycle has completed.
The cache controller then issues a burst read request during bus state 105 to prefetch the line of data that corresponds to the address of the written data. This is accomplished by reasserting the system bus address strobe signal MADS and driving the system bus cycle definition signals MC[3:0] to indicate that the present cycle is a burst read request. It is noted that this burst read request is initiated independently of the microprocessor.
The first word of the prefetched data becomes available during bus state 109 and is written into the cache memory. During bus state 110 and subsequent bus states (not shown), the three remaining words of the fetched line are provided to the system bus and written into the cache memory. Since it is likely that the prefetched words will be accessed during subsequent cycles of the processor, repeated access to system memory may be unnecessary and overall processing speed may be increased.
Although data prefetching techniques have been quite successful in increasing cache hit rates in computer systems, such techniques can in some circumstances degrade system performance. One such circumstance may arise when the prefetching technique described in conjunction with FIG. 1 is employed. As illustrated by the timing diagram, the system bus is "busy" or occupied with signals for the entire time beginning with bus state 102 and ending with the second bus state that follows bus state 110. This limits the bandwidth of the system bus, since subsequent system bus cycles initiated by the microprocessor or an alternate bus master must be postponed until the address phase and/or data phase of the prefetch cycle has completed, which can degrade system performance.
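The bus occupancy described for FIG. 1 can be tallied by listing the system-bus activity per bus state, as a rough Python sketch. The dictionary layout and function name are hypothetical. The bus-state numbers and signals come from the description of FIG. 1. Placing the last two prefetched words in bus states 111 and 112 is an inference from the statement that the bus stays occupied until the second bus state after bus state 110.

```python
# System-bus events for the FIG. 1 write-miss-then-prefetch sequence, keyed by
# bus-state number. States 111 and 112 are inferred (see lead-in); the
# processor-side CADS/CC[3:0] activity in bus state 101 is omitted because it
# does not occupy the system bus.
FIG1_SYSTEM_BUS = {
    102: ["MADS", "MA", "MC[3:0] = write"],
    104: ["MD (write data)", "MBRDY", "MBLAST"],
    105: ["MADS", "MC[3:0] = burst read"],
    109: ["prefetched word 1"],
    110: ["prefetched word 2"],
    111: ["prefetched word 3"],
    112: ["prefetched word 4"],
}


def busy_states(activity: dict[int, list[str]]) -> range:
    """Bus states from the first to the last system-bus event, inclusive."""
    return range(min(activity), max(activity) + 1)


span = busy_states(FIG1_SYSTEM_BUS)
idle_inside = [s for s in span if s not in FIG1_SYSTEM_BUS]
print(len(span), idle_inside)   # 11 [103, 106, 107, 108]
```

Under this reading, the bus is held for 11 bus states. Four of them (103 and 106 through 108) show no new signal activity in the description, yet other masters still cannot use the bus because the write or prefetch cycle is outstanding.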
In an attempt to increase the bandwidth of the system bus, an alternative prefetch technique may be employed. In this alternative technique, if the microprocessor initiates a write cycle and a cache miss occurs, the word is written directly into an allocated block of the cache memory (rather than into system memory). The cache controller concurrently initiates a burst read request to prefetch and store the remaining words of the block from system memory into the cache memory. Although this technique decreases the duration of signal activity on the system bus (since a write cycle to system memory is unnecessary), an incoherency arises since the word written to cache memory was not updated in system memory. The corresponding block of cache memory must accordingly be marked as "dirty".
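The two write-miss strategies can be compared with a Python sketch that records the system-bus transfers each one generates and the state of the resulting cache line. The function name, the four-word line, and the tuple-based bus log are illustrative assumptions. The log counts word transfers, not bus states.

```python
LINE_WORDS = 4  # words per cache line (illustrative assumption)


def handle_write_miss(memory: list[int], addr: int, value: int,
                      allocate: bool) -> tuple[list[tuple[str, int]], list[int], bool]:
    """Return (bus log, cached line contents, dirty flag) after a write miss."""
    base = addr - addr % LINE_WORDS
    line_range = range(base, base + LINE_WORDS)
    if allocate:
        # Alternative technique: the word goes straight into the allocated line
        # and only the remaining words of the line are prefetched.
        bus_log = [("read", a) for a in line_range if a != addr]
        line = [value if a == addr else memory[a] for a in line_range]
        return bus_log, line, True      # memory still holds the old word: dirty
    # FIG. 1 style: write the word to system memory, then burst-read the line.
    memory[addr] = value
    bus_log = [("write", addr)] + [("read", a) for a in line_range]
    line = [memory[a] for a in line_range]
    return bus_log, line, False         # cache and memory agree: clean


memory = list(range(100, 116))
log, line, dirty = handle_write_miss(memory, 6, 0, allocate=False)
print(len(log), line, dirty, memory[6])   # 5 [104, 105, 0, 107] False 0

memory = list(range(100, 116))
log, line, dirty = handle_write_miss(memory, 6, 0, allocate=True)
print(len(log), line, dirty, memory[6])   # 3 [104, 105, 0, 107] True 106
```

Both strategies leave the same words in the cache. The allocate path uses fewer bus transfers, but the line is marked dirty because system memory still holds the old word.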
The existence of dirty data within the cache memory can degrade system performance when direct memory access (DMA) transfers are performed and when cache replacement cycles are necessary. For example, as explained previously, during a DMA read operation, the existence of dirty data in the cache memory may necessitate the initiation of a "snoop" cycle to source the data from the cache memory rather than system memory. The initiation of such a snoop cycle often increases the overall time required to complete multi-word transfers, thus degrading system performance. Similarly, during a cache replacement cycle when a new block of data is stored within the cache memory, the victim block must be transferred into system memory if it contains dirty data. The requirement of such a transfer can also degrade system performance.
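The replacement-cycle penalty can be estimated with the burst arithmetic given earlier for 80486 systems. This is a back-of-the-envelope Python sketch under two stated assumptions. First, both the line fill and the dirty-victim write-back run as four-word, zero-wait-state bursts. Second, the fraction of victims that are dirty is a free parameter rather than a measured value. The function names are hypothetical.

```python
LINE_WORDS = 4  # doublewords per line (assumed, matching the 80486 burst length)


def burst_clocks(words: int) -> int:
    """Zero-wait-state burst: 2 clocks for the first word, 1 for each one after."""
    return 2 + (words - 1)


def replacement_clocks(victim_dirty: bool) -> int:
    """Bus clocks to replace one line: fill, plus a write-back if the victim is dirty."""
    fill = burst_clocks(LINE_WORDS)
    write_back = burst_clocks(LINE_WORDS) if victim_dirty else 0
    return fill + write_back


def expected_replacement_clocks(dirty_fraction: float) -> float:
    """Average cost when the given fraction of victim lines hold dirty data."""
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must lie between 0 and 1")
    return ((1 - dirty_fraction) * replacement_clocks(False)
            + dirty_fraction * replacement_clocks(True))


print(replacement_clocks(False), replacement_clocks(True))  # 5 10
print(expected_replacement_clocks(0.5))                     # 7.5
```

Under these assumptions, each dirty victim doubles the bus time of its replacement cycle. The write-allocate technique accepts this later cost in exchange for its shorter write-miss handling.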