Conservation of bus bandwidth becomes a significant design consideration as microprocessor speeds increase. Faster microprocessors place increasing demands on the memory system and on the multiple processors and DMA devices that share the system bus. The M68000 family of microprocessors typically utilizes 90-95% of the external bus bandwidth, due to the highly efficient, pipelined internal architecture of the central processing unit (CPU). In some systems, the problem of insufficient bus bandwidth has been addressed by using caching schemes, particularly caches that accommodate large data entries (i.e., entries significantly larger than the bus size).
Caching schemes have been employed by computer designers to reduce the time a CPU takes to access main memory, and hence to increase system performance. In many computing systems, main memory consists of a large array of memory devices that are slow relative to processor speeds. During accesses to main memory, the processor is forced to insert additional wait states to accommodate the slower memory devices. System performance during memory accesses can be enhanced with a cache. Smaller than main memory and significantly faster, the cache provides fast local storage for data and instruction code that are frequently used by the processor. In computing systems with caches, memory operations by the processor are first transacted with the cache. The slower main memory is accessed by the processor only if the memory operation cannot be completed with the cache. In general, the processor has a high probability of fulfilling a majority of its memory operations with the cache. Consequently, in computing systems that employ a cache, the effective memory access time between a processor and relatively slow main memory can be reduced.
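The reduction in effective access time can be illustrated with a simple average. The following is a minimal sketch, assuming a model in which every access pays the cache access time and a miss additionally pays the main memory access time; the function name, the parameters, and the model itself are illustrative assumptions and are not taken from any particular processor.

```python
def effective_access_time(hit_rate, cache_time, memory_time):
    """Average time per memory operation, in the same units as the inputs.

    Assumed model: every access probes the cache (cache_time); a miss
    additionally pays the main memory access (memory_time).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return cache_time + miss_rate * memory_time


if __name__ == "__main__":
    # Hypothetical timings: a 1-cycle cache and a 5-cycle main memory.
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(rate, effective_access_time(rate, 1.0, 5.0))
```

Under this model the average approaches the cache access time as the hit rate approaches one, which is the sense in which the cache hides the slower main memory.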
Caches can be highly optimized according to a number of different features. One important feature that affects cache performance and design complexity is the handling of writes by the processor or an alternate bus master. Since two copies of a particular piece of data or instruction code can exist, one in main memory and a duplicate in the cache, writes to either main memory or the cache can result in incoherency between the two storage systems. For example, suppose that specific data is stored at a predetermined address in both the cache and main memory. During a processor write to the predetermined address, the processor first checks the contents of the cache for the data. After locating the data in the cache, the processor proceeds to write the new data into the cache at the predetermined address. As a result, the data is modified in the cache but not in the main memory, and the cache and main memory therefore become incoherent. Similarly, in systems with an alternate bus master, Direct Memory Access (DMA) writes to main memory by the alternate bus master modify data in the main memory but not in the cache. Once again, the cache and main memory become incoherent.
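The two sources of incoherency described above can be reproduced with a toy model. The following is a minimal sketch, assuming that the cache and main memory are each represented by a Python dictionary mapping an address to a value; the function names, the address, and the data values are hypothetical.

```python
def processor_write_cache_only(cache, address, value):
    """Processor write that hits in the cache and updates only the cached copy."""
    if address not in cache:
        raise KeyError("this sketch models cache hits only")
    cache[address] = value


def dma_write(main_memory, address, value):
    """Alternate bus master write that modifies main memory but not the cache."""
    main_memory[address] = value


def is_coherent(cache, main_memory, address):
    """True if the address is not cached, or both copies hold the same value."""
    return address not in cache or cache[address] == main_memory[address]


if __name__ == "__main__":
    address = 0x1000                      # hypothetical predetermined address
    main_memory = {address: 0x11111111}
    cache = {address: 0x11111111}         # duplicate copy held in the cache

    processor_write_cache_only(cache, address, 0x22222222)
    print(is_coherent(cache, main_memory, address))   # cache is newer than memory

    main_memory[address] = cache[address]             # bring the copies back in step
    dma_write(main_memory, address, 0x33333333)
    print(is_coherent(cache, main_memory, address))   # memory is newer than cache
```

In the first case the cache holds the newer data; in the second case main memory does. Both cases leave the two storage systems incoherent until one copy is updated or discarded.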
Incoherency between the cache and main memory during processor writes can be handled using two techniques. In a first technique, a "write-through" cache guarantees consistency between the cache and main memory by writing to both the cache and the main memory during processor writes. Each cache entry is therefore always identical to the corresponding contents of main memory, and so the two storage systems are always coherent. In a second technique, a "write-back" or "copy back" cache handles processor writes by writing only to the cache and setting a "dirty" bit to designate the cache entries which have been altered by the processor. A subsequent access by the processor that results in a cache "miss" can cause the replacement algorithm to select the dirty cache entry for replacement, in which case the entire dirty (altered) cache entry is transferred to the main memory. The new data is then written into the cache at the location vacated by the dirty entry.
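The difference between the two techniques can be sketched in software. The following is a minimal sketch, assuming a toy cache that holds a single entry and allocates an entry on a write miss; the class name, its attributes, and the single-entry organization are illustrative assumptions and do not describe any particular processor.

```python
class SimpleCache:
    """Toy single-entry cache used only to contrast the two write policies."""

    def __init__(self, memory, write_through):
        self.memory = memory              # dict standing in for main memory
        self.write_through = write_through
        self.address = None               # address currently cached, if any
        self.value = None
        self.dirty = False                # meaningful in copy back mode only
        self.memory_writes = 0            # writes that reached main memory

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def _replace(self, address):
        # On a miss, a dirty entry is copied back before it is replaced.
        if self.address is not None and self.dirty:
            self._write_memory(self.address, self.value)
        self.address = address
        self.value = self.memory.get(address, 0)
        self.dirty = False

    def read(self, address):
        if self.address != address:
            self._replace(address)
        return self.value

    def write(self, address, value):
        if self.address != address:
            self._replace(address)        # assumption: allocate on a write miss
        self.value = value
        if self.write_through:
            self._write_memory(address, value)   # memory updated on every write
        else:
            self.dirty = True                    # memory updated on replacement


if __name__ == "__main__":
    memory = {0: 1, 4: 2}
    cache = SimpleCache(memory, write_through=False)
    cache.write(0, 9)
    print(memory[0], cache.dirty)   # main memory still holds the old value
    cache.read(4)                   # miss: the dirty entry is copied back
    print(memory[0], cache.dirty)
```

In write-through mode every processor write produces a main memory write, whereas in copy back mode repeated writes to the same entry produce a single main memory write when the entry is replaced.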
In the prior art, there are several processors which operate in the copy back mode. These processors unload dirty cache entries by a burst write of the entire cache line to the main memory. Since the cache entries are significantly larger than the system bus size, the burst write of the entire cache line to the main memory uses a significant portion of the bus bandwidth. Furthermore, processors of this type do not distinguish between the "clean" or unmodified portion and the "dirty" or modified portion of the cache line. Essentially, these processors provide only one dirty bit and one valid bit per cache line. Consequently, the dirty status of a portion of the cache entry (e.g., one longword) results in a write of the entire cache line (e.g., four longwords) to the main memory. Thus, the bus bandwidth required to maintain the cache is greater than necessary, and bus utilization is inefficient.
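The cost of keeping only one dirty bit per cache line can be quantified with a short calculation. The following is a minimal sketch, assuming the four-longword line of the example above; the function names and the list-of-flags representation of which longwords the processor modified are hypothetical, and the model counts only the longwords moved across the bus.

```python
LONGWORDS_PER_LINE = 4  # line size taken from the example above


def line_copy_back_cost(modified):
    """Return (transferred, actually_modified) longword counts for one line.

    'modified' lists one boolean per longword. With a single dirty bit per
    line, the hardware knows only whether any longword was modified, so a
    dirty line is always written out in full.
    """
    if len(modified) != LONGWORDS_PER_LINE:
        raise ValueError("expected one flag per longword in the line")
    line_dirty = any(modified)            # the single per-line dirty bit
    transferred = LONGWORDS_PER_LINE if line_dirty else 0
    return transferred, sum(1 for flag in modified if flag)


def wasted_fraction(modified):
    """Fraction of the copy back transfer spent on unmodified longwords."""
    transferred, actually_modified = line_copy_back_cost(modified)
    if transferred == 0:
        return 0.0
    return (transferred - actually_modified) / transferred


if __name__ == "__main__":
    one_dirty_longword = [True, False, False, False]
    print(line_copy_back_cost(one_dirty_longword))
    print(wasted_fraction(one_dirty_longword))
```

For a line in which a single longword was modified, three of the four longwords transferred carry data that main memory already holds.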