Conventional computer systems employ a central processing unit ("CPU") for performing logical operations on digital data elements, and main memory for storing digital data elements. An operation inherent in such systems is the transfer of data elements between the CPU and main memory. The time required to transfer the data often becomes a dominant factor in the processing performance of the computer system. The transfer time can be influenced by limitations on the transfer bus size and speed, the physical location of the CPU and main memory, or the size or access time of main memory.
Cache memory systems are employed in many computer systems to decrease the impact of main memory transfers on system performance. A cache memory is usually smaller than the main memory and is located near the CPU. A cache memory stores recently-accessed data which has a high probability of being subsequently accessed by the CPU.
Two general categories of cache memories are known in the art. Caches in the first category are designated as store-thru or write-thru. Data written to a store-thru cache is also simultaneously written to main memory. Thus, cached data in a store-thru cache will always match the data at a corresponding address in main memory. On a subsequent CPU read from the same address, the data is available in the cache and a main memory transfer is not required.
Caches in the second category are designated as store-in or write-back. Data is written to a store-in cache without simultaneously updating main memory. Thus, data in the cache and data at the corresponding address in main memory may differ. Write-back cycles are therefore performed periodically to write the cached data to main memory so that main memory contains the updated data.
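The difference between the two cache categories can be illustrated with a minimal sketch. The class and attribute names (`Memory`, `StoreThruCache`, `StoreInCache`, `pending`) are hypothetical and chosen only for illustration. Memory is modeled as a simple address-to-value mapping rather than as real hardware.

```python
class Memory:
    """Main memory modeled as an address -> value mapping (illustrative only)."""

    def __init__(self):
        self.cells = {}


class StoreThruCache:
    """Store-thru (write-thru): every write also updates main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory.cells[addr] = value  # main memory always matches the cache


class StoreInCache:
    """Store-in (write-back): writes stay in the cache until a write-back cycle."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.pending = set()  # addresses written but not yet copied to memory

    def write(self, addr, value):
        self.lines[addr] = value
        self.pending.add(addr)  # main memory is NOT updated here

    def write_back(self):
        # A write-back cycle copies every pending entry to main memory.
        for addr in sorted(self.pending):
            self.memory.cells[addr] = self.lines[addr]
        self.pending.clear()


mem_thru, mem_in = Memory(), Memory()
thru, store_in = StoreThruCache(mem_thru), StoreInCache(mem_in)
thru.write(0x10, 7)
store_in.write(0x10, 7)
print(0x10 in mem_thru.cells)  # True: memory already updated
print(0x10 in mem_in.cells)    # False: memory is stale until write_back()
```

In this sketch the store-in cache leaves main memory stale until `write_back()` is called, which is the condition that later makes a check of the cache necessary before another device reads main memory.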
The main memory of the computer system may often be shared among multiple processing resources in addition to the CPU. For example, external input/output ("I/O") devices may be employed to perform a variety of I/O functions including magnetic media data storage, printing, etc., or additional general processing units may be employed. Any such additional processing resource is referred to herein as a secondary processor. If a secondary processor requires read or write access to main memory and a store-in cache is employed near the CPU, the contents of the cache and the contents of main memory may be different if a write-back cycle has not yet been performed. This difference is generally known to the system and can be reflected in the logic of the cache, which maintains an update indicator, or "dirty bit". The dirty bit indicates that data has been written to the cache but has not yet been written to main memory in a write-back cycle. When a secondary processor requires read or write access to main memory, a check or "snoop" cycle is performed to determine whether dirty data is held at the relevant address in the cache. If the check cycle reveals that the data is dirty, or updated with respect to corresponding data in main memory, the data in the cache must be written back to main memory using a write-back cycle before the secondary processor can access the data in main memory.
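The snoop and write-back sequence described above can be sketched as follows. This is a simplified software model, not a description of any particular hardware. The names `StoreInCache`, `cpu_write`, `snoop`, `write_back`, and `secondary_read` are hypothetical, and the 32-byte line size is an assumption made for the example.

```python
LINE_SIZE = 32  # bytes per cache line (assumed for this sketch)


class StoreInCache:
    """Store-in cache with one dirty bit per line (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory, a bytearray
        self.lines = {}       # line base address -> bytearray of LINE_SIZE bytes
        self.dirty = set()    # base addresses whose dirty bit is set

    def cpu_write(self, addr, data):
        base = addr - addr % LINE_SIZE
        offset = addr - base
        if offset + len(data) > LINE_SIZE:
            raise ValueError("write crosses a line boundary (not modeled)")
        if base not in self.lines:
            # Fill the line from main memory on first touch.
            self.lines[base] = bytearray(self.memory[base:base + LINE_SIZE])
        self.lines[base][offset:offset + len(data)] = data
        self.dirty.add(base)  # cache is now newer than main memory

    def snoop(self, addr):
        """Check cycle: is the line holding addr dirty?"""
        return (addr - addr % LINE_SIZE) in self.dirty

    def write_back(self, addr):
        """Write-back cycle: copy the whole line to main memory."""
        base = addr - addr % LINE_SIZE
        self.memory[base:base + LINE_SIZE] = self.lines[base]
        self.dirty.discard(base)


def secondary_read(cache, memory, addr, length):
    """Secondary-processor read: snoop first, write back if dirty, then read."""
    if cache.snoop(addr):
        cache.write_back(addr)
    return bytes(memory[addr:addr + length])


memory = bytearray(64)
cache = StoreInCache(memory)
cache.cpu_write(36, b"\x01\x02\x03\x04")
print(secondary_read(cache, memory, 36, 4))  # b'\x01\x02\x03\x04'
```

Note that in this model the write-back copies the entire 32-byte line even though the secondary processor asked for only 4 bytes.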
Access to main memory from a secondary processor therefore requires (1) a conventional read or write request; (2) a snoop cycle; (3) a possible write-back cycle; and (4) the eventual main memory access by the secondary processor. During the snoop cycle and the possible write-back cycle, the secondary processor is idle, i.e., waiting for clean snoop results, or dirty results and a write-back. This idle time can lead to significant performance degradation. The performance degradation can be particularly serious if the store-in cache is designed such that a write-back cycle encompasses all bytes of a single cache block or cache line. For example, if the secondary processor requires access to only 4 bytes within a 32-byte block, the processor must wait for a snoop and a full write-back of 32 bytes before accessing the 4 bytes. Thus, for a simple 4-byte read access from main memory, the secondary processor incurs the idle time associated with (1) a snoop cycle and (2) a write-back of a 32-byte cache line.
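The idle time in the 4-byte example can be made concrete with a rough cost model. Only the 32-byte line size and the 4-byte request come from the text above. The one-cycle snoop, the 4-byte-per-cycle transfer rate, and the function name `secondary_idle_cycles` are assumptions introduced purely for illustration and do not reflect any specific system.

```python
def secondary_idle_cycles(dirty, line_bytes=32, bus_bytes_per_cycle=4, snoop_cycles=1):
    """Estimate cycles the secondary processor waits before its own access.

    Assumed model: the snoop takes `snoop_cycles`, and a write-back moves the
    full cache line at `bus_bytes_per_cycle` bytes per cycle.
    """
    if dirty:
        # Ceiling division: a partial final transfer still costs a full cycle.
        write_back_cycles = -(-line_bytes // bus_bytes_per_cycle)
    else:
        write_back_cycles = 0
    return snoop_cycles + write_back_cycles


print(secondary_idle_cycles(dirty=False))  # 1: snoop only
print(secondary_idle_cycles(dirty=True))   # 9: snoop plus 8-cycle line write-back
```

Under these assumed figures, a dirty line makes the secondary processor wait nine cycles instead of one before it can read its 4 bytes, and eight of those cycles are spent moving bytes it may not have requested.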
An alternative approach to main memory access is thus required in which a secondary processor does not incur the performance degradation associated with a store-in cache near the central processor.