The speed and performance of microprocessors are continuously being enhanced and have generally outpaced improvements to the speed and performance of the main memories of computer systems. As a result, a system's main memory is not always able to keep up with the demands of high-speed processors. This is especially true of multi-processor or distributed computer systems, which can provide a substantial increase in performance over traditional single-processor systems by utilizing a plurality of processors to perform parallel processing. As more and higher-speed processors are added to multi-processor systems and compete for access to the main memory, memory access times for processors generally increase. Consequently, main memory bandwidth has become a significant bottleneck for high-performance data processing systems.
One common technique utilized to alleviate this bottleneck is employing a memory hierarchy. For example, a three-tiered memory can be constructed from low, medium, and high speed memories. A low speed memory may be a magnetic disk for low cost bulk storage of data. A medium speed memory may be constructed from Dynamic Random Access Memory (DRAM) for use as a computer system's main memory. A high speed memory may employ Static Random Access Memory (SRAM) for use as a processor cache memory. The theory behind a memory hierarchy is to group instructions and data to be used by the system processor in the highest speed memory. Such high speed memory is typically the most expensive memory available, so economics dictate that it be relatively small.
During operation, a system processor transfers instructions and data from the system's lower speed main memory to the higher speed cache memory so that the processor can have quick access to variables of a currently executing program. Cache systems typically transfer data in blocks referred to as cache lines. As the processor requires additional data not contained in the cache memory, cache lines containing such data are transferred from the main memory and replace selected cache lines in the cache memory. Various techniques or algorithms are utilized to determine which data is replaced. Since data contained in the cache memory is duplicative of data in the main memory, changes to data in one memory must be similarly made or noted in the other memory. For example, if the data in the cache memory is modified, the corresponding data in the main memory must be similarly modified. The problem of maintaining consistency between the cache data and the main memory data is referred to as maintaining cache coherency.
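The cache line transfer and replacement behavior described above may be illustrated by the following minimal sketch. It is not taken from any particular system: the least-recently-used replacement policy, the write-back of modified lines upon eviction, the 64-byte line size, and the names `LineCache`, `LINE_SIZE`, `read`, and `write` are all assumptions chosen for illustration, and each cache line is simplified to hold a single value.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value, for illustration only)


class LineCache:
    """Hypothetical cache holding a fixed number of lines, with LRU replacement."""

    def __init__(self, capacity_lines, memory):
        self.capacity = capacity_lines
        self.memory = memory        # dict: line number -> data, stands in for main memory
        self.lines = OrderedDict()  # line number -> [data, dirty flag], oldest first

    def _fetch(self, line_no):
        if line_no in self.lines:
            # Hit: mark the line as most recently used.
            self.lines.move_to_end(line_no)
            return self.lines[line_no]
        if len(self.lines) >= self.capacity:
            # Miss with a full cache: replace the least recently used line.
            victim, (data, dirty) = self.lines.popitem(last=False)
            if dirty:
                # Modified data must also be reflected in main memory.
                self.memory[victim] = data
        entry = [self.memory.get(line_no, 0), False]
        self.lines[line_no] = entry
        return entry

    def read(self, addr):
        return self._fetch(addr // LINE_SIZE)[0]

    def write(self, addr, value):
        entry = self._fetch(addr // LINE_SIZE)
        entry[0] = value
        entry[1] = True  # line now differs from main memory
```

In this sketch, a modified line is noted as such by its dirty flag and is copied back to the main memory only when it is replaced, which is one of several possible ways of keeping the two memories consistent.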
One conventional technique for maintaining cache coherency, particularly in distributed systems, is a directory-based cache coherency scheme. Directory-based coherency schemes utilize a centralized tag directory to record the location and the status of cache lines as they exist throughout the system. For example, the directory records which processor caches have a copy of the data, and further records if any of the caches have an updated copy of the data. When a processor makes a cache request to the main memory for a data item, the central directory is consulted to determine where the most recent copy of the data resides. Based on this information, the most recent copy of the cache line is retrieved so that it may be provided to the requesting processor cache memory. The central tag directory is then updated to reflect the new status for that cache line. Thus, each cache line read by a processor is accompanied by a tag directory update (i.e., a write).
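By way of illustration only, the directory lookup and update described above may be sketched as follows. The entry layout (a set of sharing caches plus an optional owner holding an updated copy), the names `DirectoryEntry`, `TagDirectory`, `read`, and `write`, and the simplification that an updated copy is treated as written back when another cache reads it are assumptions of this sketch, not details of any specific conventional system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    """Hypothetical tag directory entry for one cache line."""
    sharers: set = field(default_factory=set)  # caches holding a copy of the line
    owner: Optional[int] = None                # cache holding an updated copy, if any


class TagDirectory:
    """Hypothetical centralized tag directory."""

    def __init__(self):
        self.entries = {}    # line address -> DirectoryEntry
        self.tag_writes = 0  # number of tag directory updates performed

    def read(self, cache_id, line_addr):
        entry = self.entries.setdefault(line_addr, DirectoryEntry())
        # Consult the directory to find where the most recent copy resides.
        source = "memory" if entry.owner is None else f"cache {entry.owner}"
        # Simplification: the updated copy is treated as written back,
        # so the previous owner remains only as an ordinary sharer.
        entry.owner = None
        entry.sharers.add(cache_id)
        # Every cache line read is accompanied by a tag directory write.
        self.tag_writes += 1
        return source

    def write(self, cache_id, line_addr):
        entry = self.entries.setdefault(line_addr, DirectoryEntry())
        # The writing cache now holds the only up-to-date copy.
        entry.sharers = {cache_id}
        entry.owner = cache_id
        self.tag_writes += 1
```

The `tag_writes` counter makes visible the point noted above: in such a scheme, each read request results in one additional write to the directory.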
While directory-based cache coherency schemes assist in building scalable multi-processor systems, the updating of the tag directory associated with the nearly continuous transfers of cache lines between the main memory and cache memories wastes valuable memory bandwidth. In a typical conventional computer system, tag directory updates result in a loss of approximately 50% of the main memory bandwidth. The loss of bandwidth associated with a tag directory update, however, varies depending on the cache line size employed by the system, the memory technology adopted, the error correction code (ECC) scheme used, and the tag directory layout in the main memory. Many computer systems, particularly distributed systems, would benefit from a directory-based cache coherency scheme that reduces main memory bandwidth loss associated with tag directory updates.
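The bandwidth loss discussed above may be estimated with a simple model, sketched below under stated assumptions. The model assumes that the main memory is occupied for the sum of the time needed to read a cache line and the time needed to update its tag, and that nothing else overlaps with these operations. The function name `bandwidth_loss` and the cycle counts used in the example are hypothetical and are not measurements of any actual system.

```python
def bandwidth_loss(line_read_cycles, tag_update_cycles):
    """Fraction of main memory time consumed by tag directory updates.

    Assumes each cache line read is followed by exactly one tag update
    and that the two operations do not overlap.
    """
    total_cycles = line_read_cycles + tag_update_cycles
    return tag_update_cycles / total_cycles


# Hypothetical case: the tag update occupies the memory as long as the read.
equal_cost = bandwidth_loss(line_read_cycles=8, tag_update_cycles=8)

# Hypothetical case: a longer cache line makes the read dominate.
longer_line = bandwidth_loss(line_read_cycles=24, tag_update_cycles=8)
```

Under this model, a tag update that occupies the memory for as long as the cache line read itself yields a loss of one half, while a longer read relative to the update yields a smaller loss, consistent with the dependence on cache line size noted above.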