The present invention relates generally to the field of microprocessors, computers and computer systems. More particularly, the present invention relates to snoop stall reduction on a microprocessor external bus.
Since the beginning of electronic computing, main memory access has been much slower than processor cycle times. Access time is the time between when a read is initially requested and when the desired data arrives. The gap between processor cycle time and memory access time continues to widen with advances in semiconductor technology. Efficient mechanisms to bridge this gap are central to achieving high performance in future computer systems.
The conventional approach to bridging the gap between memory access time and processor cycle time has been to introduce a high-speed memory buffer, commonly known as a cache, between the microprocessor and main memory. Caches are found in virtually every class of general-purpose computer system. The data stored within one cache memory is often shared among the various processors or agents which form the computer system. The main purpose of a cache memory is to provide fast access time while reducing bus and memory traffic. A cache achieves this goal by taking advantage of the principles of spatial and temporal locality, that is, the tendency of programs to access addresses near those recently accessed and to reuse recently accessed data.
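As an illustration of how locality affects cache effectiveness, the following is a minimal sketch of a direct-mapped cache model that counts hits and misses only. The class name DirectMappedCache, the helper hit_rate, and the sizes chosen (64 lines of 16 bytes) are assumptions made for this example and are not taken from the present description.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (no data is stored)."""

    def __init__(self, num_lines=64, line_size=16):
        # Sizes are illustrative assumptions, not values from the text.
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one byte access; return True on a hit, False on a miss."""
        block = address // self.line_size   # which memory block holds the byte
        index = block % self.num_lines      # which cache line the block maps to
        tag = block // self.num_lines       # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # fill the line on a miss
        self.misses += 1
        return False


def hit_rate(addresses, **cache_args):
    """Run an address trace through a fresh cache and return the hit fraction."""
    cache = DirectMappedCache(**cache_args)
    for address in addresses:
        cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# Spatial locality: consecutive bytes share a cache line.
sequential = list(range(1024))
# Temporal locality: the same few addresses are reused many times.
repeated = [0, 4, 8, 12] * 256
# No locality: every access maps to the same line with a different tag.
strided = [i * 1024 for i in range(1024)]

if __name__ == "__main__":
    for name, trace in [("sequential", sequential),
                        ("repeated", repeated),
                        ("strided", strided)]:
        print(name, hit_rate(trace))
```

In this model the sequential and repeated traces hit on nearly every access, while the strided trace misses every time, which is why a cache reduces bus and memory traffic only for access patterns that exhibit locality.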
As semiconductor technology has continued to improve, the gap between memory access time and central processing unit (CPU) cycle time has widened to the extent that there has arisen a need for a memory hierarchy which includes two or more intermediate cache levels. For example, a two-level cache memory hierarchy often provides an adequate bridge between access time and CPU cycle time such that memory latency is dramatically reduced. In these types of computer systems, the first-level (L1) cache, or the highest level cache, provides fast, local access to data since this cache is situated closest to the execution unit and has the smallest size. The second-level (L2) cache retains more data, and thereby reduces bus and memory traffic, because this cache is comparatively larger in size. The L2 cache therefore takes up significant die area and is consequently slower than the L1 cache.
Main memory is typically the last or final level down in the memory hierarchy. Main memory satisfies the demands of caches and vector units, and often serves as the interface for one or more peripheral devices. Main memory usually comprises core memory or a dedicated data storage device such as a hard disk drive unit.
One of the problems that arises in computer systems that include a plurality of caching agents and a shared data cache memory hierarchy is cache coherency. Cache coherency refers to the problem wherein, due to the use of multiple or multi-level cache memories, data may be stored in more than one location in memory. For example, if a microprocessor is the only device in a computer system that operates on data stored in memory and the cache is situated between the CPU and memory, there is little risk of the CPU using stale data. However, if other agents in the system share storage locations in the memory hierarchy, an opportunity arises for copies of data to become inconsistent, or for other agents to read stale copies.
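The stale-data hazard described above can be sketched with two agents that each keep a private cache over a shared memory. The names Memory, CachingAgent and run, the write-through policy, and the simple write-invalidate scheme used for the coherent case are all assumptions chosen for illustration; they represent one generic way of keeping copies consistent and are not a description of the mechanism of the present invention.

```python
class Memory:
    """Shared main memory, modeled as a dictionary from address to value."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class CachingAgent:
    """Hypothetical agent with a private write-through cache."""

    def __init__(self, memory, peers=None):
        self.memory = memory
        self.cache = {}
        # peers is None: no coherency actions are taken.
        # peers is a list: other agents' copies are invalidated on a write.
        self.peers = peers

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory.read(addr)  # fill on a miss
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory.write(addr, value)                 # write-through
        if self.peers is not None:
            for peer in self.peers:
                if peer is not self:
                    peer.cache.pop(addr, None)         # invalidate the peer copy


def run(coherent):
    """Return the value agent B reads after agent A updates a shared address."""
    memory = Memory()
    memory.write(0x40, 1)
    peers = [] if coherent else None
    agent_a = CachingAgent(memory, peers)
    agent_b = CachingAgent(memory, peers)
    if coherent:
        peers.extend([agent_a, agent_b])
    agent_b.read(0x40)       # B caches the original value
    agent_a.write(0x40, 2)   # A updates its cache and main memory
    return agent_b.read(0x40)


if __name__ == "__main__":
    print("without invalidation:", run(coherent=False))
    print("with invalidation:", run(coherent=True))
```

Without invalidation, agent B keeps returning the value it cached before agent A's write, even though main memory holds the new value. With invalidation, B's copy is discarded on A's write and B refetches the current value.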
Cache coherency is especially problematic in computer systems that employ multiple processors as well as other caching agents. For instance, a program running on multiple processors requires that copies of the same data be located in several cache memories. Thus, the overall performance of the computer system depends upon the ability to share data in a coherent manner.