This section is intended to introduce the reader to various aspects of art which may be related to various aspects of the present invention which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
With the advent of standardized architectures and operating systems, computers have become virtually indispensable for a wide variety of uses from business applications to home computing. Whether a computer system is a personal computer or a network of computers connected via a server interface, computers today rely on processors, associated chip sets, and memory chips to perform most of the processing functions, including the processing of system requests. The more complex the system architecture, the more difficult it becomes to process requests in the system efficiently. Despite the increasing complexity of system architectures, demands for improved request processing speed continue to drive system design. Designers are often challenged with finding ways to reduce the cycle time for accessing data and processing requests.
Some systems include multiple processing units or microprocessors connected via a processor bus. Implementing multiple processors improves system processing efficiency because the system is able to process requests simultaneously. To coordinate the exchange of information among the processors, a host/data controller is generally provided. The host/data controller is further tasked with coordinating the exchange of information between the plurality of processors and the system memory. The host/data controller may be responsible not only for the exchange of information with the typical Read-Only Memory (ROM) and the Random Access Memory (RAM), but also with the cache memory in high-speed systems. Cache memory is a special high-speed storage mechanism which may be provided as a reserved section of the main memory or as an independent high-speed storage device. Essentially, the cache memory is a portion of the RAM which is typically made of high-speed static RAM (SRAM) rather than the slower and cheaper dynamic RAM (DRAM) which may be used for the remainder of the main memory. Alternatively, each processor may have an associated cache memory. By storing frequently accessed data in the cache memory, the processor avoids having to re-access the shared memory each time the information is needed.
For multiprocessor and multibus shared memory systems, bus sniffing or bus snooping may be implemented to maintain system memory coherency. For bus sniffing/bus snooping techniques, an algorithm or apparatus should be designed so that data changes made by any agent are visible to a demand request from any other agent. That is to say that in order to maintain coherency, each time a processor issues a request for memory data, the other processor caches may need to be searched for copies of that data, depending on the type of request, to ensure that only the most up-to-date information is used. Some aspects of this apparatus are provided by the processor architecture. For example, the X86 architecture maintains coherency across different levels of processor cache. The X86 architecture front side bus definition also deploys a self-snooping protocol for agents that share the same bus. If more than one bus segment is supported in the system, a system level solution should be implemented to maintain coherency across the multiple bus segments. Snoop filters or tag caches are a common solution for coordinating system level coherency across multiple bus segments. One of the primary goals of an efficient snoop filter design is to minimize the number of unnecessary snoops in order to preserve front side bus bandwidth for request and data traffic. The snoops at issue include request snoops, which are required to retrieve the most recent data or to provide an agent exclusive access to data, and “castout” snoops, which are required to make space in the tag cache of a forced inclusion snoop filter.
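By way of illustration only, the snooping of other processor caches on a read request may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular bus protocol: the names `Cache` and `coherent_read`, the single dirty flag per cache line, and the write-back-on-snoop behavior are all hypothetical simplifications introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical per-processor cache: address -> (value, dirty flag)."""
    name: str
    lines: dict = field(default_factory=dict)


def coherent_read(requester, others, memory, address):
    """Return the most up-to-date value for `address`.

    Assumed policy: a hit in the requester's own cache is served locally;
    otherwise every other cache is snooped, and a modified (dirty) copy
    is written back to memory before the requester is filled.
    """
    if address in requester.lines:
        return requester.lines[address][0]

    # Snoop the other caches for a modified copy of the line.
    for cache in others:
        entry = cache.lines.get(address)
        if entry is not None and entry[1]:
            value = entry[0]
            memory[address] = value                # write back newest data
            cache.lines[address] = (value, False)  # copy is now clean
            break

    # Fill the requester's cache from (now current) memory.
    value = memory[address]
    requester.lines[address] = (value, False)
    return value
```

In this sketch every miss snoops every other cache, which is the traffic that a snoop filter is intended to reduce.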
A typical snoop filter is implemented using a direct mapped policy where the tag cache can track only one tag at a given tag index. Each time a request accesses a particular tag index and the tag differs from the current tag at the index, the snoop filter runs a castout cycle using the current tag, to make room for the new tag. The castout cycle runs a back invalidation to the processor bus(es) being tracked by the snoop filter. If the processor needs the evicted cache line again, it is forced to fetch the cache line from memory instead of from its own cache. This results in a performance penalty, as the latency to an internal cache running at core clock speed and the latency to the main memory running at system bus clock speed can differ by an order of magnitude.
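The direct mapped behavior described above may be sketched as follows. This is an illustrative model only; the class name `DirectMappedSnoopFilter`, the 64-byte cache line (`line_bits=6`), the 16-entry tag cache (`index_bits=4`), and the `castouts` list standing in for back invalidations on the processor bus are assumptions made for the example and are not taken from any specific design.

```python
class DirectMappedSnoopFilter:
    """Hypothetical direct mapped tag cache: one tag per tag index."""

    def __init__(self, index_bits=4, line_bits=6):
        self.index_bits = index_bits
        self.line_bits = line_bits
        self.tags = [None] * (1 << index_bits)
        # Addresses back-invalidated by castout cycles (assumed stand-in
        # for the invalidation actually sent to the processor bus).
        self.castouts = []

    def split(self, address):
        """Split an address into (tag index, tag)."""
        line = address >> self.line_bits
        index = line & ((1 << self.index_bits) - 1)
        tag = line >> self.index_bits
        return index, tag

    def access(self, address):
        """Track `address`; run a castout if the index holds another tag."""
        index, tag = self.split(address)
        current = self.tags[index]
        if current == tag:
            return "hit"
        if current is not None:
            # Castout cycle: back-invalidate the line being replaced.
            evicted_line = (current << self.index_bits) | index
            self.castouts.append(evicted_line << self.line_bits)
        self.tags[index] = tag
        return "miss" if current is None else "castout"
```

With these assumed parameters, addresses `0x0000` and `0x0400` map to the same tag index, so alternating between them forces a castout on every access even though the rest of the tag cache is empty.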
The present invention may address one or more of the problems set forth above.