The technical field is computer architectures employing caches. In particular, the technical field relates to computer architectures that support snoop processing to maintain coherency among levels of caches.
To improve performance beyond that of computers having a single central processing unit, computer designers have developed architectures that have many central processing units. Often, the central processing units in such multiprocessing computers are connected to each other and to the computer's main memory over a common bus. However, the number of central processors that can be connected to a common bus is limited by the bandwidth needed to support the central processors and by the total bandwidth of the common bus. One approach for reducing the bus bandwidth required by each processor in a multiprocessor computer involves placing a cache between each processor and the common bus. A cache is a small, high-speed buffer memory that temporarily holds data and/or instructions from a main memory. Once data is loaded into such a local, or processor-associated, cache, the processor can access the data in the cache without accessing the common bus. Because each access satisfied by the cache does not use the common bus, less data is transmitted over the limited bandwidth of the common bus.
As a result of, and in addition to, reducing common bus bandwidth requirements, the use of a cache shortens the time necessary to access memory, for either data or instruction fetch. Information located in the cache can be accessed in much less time than information located in main memory. Thus, a processor with a cache spends far less time waiting for instructions and operands to be fetched and/or stored.
A cache is made up of many cache lines, each holding one or more words of data. Each cache line has associated with it an address tag that uniquely identifies the line of main memory from which the cache line is copied. Each time the processor makes a memory reference, an address tag comparison is made to see if a copy of the requested line resides in the cache. If the desired line is not in the cache, a "cache miss" occurs. The memory line is then retrieved from main memory, stored in the cache as a cache line, and supplied to the processor.
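The tag comparison and miss handling described above can be sketched in Python. This is a minimal illustration that assumes a direct-mapped organization with 64-byte lines and 256 cache lines; in that organization the tag, together with the line's index, identifies the memory line. Main memory is modeled as a dictionary keyed by memory-line number, and the class and constant names are hypothetical.

```python
# Sketch of a direct-mapped cache lookup; LINE_SIZE and NUM_LINES are assumed values.
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_LINES = 256   # cache lines in the cache (assumed)


class DirectMappedCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory       # memory-line number -> line contents
        self.tags = [None] * NUM_LINES       # address tag held by each cache line
        self.data = [None] * NUM_LINES       # cached copy of each memory line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_number = address // LINE_SIZE   # memory line containing the address
        index = line_number % NUM_LINES      # the only cache line that may hold it
        tag = line_number // NUM_LINES       # with the index, identifies the memory line
        if self.tags[index] == tag:          # tag comparison succeeded: cache hit
            self.hits += 1
        else:                                # cache miss: fetch, store, then supply
            self.misses += 1
            self.tags[index] = tag
            self.data[index] = self.main_memory.get(line_number, 0)
        return self.data[index]


memory = {0: "line 0", 1: "line 1"}
cache = DirectMappedCache(memory)
cache.read(0)    # miss: line 0 fetched from main memory
cache.read(8)    # hit: byte 8 lies in the same 64-byte line
cache.read(64)   # miss: line 1 fetched
print(cache.hits, cache.misses)  # 1 2
```

A set-associative or fully associative cache would compare the tag against several candidate lines instead of one; the direct-mapped form is chosen here only because it is the shortest to show.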
In addition to using a cache to retrieve data from main memory, the processor may also write data into the cache instead of directly to main memory. When the processor writes data to memory, the cache makes an address tag comparison to see if the cache line into which data is to be written resides in the cache. If the cache line exists in the cache and is in the modified or exclusive state, the data is written into the cache line in the cache memory. In many systems a dirty bit for the cache line is then set. The dirty bit indicates that data in the cache line has been modified, and thus before the cache line is evicted from the cache, the modified data must be written back to main memory. If the cache line into which data is to be written does not exist in the cache, either the memory line must be fetched into the cache or the data must be written directly into main memory.
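A minimal sketch of this write-back behavior follows, assuming a small cache with least-recently-used replacement and a flag choosing between fetching the line on a write miss (write-allocate) and writing directly to main memory. Coherence states such as modified and exclusive are not modeled, only the dirty bit; all names are hypothetical.

```python
from collections import OrderedDict


class WriteBackCache:
    """Sketch: write-back cache with a per-line dirty bit (capacity and LRU are assumed)."""

    def __init__(self, main_memory, capacity=2, write_allocate=True):
        self.memory = main_memory              # line address -> value
        self.capacity = capacity
        self.write_allocate = write_allocate
        self.lines = OrderedDict()             # line address -> [value, dirty], LRU first

    def _evict_if_full(self):
        if len(self.lines) >= self.capacity:
            victim, (value, dirty) = self.lines.popitem(last=False)  # least recently used
            if dirty:                          # modified data must reach memory before eviction
                self.memory[victim] = value

    def _fill(self, line):
        self._evict_if_full()
        self.lines[line] = [self.memory.get(line, 0), False]  # clean copy from memory

    def read(self, line):
        if line not in self.lines:
            self._fill(line)
        self.lines.move_to_end(line)
        return self.lines[line][0]

    def write(self, line, value):
        if line in self.lines:
            self.lines[line] = [value, True]   # write hit: update cache, set dirty bit
            self.lines.move_to_end(line)
        elif self.write_allocate:
            self._fill(line)                   # write miss: fetch the line into the cache
            self.lines[line] = [value, True]   # then modify it and mark it dirty
        else:
            self.memory[line] = value          # write miss, no allocate: write memory directly


memory = {0: 10, 1: 11, 2: 12}
cache = WriteBackCache(memory, capacity=2)
cache.write(0, 99)   # line 0 is now dirty in the cache
print(memory[0])     # 10: main memory not yet updated
cache.read(1)
cache.read(2)        # cache full: evicts dirty line 0 and writes it back
print(memory[0])     # 99
```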
Modern computer systems also use virtual addressing as a means of sharing physical memory among many different processes. In these computers, local caches use a portion of a virtual address as an index into the local cache (a virtually-indexed cache). This is often done as a performance optimization, allowing cache lookup to start before the virtual address has been translated to a physical address. Such systems may require that the underlying chip-set present a portion of the virtual address to the processor for certain bus transactions. This is because a computing system may allow more than one virtual address to map to the same physical address (a concept called aliasing), so the physical address alone does not identify which set of a virtually-indexed cache holds a line. In systems with virtually-indexed caches, there is often a requirement that all virtual references to the same line map to the same set.
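The aliasing problem can be illustrated with a short sketch. It assumes 4 KiB pages, 64-byte lines and 128 sets, so the set index uses address bits 6 through 12 while the page offset covers only bits 0 through 11; bit 12 therefore comes from the virtual page number. The page table and addresses are hypothetical.

```python
# Sketch of virtual indexing and aliasing; sizes and the page table are assumptions.
PAGE_SIZE = 4096  # 4 KiB pages: page offset is bits 0-11
LINE_SIZE = 64    # set index starts at bit 6
NUM_SETS = 128    # 7 index bits (6-12), so bit 12 comes from the virtual page number

# Hypothetical page table: three virtual pages aliased to physical page 0x7.
page_table = {0x10: 0x7, 0x21: 0x7, 0x22: 0x7}


def translate(virtual_address):
    """Virtual-to-physical translation through the page table."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


def set_index(virtual_address):
    """A virtually-indexed cache selects the set from the virtual address."""
    return (virtual_address // LINE_SIZE) % NUM_SETS


def aliases_share_set(va1, va2):
    """True only if both addresses map to one physical line and select the same set."""
    return translate(va1) == translate(va2) and set_index(va1) == set_index(va2)


a = 0x10 * PAGE_SIZE + 0x40
b = 0x21 * PAGE_SIZE + 0x40
c = 0x22 * PAGE_SIZE + 0x40
print(hex(translate(a)), hex(translate(b)))      # 0x7040 0x7040
print(set_index(a), set_index(b), set_index(c))  # 1 65 1
```

Aliases a and b reach the same physical line but differ in bit 12, so they index different sets; a and c agree in that bit and share a set. Constraining aliases to agree in the index bits above the page offset is one way to satisfy the same-set requirement.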
Other computer systems have buses that support only physical addresses. However, a processor that relies on virtual addresses cannot be placed directly on such a physical-only bus. Thus some mechanism must be provided to translate between a virtually-addressed bus and a physically-addressed bus.
An inclusive intermediary cache (IIC) translates between some number of processors using virtual addressing and a physically-addressed bus. Inclusive intermediary caches are well known and have been produced by companies such as NVS and Intel (the latter in the form of a dataless coherency filter).
Inclusive intermediary caches behave as another level in the cache hierarchy. The IICs support at least one virtual bus (upper bus) connecting the IICs to central processing units (CPUs), and at least one physical bus (lower bus) connecting the IICs to a memory controller, input/output (I/O) devices and possibly other IICs. Whenever a CPU makes a memory request on the upper bus, the request is looked up in the IIC. If the data resides in the IIC, the data is provided to the CPU from the IIC through the upper bus (except in the case of coherency filters, which do not cache data). If the request misses the IIC, the request is repeated on the lower bus. When the requested data returns from the lower bus, the data is cached in the IIC and passed up to the requesting CPU through the upper bus. Whenever a snoop request arrives from the lower bus, the snooped address is looked up in the IIC. If the snoop misses the IIC, that is, the requested data is not in the IIC, the snoop need not be repeated on the upper bus, because inclusion guarantees that no CPU on the upper bus holds a line that the IIC does not hold. If the snoop hits in the IIC, the snoop may be repeated on the upper bus if the coherency protocol requires it. In the case of a snoop where the IIC or a CPU on that IIC's upper bus holds the data in a modified state, or in the case of an IIC capacity fault, a data line may be evicted from the IIC. In these cases, a back-invalidate transaction may be generated on the upper bus to force an eviction of the data in order to maintain inclusion.
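The request and snoop flow of an IIC can be sketched as a small model. It assumes a data-holding IIC (not a dataless coherency filter), least-recently-used replacement, and a flag standing in for whatever the coherency protocol requires; write-back of modified victims and the modified-state snoop case are omitted. Class, method and log names are hypothetical.

```python
from collections import OrderedDict


class InclusiveIntermediaryCache:
    """Sketch of IIC request/snoop handling (capacity, LRU and names are hypothetical)."""

    def __init__(self, memory, capacity=2):
        self.memory = memory              # models data returned over the lower bus
        self.capacity = capacity
        self.lines = OrderedDict()        # physical line address -> data, LRU first
        self.upper_bus_log = []           # transactions the IIC drives on the upper bus
        self.lower_bus_log = []           # transactions the IIC drives on the lower bus

    def cpu_request(self, paddr):
        """Memory request from a CPU on the upper bus."""
        if paddr in self.lines:           # hit: supply the data from the IIC
            self.lines.move_to_end(paddr)
            return self.lines[paddr]
        self.lower_bus_log.append(("read", paddr))  # miss: repeat on the lower bus
        if len(self.lines) >= self.capacity:        # capacity fault: evict a line
            victim, _ = self.lines.popitem(last=False)
            self.upper_bus_log.append(("back_invalidate", victim))  # keep inclusion
        data = self.memory[paddr]
        self.lines[paddr] = data          # cache the returned line, then pass it up
        return data

    def snoop(self, paddr, protocol_requires_forward=True):
        """Snoop arriving from the lower bus."""
        if paddr not in self.lines:
            return "miss"                 # filtered: inclusion means no CPU holds it
        if protocol_requires_forward:
            self.upper_bus_log.append(("snoop", paddr))
        return "hit"


memory = {0x100: "A", 0x200: "B", 0x300: "C"}
iic = InclusiveIntermediaryCache(memory, capacity=2)
iic.cpu_request(0x100)   # miss: repeated on the lower bus, then cached
iic.snoop(0x200)         # miss: nothing is sent on the upper bus
iic.snoop(0x100)         # hit: repeated on the upper bus
iic.cpu_request(0x200)
iic.cpu_request(0x300)   # capacity fault: 0x100 is back-invalidated
print(iic.upper_bus_log)
```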
In an embodiment, an IIC is interposed between a virtually-addressed upper bus, whose processors require virtual addresses for snoops, and a physically-addressed lower bus. The IIC is responsible for maintaining a copy of each memory line's virtual address. The virtual address of each line is recorded by the IIC and stored either with the tag for the line, with the line data, or in a separate array. Whenever the IIC needs to snoop or invalidate a line in the CPU(s) on the upper bus, the IIC presents the virtual address it recorded when the line was first placed into the IIC. Every line in the IIC was placed there at the request of a CPU, which at the time of the request provided a virtual address. The IIC is further restricted to behave as a coherency filter and never passes on a snoop address from the physically-addressed lower bus that does not hit in the IIC. Thus every line in the IIC has a recorded virtual address, and the IIC never needs to receive a virtual address from the physically-addressed lower bus.
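A minimal sketch of this virtual-address bookkeeping follows. It assumes a CPU request presents both the physical address used for lookup and the virtual address the IIC records; a dictionary keyed by physical line address stands in for storage alongside the tag, the line data, or a separate array. All names are hypothetical.

```python
class VirtualAddressTrackingIIC:
    """Sketch: IIC that records each line's virtual address at fill time (names hypothetical)."""

    def __init__(self):
        self.recorded_va = {}       # physical line address -> virtual address recorded at fill
        self.upper_bus_log = []     # transactions presented to the CPUs on the upper bus

    def fill_from_cpu_request(self, virtual_addr, physical_addr):
        # Every line enters the IIC at a CPU's request, which supplied a virtual address.
        self.recorded_va[physical_addr] = virtual_addr

    def lower_bus_snoop(self, physical_addr):
        # The physically-addressed lower bus supplies only a physical address.
        va = self.recorded_va.get(physical_addr)
        if va is None:
            return False            # coherency filter: never forward a snoop that missed
        self.upper_bus_log.append(("snoop", va))  # present the recorded virtual address
        return True

    def back_invalidate(self, physical_addr):
        # The line must be present; its recorded virtual address drives the upper bus.
        va = self.recorded_va.pop(physical_addr)
        self.upper_bus_log.append(("back_invalidate", va))


iic = VirtualAddressTrackingIIC()
iic.fill_from_cpu_request(virtual_addr=0x7FFF0040, physical_addr=0x1040)
iic.lower_bus_snoop(0x2000)   # miss: filtered, no upper-bus transaction
iic.lower_bus_snoop(0x1040)   # hit: upper-bus snoop uses the recorded virtual address
iic.back_invalidate(0x1040)
print([(op, hex(va)) for op, va in iic.upper_bus_log])
```

Because snoops that miss are filtered and every resident line carries a recorded virtual address, the model never needs a virtual address from the lower bus.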