1. Technical Field
This invention relates generally to a cache for remote or primary memory, and more particularly to determining whether a cache entry in the cache actually caches a desired memory address of the remote or primary memory.
2. Description of the Prior Art
There are many different types of multi-processor computer systems. A symmetric multi-processor (SMP) system includes a number of processors that share a common memory. SMP systems provide scalability. As needs dictate, additional processors can be added. SMP systems usually range from two to thirty-two or more processors. One processor generally boots the system and loads the SMP operating system, which brings the other processors online. Without partitioning, there is only one instance of the operating system and one instance of the application in memory. The operating system uses the processors as a pool of processing resources, all executing simultaneously, where each processor either processes data or is in an idle loop waiting to perform a task. SMP systems increase in speed whenever processes can be overlapped.
A massively parallel processor (MPP) system can use thousands or more processors. MPP systems use a different programming paradigm than the more common SMP systems. In an MPP system, each processor contains its own memory and copy of the operating system and application. Each subsystem communicates with the others through a high-speed interconnect. To use an MPP system effectively, an information-processing problem should be breakable into pieces that can be solved simultaneously. For example, in scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time.
A non-uniform memory access (NUMA) system is a multi-processing system in which memory is separated into distinct banks. NUMA systems are similar to SMP systems. In SMP systems, however, all processors access a common memory at the same speed. By comparison, in a NUMA system, memory on the same processor board, or in the same building block, as the processor is accessed faster than memory on other processor boards, or in other building blocks. That is, local memory is accessed faster than distant shared memory. NUMA systems generally scale better to higher numbers of processors than SMP systems. The term building block is used herein in a general manner, and encompasses a separable grouping of processor(s), other hardware, such as memory, and software that can communicate with other building blocks.
Many multi-processor systems, as well as single-processor systems, employ a cache to improve performance. For instance, in a NUMA multi-processor system, each building block may have a cache to temporarily store data permanently stored on the remote shared memories of other building blocks. Types of caches include one-way, or direct-mapped, caches, in which each memory address can be cached at only a given location in the cache, as well as the more general multi- or n-way caches, in which each memory address can be cached at n different locations in the cache. When a processor wishes to access the data stored at a given memory address, the processor or another component determines whether the cache currently stores this data. If so, then there is no need to access the data at its remote or otherwise primary memory.
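The difference between a direct-mapped cache and an n-way cache can be illustrated with a minimal sketch. The function name `candidate_locations`, its parameters, and the modulo mapping from block address to set are assumptions made for illustration only; the description above states only how many locations a memory address may occupy.

```python
def candidate_locations(block_address, num_sets, ways):
    """Return the (set, way) slots in which a block may be cached.

    Assumption: the set is chosen by taking the block address modulo
    the number of sets, which is a common but not universal mapping.
    """
    set_index = block_address % num_sets
    return [(set_index, way) for way in range(ways)]


# Direct-mapped (one-way) cache: exactly one possible location.
print(candidate_locations(13, num_sets=8, ways=1))  # [(5, 0)]

# Two-way cache of the same total capacity: two possible locations.
print(candidate_locations(13, num_sets=4, ways=2))  # [(1, 0), (1, 1)]
```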
The data in a cache is normally managed in fixed-size blocks, typically between 32 and 128 bytes long. With 32-byte blocks, the low five bits of the address (2^5 = 32) determine which byte within a block is desired. The remaining bits of an address are called the block address. The block address is further split into an index portion and a tag portion. The index portion, which is typically the low-order portion of the block address, determines where the block can be held in the cache. The tag portion, typically the high-order portion of the block address, is used to identify which block actually is stored at a given cache location. The number of bits used as the tag determines how many different memory addresses can be cached in the same location in the cache. As a simple example, for a four-bit memory address having the three trailing bits 111, the leading bit can be either 0 or 1. If the tag is only this first leading bit, this means that for the cache location corresponding to the bits 111, either the memory address 0111 or the memory address 1111 can be stored. To ensure that using a cache improves performance, the process of determining whether the cache holds the data for the desired memory address should be performed quickly. One way to accomplish this is to use a fast tag lookup operation.
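The split of an address into tag, index, and byte-offset fields can be sketched as follows. The function name `split_address` and the particular field widths in the usage lines are hypothetical choices made for illustration; only the 32-byte block size and the four-bit example come from the description above.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields.

    offset: low-order bits selecting a byte within the block
    index:  low-order bits of the block address (cache location)
    tag:    remaining high-order bits of the block address
    """
    offset = addr & ((1 << offset_bits) - 1)
    block_address = addr >> offset_bits
    index = block_address & ((1 << index_bits) - 1)
    tag = block_address >> index_bits
    return tag, index, offset


# 32-byte blocks need 5 offset bits because 2**5 == 32.
# The 8-bit index width is an assumed value, not taken from the text.
print(split_address(0x12345, offset_bits=5, index_bits=8))  # (9, 26, 5)

# The four-bit example: no offset bits, a 3-bit index, a 1-bit tag.
print(split_address(0b0111, offset_bits=0, index_bits=3))  # (0, 7, 0)
print(split_address(0b1111, offset_bits=0, index_bits=3))  # (1, 7, 0)
```

Both four-bit addresses map to the same index (7) and differ only in the tag, which is why the tag must be checked before the cached data can be trusted.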
A fast tag lookup operation determines whether a desired tag is stored at a given location in the cache. In some systems, a cache controller passes a request for performing this operation to another component in the system, while concurrently or immediately thereafter reading the cache. This other component should perform the fast tag lookup operation, and the controller should receive its results no later than the time at which the controller completes its cache read operation. In this way, the controller knows whether the cache stores the desired memory address by the time the data from the cache is retrieved. If the fast tag lookup operation is not performed quickly enough, the controller will have already retrieved the cache entry for the memory address, and will have to wait to learn whether the cache entry actually caches the memory address.
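The logical outcome of a fast tag lookup performed alongside a cache read can be sketched for a direct-mapped cache as below. The class `DirectMappedCache` and its method names are hypothetical, and the sketch runs the two steps sequentially; it models only the hit-or-miss decision, not the timing relationship between the tag memory and the cache memory.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with a separate tag store (illustrative)."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        size = 1 << index_bits
        self.tags = [None] * size  # stands in for the fast tag memory
        self.data = [None] * size  # stands in for the slower cache memory

    def _split(self, block_address):
        index = block_address & ((1 << self.index_bits) - 1)
        tag = block_address >> self.index_bits
        return tag, index

    def fill(self, block_address, value):
        """Store a block and record its tag at the indexed location."""
        tag, index = self._split(block_address)
        self.tags[index] = tag
        self.data[index] = value

    def fast_tag_lookup(self, block_address):
        """Return True if the indexed location holds the desired tag."""
        tag, index = self._split(block_address)
        return self.tags[index] == tag

    def read(self, block_address):
        """Read the cache entry and use it only if the tag matched."""
        _, index = self._split(block_address)
        hit = self.fast_tag_lookup(block_address)
        entry = self.data[index]  # in hardware, this read would overlap the lookup
        return entry if hit else None


cache = DirectMappedCache(index_bits=3)
cache.fill(0b0111, "block A")
print(cache.read(0b0111))  # block A
print(cache.read(0b1111))  # None (same location, different tag)
```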
To ensure that the fast tag lookup operation is performed fast enough, the memory that the fast tag lookup operation uses must be sufficiently fast, typically faster than the memory being used as the cache. However, such fast memory can be expensive. To decrease costs, system designers may limit cache size so that the size of the memory used for the fast tag lookup can also be limited. However, decreasing cache size usually leads to performance degradation of the system. Therefore, system designers may have to choose between performance and cost in developing their systems. For these described reasons, as well as other reasons, there is a need for the present invention.
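The link between cache size and the amount of fast tag memory can be estimated with a rough sketch. The function `tag_memory_bits`, the 32-bit address width, and the cache sizes used below are assumptions for illustration; the estimate also assumes power-of-two sizes and ignores valid or state bits that a real design would store with each tag.

```python
def tag_memory_bits(address_bits, cache_bytes, block_bytes, ways=1):
    """Estimate the tag storage, in bits, for a cache of the given size.

    Assumes cache_bytes, block_bytes, and ways are powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = num_sets.bit_length() - 1      # log2 of the set count
    tag_bits = address_bits - index_bits - offset_bits
    return num_blocks * tag_bits


MIB = 1 << 20
print(tag_memory_bits(32, 1 * MIB, 32))  # 393216
print(tag_memory_bits(32, 2 * MIB, 32))  # 720896
```

Under these assumptions, doubling the cache nearly doubles the tag storage, which is why limiting the cache also limits the fast memory needed for the lookup.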