1. Technical Field
This invention relates generally to tag entries that provide status information for locations within a cache for a memory address, and more particularly to retrieving such entries and determining an error-correcting code (ECC) for such entries.
2. Description of the Prior Art
There are many different types of multi-processor computer systems. A symmetric multi-processor (SMP) system includes a number of processors that share a common memory. SMP systems provide scalability. As needs dictate, additional processors can be added. SMP systems usually range from two to 32 or more processors. One processor generally boots the system and loads the SMP operating system, which brings the other processors online. Without partitioning, there is only one instance of the operating system and one instance of the application in memory. The operating system uses the processors as a pool of processing resources, all executing simultaneously, where each processor either processes data or is in an idle loop waiting to perform a task. SMP systems increase in speed whenever processes can be overlapped.
A massively parallel processor (MPP) system can use thousands or more processors. MPP systems use a different programming paradigm than the more common SMP systems. In an MPP system, each processor contains its own memory and copy of the operating system and application. Each subsystem communicates with the others through a high-speed interconnect. To use an MPP system effectively, an information-processing problem should be breakable into pieces that can be solved simultaneously. For example, in scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time.
A non-uniform memory access (NUMA) system is a multi-processing system in which memory is separated into distinct banks. NUMA systems are similar to SMP systems. In SMP systems, however, all processors access a common memory at the same speed. By comparison, in a NUMA system, memory on the same processor board, or in the same building block, as the processor is accessed faster than memory on other processor boards, or in other building blocks. That is, local memory is accessed faster than distant shared memory. NUMA systems generally scale better to higher numbers of processors than SMP systems.
Each building block, or node, typically caches the distant shared, or remote, memory to improve memory access performance. At least because more than one node may cache the same remote memory at the same time, information regarding the caching of the remote memory is stored at each node. The information regarding the cache is known as a tag, and all the tags are stored in what is known as tag memory. There is a tag entry within the tag memory for each cache location within the cache. The tag entry may indicate, for instance, what memory location is being cached at its corresponding cache location, what other nodes are caching the memory location in their caches, and the status of the cache location.
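As an illustration only, the relationship between cache locations and tag entries described above might be modeled as follows. This is a minimal sketch, not a description of any particular tag memory: the names TagEntry, tag, sharers, state, and NUM_CACHE_LOCATIONS, the state strings, and the sample values are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TagEntry:
    """Hypothetical tag entry; field names are illustrative assumptions."""
    tag: Optional[int] = None                        # which memory location is cached here
    sharers: Set[int] = field(default_factory=set)   # other nodes caching the same location
    state: str = "invalid"                           # status of the cache location


# Assumed cache size, chosen only to keep the example small.
NUM_CACHE_LOCATIONS = 8

# Tag memory holds exactly one tag entry per cache location.
tag_memory = [TagEntry() for _ in range(NUM_CACHE_LOCATIONS)]

# Example: cache location 3 holds a block that nodes 0 and 2 also cache.
tag_memory[3] = TagEntry(tag=0x1A, sharers={0, 2}, state="shared")
```

Because every cache location carries its own entry, the size of the tag memory in this sketch grows with both the number of cache locations and the width of each entry.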
Furthermore, the data in a cache is normally managed in fixed-size blocks, typically between 32 and 128 bytes long. With 32-byte blocks, the low five bits of the address (2^5 = 32) determine which byte within a block is desired. The remaining bits of an address are called the block address. The block address is further split into an index portion and a tag portion. The index portion, which is typically the low-order portion of the block address, determines where the block can be held in the cache. The tag portion, typically the high-order portion of the block address, is used to identify which block actually is stored at a given cache location. The number of bits used as the tag determines how many different memory addresses can be cached in the same location in the cache.
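The address split described above can be sketched in a few lines. The 32-byte block size and five offset bits come from the text; the eight index bits (256 cache locations) and the function name split_address are assumptions made only for this example.

```python
BLOCK_SIZE = 32    # bytes per cache block, as in the text
OFFSET_BITS = 5    # 2**5 == 32, so five bits select a byte within a block
INDEX_BITS = 8     # assumption: 256 cache locations


def split_address(addr):
    """Split an address into (tag, index, byte offset)."""
    offset = addr & (BLOCK_SIZE - 1)                   # low five bits
    block_address = addr >> OFFSET_BITS                # remaining bits
    index = block_address & ((1 << INDEX_BITS) - 1)    # low-order part of block address
    tag = block_address >> INDEX_BITS                  # high-order part of block address
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), offset)
```

Under these assumptions, two addresses with the same index but different tags compete for the same cache location, which is why the stored tag must be compared on every access.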
As a simple example, for a four-bit memory address having the three trailing bits 111, the leading bit can be either 0 or 1. If the tag is only this first leading bit, this means that for the cache location corresponding to the bits 111, either the memory address 0111 or the memory address 1111 can be stored. To ensure that using a cache improves performance, the process of determining whether the cache holds the data for the desired memory address should be performed quickly.
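The four-bit example above can be modeled as a small direct-mapped cache with a three-bit index and a one-bit tag. This is a sketch under stated assumptions: the class TinyCache and its fill and lookup methods are hypothetical names, the four-bit address is treated as a block address with no byte offset, and only the tags are modeled, not the cached data.

```python
INDEX_BITS = 3  # the three trailing bits select the cache location


def split(addr):
    """Split a four-bit address into (tag, index)."""
    return addr >> INDEX_BITS, addr & ((1 << INDEX_BITS) - 1)


class TinyCache:
    """Hypothetical direct-mapped cache that tracks tags only."""

    def __init__(self):
        # One stored tag per cache location; None means nothing is cached.
        self.tags = [None] * (1 << INDEX_BITS)

    def fill(self, addr):
        tag, index = split(addr)
        self.tags[index] = tag   # replaces whatever was cached at this location

    def lookup(self, addr):
        tag, index = split(addr)
        return self.tags[index] == tag   # hit only if the stored tag matches


cache = TinyCache()
cache.fill(0b0111)
hit_0111 = cache.lookup(0b0111)   # same index and same tag
hit_1111 = cache.lookup(0b1111)   # same index but different tag
```

In this sketch the lookup is a single indexed read followed by one comparison, which reflects why the tag check can be performed quickly.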
For performance reasons, tag memory is usually fast, so that memory accesses throughout the system are not unduly slowed. However, such fast memory is expensive, so it is desirably conserved as much as possible. Furthermore, activity on the tag memory bus is desirably lessened as much as possible, also to ensure optimal system performance. For these and other reasons, therefore, there is a need for the present invention.