1. Field of the Invention
The present application relates generally to an improved data processing apparatus and method, and more specifically to an apparatus and method for testing real page number bits in a cache directory.
2. Background of the Invention
Conventional computer systems may have one or more processing units, which are connected to various peripheral devices, including input/output (I/O) devices (such as a display monitor, keyboard, or permanent storage device), memory devices (such as random-access memory or RAM) that are used by the processing units to carry out program instructions, and firmware whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. The processing units communicate with the peripheral devices by various means, including a generalized interconnect or bus. A conventional computer system may also have many additional components, such as serial and parallel ports for connection to, e.g., modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conventional computing systems; for example, a display adapter might be used to control a video display monitor, a memory controller may be used to access the memory, etc. Instead of connecting input/output (I/O) devices directly to the bus, the I/O devices may be connected to a secondary (I/O) bus which is further connected to an I/O bridge to the bus. The computer also may have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical; that is, they all use a common set or subset of instructions and protocols to operate and generally have the same architecture. Such an SMP computer may include a processing unit that includes a processor core having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. The processing unit also can have one or more caches, such as an instruction cache and a data cache, which are implemented using high-speed memory devices. Instructions and data may be directed to the respective cache by examining a signal that indicates whether the processing unit is requesting an operation whose operand is an instruction or data. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from the memory. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip. Each cache is generally associated with a cache controller that manages the transfer of data between the processor core and the cache memory.
A processing unit may also include additional caches, such as a second level (L2) cache that supports the on-board first level caches. In other words, the L2 cache acts as an intermediary between the memory and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, an L2 cache may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. The L2 cache is generally connected to a bus, and all loading of information from the memory into the processor core must come through the L2 cache. Additionally, computing systems may include multi-level cache hierarchies where there are many levels of serially connected caches.
A cache has many “blocks” which individually store the various instructions and data values. The blocks in any cache are divided into groups of blocks called “sets.” A set is a collection of cache blocks that a given memory block may reside in. For any given memory block, there is a unique set in the cache that the block can be mapped into, according to preset mapping functions. The number of blocks in a set is referred to as the associativity of the cache, e.g., 2-way set associative means that, for any given memory block there are two blocks in the cache that the memory block can be mapped into; however, several different blocks in main memory can be mapped to any given set. A 1-way set associative cache is direct mapped; that is, there is only one cache block that can contain a particular memory block. A cache is said to be fully associative if a memory block can occupy any cache block, i.e., there is one set, and the address tag is the full address of the memory block.
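As an illustration of the mapping described above, the following minimal sketch splits a byte address into an address tag, a set index, and a block offset. It assumes a simple modulo mapping function; the function name, the block size, and the block count are hypothetical values chosen for the example and do not come from any particular cache design.

```python
def split_address(addr, block_size=64, num_blocks=1024, ways=2):
    """Split a byte address into (tag, set_index, offset).

    Assumes a simple modulo mapping; real caches may use other
    preset mapping functions.
    """
    num_sets = num_blocks // ways          # associativity = blocks per set
    offset = addr % block_size             # byte position inside the block
    block_number = addr // block_size      # which memory block is addressed
    set_index = block_number % num_sets    # the unique set for this block
    tag = block_number // num_sets         # remaining bits identify the block
    return tag, set_index, offset


# 2-way set associative: 512 sets, two candidate blocks per memory block.
two_way = split_address(0x12345, ways=2)
# 1-way (direct mapped): 1024 sets, exactly one candidate block.
direct = split_address(0x12345, ways=1)
# Fully associative: one set, so the tag is the full block number.
fully = split_address(0x12345, ways=1024)
```

With the associativity equal to the number of blocks, the set index is always zero and the tag carries the whole block number, which matches the fully associative case described above.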
An exemplary cache line (block) includes an address-tag field, a state-bit field, an inclusivity-bit field, and a value field for storing the actual instruction or data. The state-bit field and inclusivity-bit field are used to maintain cache coherency in a multiprocessor computer system. The address tag is a subset of the full address of the corresponding memory block. A compare match of an incoming effective address with one of the tags within the address-tag field indicates a cache “hit.” The collection of all of the address tags in a cache (and sometimes the state-bit and inclusivity-bit fields) is referred to as a directory, and the collection of all of the value fields is the cache entry array.
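The fields of such a cache line, and the tag compare that produces a hit, can be sketched as follows. This is only an illustrative model: the class name, the single-letter state encoding (with "I" standing for an invalid line), and the lookup helper are assumptions made for the example rather than part of any specific directory design.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int = 0             # address-tag field (subset of the full address)
    state: str = "I"         # state-bit field; "I" = invalid (assumed encoding)
    inclusive: bool = False  # inclusivity-bit field
    value: bytes = b""       # value field holding the instruction or data


def lookup(cache_set, tag):
    """Return the way whose tag matches (a hit), or None (a miss)."""
    for way, line in enumerate(cache_set):
        # An invalid line never hits, even if its stale tag matches.
        if line.state != "I" and line.tag == tag:
            return way
    return None


# One 2-way set: way 0 holds a modified line, way 1 is invalid.
example_set = [CacheLine(tag=7, state="M", value=b"x"), CacheLine(tag=9)]
hit_way = lookup(example_set, 7)
miss = lookup(example_set, 9)
```

In this model the directory corresponds to the tag, state, and inclusivity fields of every line taken together, and the cache entry array corresponds to the value fields.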
When all of the blocks in a set for a given cache are full and that cache receives a request, whether a “read” or “write,” to a memory location that maps into the full set, the cache must “evict” one of the blocks currently in the set. The cache chooses the block to be evicted by one of a number of means known to those skilled in the art (least recently used (LRU), random, pseudo-LRU, etc.). If the data in the chosen block is modified, that data is written to the next lower level in the memory hierarchy, which may be another cache (in the case of the L1 or on-board cache) or main memory (in the case of an L2 cache). By the principle of inclusion, the lower level of the hierarchy will already have a block available to hold the written modified data. However, if the data in the chosen block is not modified, the block is simply abandoned and not written to the next lower level in the hierarchy. This process of removing a block from one level of the hierarchy is known as an “eviction.” At the end of this process, the cache no longer holds a copy of the evicted block.
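The eviction behavior described above may be sketched for a single set as follows. The sketch assumes a true LRU policy, which is only one of the possible replacement means; the class and method names are hypothetical, and the write-back is modeled simply by returning the tag of the modified victim.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a cache with LRU replacement (illustrative only)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> modified flag, oldest first

    def access(self, tag, write=False):
        """Return (hit, written_back_tag) for a read or write request."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            if write:
                self.lines[tag] = True       # line is now modified
            return True, None
        written_back = None
        if len(self.lines) >= self.ways:     # set is full: evict the LRU line
            victim, modified = self.lines.popitem(last=False)
            if modified:
                written_back = victim        # modified data goes one level down
            # an unmodified victim is simply abandoned
        self.lines[tag] = write
        return False, written_back


s = CacheSet(ways=2)
r1 = s.access(1, write=True)   # miss, set not yet full
r2 = s.access(2)               # miss, set now full
r3 = s.access(3)               # miss, evicts modified block 1
r4 = s.access(4)               # miss, evicts unmodified block 2
r5 = s.access(3)               # hit
```

Only the third access produces a write-back, because block 1 is the only victim holding modified data.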
Some procedures (programs) running on a processor have the unintended effect of repeatedly using a limited number of sets (congruence classes) such that the cache is less efficient. In other words, when a procedure causes a large number of evictions in a small number of congruence class members while not using a large number of other members, there are increased memory latency delays. This effect, referred to as a stride, is related to the congruence mapping function and the manner in which the particular procedure is allocating memory blocks in the main memory device.
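The stride effect can be illustrated by counting how many distinct sets a regular access pattern touches. The sketch below assumes a simple modulo congruence mapping with hypothetical parameters (64-byte blocks and 512 sets); the function name is likewise an assumption for the example.

```python
def distinct_sets_touched(stride, accesses, block_size=64, num_sets=512):
    """Count the sets (congruence classes) hit by addresses 0, stride, 2*stride, ..."""
    return len({(i * stride // block_size) % num_sets for i in range(accesses)})


# A stride of one block spreads 512 accesses over all 512 sets.
spread = distinct_sets_touched(stride=64, accesses=512)
# A stride of block_size * num_sets sends every access to the same set.
worst = distinct_sets_touched(stride=64 * 512, accesses=512)
# Half of that stride alternates between only two sets.
two_sets = distinct_sets_touched(stride=64 * 256, accesses=512)
```

Under these assumptions, the second and third patterns repeatedly evict blocks from one or two sets while the remaining sets stay unused, which is the inefficiency described above.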
Generally, testing of caches in a data processing system requires allocating an amount of memory equal to twice the size of the L2 cache. When this amount of memory is used, some caches may require rolling or replacement when pages that index to a single index cause existing entries to be removed or evicted, which places more stress on the cache lines and tests them more thoroughly. While the amount of memory may be increased to accomplish the same stress of the caches, the addition of memory is not optimal. To test the real page number (RPN) bits in the cache directory, test programs typically go through a large amount of memory. In manufacturing and also in simulation tests, time is a critical factor. The total amount of testing time available in card manufacturing, module manufacturing, and wafer manufacturing is typically measured in seconds. Simulation is likewise constrained by the time required to simulate in cycle-accurate models.