In a data processing system, instructions and associated data are transferred from memory to one or more processors for processing, and the resulting data generated by the processors is returned to memory for storage. Typical processing operations therefore involve frequent and repetitive reading and writing of memory, and memory access delays are often a primary limitation on the performance of a data processing system. Memory access speed should therefore be maximized to optimize performance. However, cost and other constraints often require that the main memory comprise relatively slow, long-access-time circuitry. To overcome the resulting performance drawbacks, memory caches are typically used.
A memory cache typically includes a relatively small, but high speed, bank of memory, which can be more rapidly accessed by the processor(s) than the main memory. Memory locations in the main memory are duplicated in the cache. When a particular memory location being accessed by the processor is duplicated in the cache--an event which is known as a cache "hit"--the processor may rapidly access the cache instead of waiting for access to main memory. The cache is managed with the goal of maximizing the fraction of accesses which are hits in the cache.
Caches are typically organized into "lines", which are relatively long sequences of memory locations found in main memory. Typically, when a memory location accessed by a processor is not duplicated in the cache--an event which is known as a cache "miss"--an entire line containing the missed memory location, and neighboring memory locations, is brought into the cache as part of retrieving the missed location from other caches or main memory--an event which is known as a "linefill" into the cache.
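The mapping of an individual memory address to its containing cache line can be sketched as follows. The 64-byte line size is an illustrative assumption, not a value taken from the text; on a miss, the entire aligned line containing the address would be linefilled.

```python
LINE_SIZE = 64  # bytes per cache line (illustrative assumption)

def line_address(addr):
    """Return the address of the first byte of the line containing addr."""
    return addr - (addr % LINE_SIZE)

def line_offset(addr):
    """Return the position of addr within its cache line."""
    return addr % LINE_SIZE
```

A linefill would thus fetch the locations from `line_address(addr)` through `line_address(addr) + LINE_SIZE - 1`, bringing the missed location and its neighbors into the cache together.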
Typically, each cache line is associated with multiple groups of locations in the main memory. Each cache line stores duplicates of associated groups of memory locations, as well as an indication of which groups of memory locations are currently stored in that line. Thus, when a processor requests access to a particular memory location, the cache line corresponding to that memory location is accessed to determine whether that cache line is storing the group of memory locations which includes the requested location. If so, the requested memory location is accessed in the cache. If not, a group of memory locations including the requested location is linefilled into the cache.
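The lookup just described can be sketched for the simplest (direct-mapped) case, where each line index holds one of its associated address groups at a time and a stored tag records which group is currently resident. The sizes and the class shape are illustrative assumptions.

```python
LINE_SIZE = 64    # bytes per line (illustrative)
NUM_LINES = 128   # lines in the cache (illustrative)

class DirectMappedCache:
    def __init__(self):
        # One tag per line, recording which group of locations is resident.
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, linefill the missed group."""
        line = (addr // LINE_SIZE) % NUM_LINES   # cache line for this address
        tag = addr // (LINE_SIZE * NUM_LINES)    # which associated group
        if self.tags[line] == tag:
            return True          # hit: the group is resident in the line
        self.tags[line] = tag    # miss: linefill the group into the line
        return False
```

Addresses that differ by a multiple of `LINE_SIZE * NUM_LINES` map to the same line but carry different tags, so they displace one another.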
Typically, an n-way associative cache stores n of the several groups of locations corresponding to a cache line in the cache at one time. When a group of memory locations is linefilled into the cache, memory contents in the same cache line may need to be replaced. If the contents of the replaced cache line have been modified, then the line has to be stored back into the corresponding group of locations in the main memory--an event which is known as a "castback" or "writeback" from the cache.
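One set of an n-way associative cache, with the writeback behavior described above, might be sketched as follows. The FIFO replacement policy and the method signatures are illustrative assumptions; only the n-way residency, dirty tracking, and castback-on-replacement follow the text.

```python
class AssociativeSet:
    """One cache line's set: holds up to n associated groups at once."""
    def __init__(self, n):
        self.n = n
        self.lines = []  # [tag, dirty] pairs, oldest first (FIFO policy)

    def access(self, tag, write=False):
        """Return ('hit' or 'miss', tag cast back or None)."""
        for line in self.lines:
            if line[0] == tag:
                line[1] = line[1] or write  # a write marks the line modified
                return 'hit', None
        castback = None
        if len(self.lines) == self.n:
            victim_tag, dirty = self.lines.pop(0)  # replace the oldest line
            if dirty:
                castback = victim_tag  # modified contents must be written back
        self.lines.append([tag, write])
        return 'miss', castback
```

A clean (unmodified) victim is simply discarded, since main memory already holds an identical copy; only a modified line forces the castback.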
In high performance data processing systems, often there are two or more caches, organized so that a processor attempts to access a memory location by first attempting to locate a duplicate of that location in a "level 1" or L1 cache. If there is a miss in the L1 cache, then an attempt is made to locate a duplicate of the desired memory location in a "level 2" or L2 cache. If there is a miss in the L2 cache, each lower level cache is sequentially checked in the same manner. If there is a hit in one of the caches, then the desired memory locations are obtained from that cache, and typically, the accessed memory locations are duplicated, along with neighboring locations completing a cache line, into the appropriate line of at least the L1 cache--although in some cases an access may be "cache-inhibited", in which case the data is not stored in the L1 cache after retrieval. If there are misses in all of the caches, the missed location, along with neighboring locations completing a cache line, is retrieved from main memory, and filled into one or more of the caches if the access is not cache-inhibited. Similarly, if a line is cast back from a cache, the line may be written to a higher level cache, main memory, or both.
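The multi-level lookup sequence above can be sketched with each cache modeled simply as a set of resident line addresses; the function shape and the choice to fill only the L1 cache are illustrative assumptions.

```python
def lookup(line_addr, caches, cache_inhibited=False):
    """Try each cache from L1 down; fall through to main memory.

    caches: list of sets of resident line addresses, caches[0] is L1.
    Returns the level that satisfied the access ('L1', 'L2', ..., or 'memory').
    """
    for level, cache in enumerate(caches, start=1):
        if line_addr in cache:
            if level > 1 and not cache_inhibited:
                caches[0].add(line_addr)  # duplicate the line into L1
            return 'L%d' % level
    if not cache_inhibited:
        caches[0].add(line_addr)  # retrieved from main memory, filled into L1
    return 'memory'
```

A cache-inhibited access returns the data without populating any cache, matching the exception noted in the text.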
Typically, lines of instructions and data are transferred between caches and processors using buffers. For instance, in one architecture two buffers are respectively connected to a level 1 cache and a level 2 cache. These buffers are also connected to main memory, a host processor, and possibly other processors via a system bus. The buffers allow for a smooth transition of data or instructions between components having different transfer rates. Each line in a conventional cache buffer strictly handles either fill commands or writeback commands, and includes memory space which can store a finite number of cache lines, e.g., four. Each cache line in a buffer is, therefore, designated as a fill cache line or a writeback cache line. In a multi-way associative cache, cache buffer lines may be used for fills or writebacks, and are dynamically configured for the appropriate purpose.
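The dynamically configured buffer described above might be sketched as follows. The four-line capacity follows the example in the text; the class and method names are illustrative assumptions.

```python
class CacheBuffer:
    """Buffer whose lines are configured on demand for fills or writebacks."""

    def __init__(self, num_lines=4):
        self.lines = [None] * num_lines  # None marks a free slot

    def allocate(self, purpose, line_addr):
        """Configure a free slot as 'fill' or 'writeback'; None if full."""
        for i, slot in enumerate(self.lines):
            if slot is None:
                self.lines[i] = (purpose, line_addr)  # dynamic designation
                return i
        return None  # every buffer line is already in use

    def release(self, i):
        """Free a slot once its transfer completes."""
        self.lines[i] = None
```

Because a slot's purpose is assigned at allocation time rather than fixed in advance, the same buffer capacity serves whatever mix of fills and writebacks is currently outstanding.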
In addition to the use of caches to improve memory access performance, other well known techniques have been used to improve the performance of data processing systems. One technique is to divide a processing task into independently executable sequences of instructions called threads. Using this technique in a single-processor system, when the processor, for any number of reasons, cannot continue processing or execution of a thread, the processor switches to another thread and continues processing of that thread, rather than stalling. For example, when a cache miss stalls processing of one thread, the processor may switch to other threads which are able to continue processing. By the time the processor returns to the stalled thread, the missed location may have been linefilled into the cache, and that thread can resume with minimal additional delay. Furthermore, the processor may switch threads on a timed basis even where no threads are stalled. Alternatively, in a multi-processor system, each processor may handle one or more threads executing in parallel, so that when one thread or one processor is stalled, the other threads or processors may continue. For the purposes of this application, the terms "thread" and "multithreading" will be used to refer generally to any processing system, whether comprising a single processor or multiple processors, which executes multiple threads.
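The switch-on-miss behavior described above can be sketched as a simple cycle-by-cycle model: a thread that misses the cache stalls until its linefill completes, and the processor runs other ready threads in the meantime. The three-cycle miss latency and the scheduling model are illustrative assumptions.

```python
from collections import deque

MISS_LATENCY = 3  # cycles until a missed line is linefilled (assumption)

def schedule(miss_cycles, cycles, num_threads=2):
    """Simulate thread switching on cache misses.

    miss_cycles: {thread id: set of cycles at which that thread misses}.
    Returns the thread id run on each cycle (None when all are stalled).
    """
    ready = deque(range(num_threads))  # runnable threads, FIFO order
    stalled = {}                       # thread id -> cycle it becomes ready
    trace = []
    for cycle in range(cycles):
        # Wake any thread whose linefill has completed.
        for tid, when in list(stalled.items()):
            if when <= cycle:
                del stalled[tid]
                ready.append(tid)
        if not ready:
            trace.append(None)         # all threads stalled: processor idles
            continue
        tid = ready[0]
        trace.append(tid)
        if cycle in miss_cycles.get(tid, ()):  # miss: switch to another thread
            ready.popleft()
            stalled[tid] = cycle + MISS_LATENCY
    return trace
```

When thread 0 misses on its first cycle, thread 1 absorbs the latency, and by the time thread 0 is rescheduled its line has been filled, so the stall costs little overall throughput.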
The term "multithreading", when used in the context of software, is used to refer to a particular organization scheme which can be used in writing computer programs. In this software context, therefore, "multithreading" does not relate to the manner in which computer programs are physically executed. Thus, software "multithreading" is different from the kind of "multithreading" discussed in this application. The kind of multithreading to which this application relates may be referred to as "hardware multithreading", i.e., processor configurations permitting a single processor to switch between multiple threads of instructions upon various conditions, or permitting multiple processors to process multiple threads of instructions. Thus, in this application, the terms "thread" and "multithreading" will be understood to refer to "hardware multithreading", and not methods for software organization.
While the technique of multithreading, and the use of memory caches, both enhance the performance of a data processing system, combining these techniques raises substantial complexities. In particular, as noted above, when a memory access misses in the L1 cache, the required memory contents must be accessed from a higher level cache or main memory. When the memory contents are located, they must be delivered to the requesting processor, and also filled into one of the caches. While this process appears straightforward, in a multithreading environment or other environments where there may be multiple outstanding cache misses, efficient management of this process becomes substantially more difficult.
In particular, when multiple threads are simultaneously executing, the threads may be accessing memory locations mapped to the same cache line. As a consequence, the cache line may be repeatedly replaced as the threads access the different memory locations.
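The conflict described above can be illustrated with the line-mapping arithmetic itself: under hypothetical parameters of 64-byte lines and 128 cache lines, two threads whose working addresses differ by a multiple of 64 × 128 = 8192 bytes contend for the same cache line and evict each other on every access.

```python
LINE_SIZE, NUM_LINES = 64, 128  # illustrative cache geometry

def cache_line(addr):
    """Cache line index to which a memory address maps."""
    return (addr // LINE_SIZE) % NUM_LINES

thread_a_addr = 0x10000
thread_b_addr = thread_a_addr + LINE_SIZE * NUM_LINES  # 8 KiB apart

# Both threads' addresses map to the same line index, so each access by
# one thread replaces the line just filled for the other.
```

Each such alternating access is a miss, so the line is repeatedly replaced even though both threads exhibit good locality in isolation.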
Furthermore, multiple simultaneously executing threads may increase the number of sequential accesses to memory which are directed to memory locations distant from each other. Since a group of local accesses can be more rapidly processed in memory than accesses which are distant from each other, the introduction of multiple threads can increase the overall access time to memory.
Accordingly, there is a need for a cache which is managed in a manner to improve its performance, particularly in a multithreaded and/or multiprocessor environment.