The use of one or more cache memory systems within a computer's memory hierarchy is a well-known technique to increase the performance of a computer (see, e.g., Handy, Jim; The Cache Memory Book; Academic Press, 1998). FIG. 1 illustrates a typical cache memory array 100. Cache memory array 100 includes cache lines 110. Each cache line includes a tag 120 and a data block 130. Example cache line 140 includes tag 150 and data block 160. As FIG. 1 shows, example tag 150 is a portion of main memory address 170, which is the main memory address corresponding to data block 160.
Processors transfer instructions and operands back and forth between the execution core of the processor and the computer's memory hierarchy during memory transfers. Examples of memory transfers are loading instructions/operands from the memory hierarchy to the processor and storing instructions/operands from the processor to the memory hierarchy. During a memory transfer, the processor generates a main memory address. A portion of the main memory address is compared with the entries in tag 120 during a cache look-up to determine whether cache array 100 contains an entry corresponding to the memory transfer. As demonstrated by the relationship between tag 150 and main memory address 170, the process of a cache look-up is accelerated by requiring the processor to compare only a portion of each main memory address with each entry in the tag. Typically, cache memory uses a portion of each linear address generated by the processor to index data stored in cache array 100.
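The cache look-up described above can be sketched as follows. This is a minimal illustration only, not the arrangement of FIG. 1: the block size, the tag width, the choice of which address bits form the tag, and the names `CacheLine`, `partial_tag`, and `look_up` are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 32   # assumed size of a data block, in bytes
TAG_BITS = 12     # assumed width of the partial tag


@dataclass
class CacheLine:
    tag: int      # portion of the main memory address
    data: bytes   # data block associated with that address


def partial_tag(address: int) -> int:
    """Return the address bits just above the block offset (assumed layout)."""
    return (address // LINE_BYTES) & ((1 << TAG_BITS) - 1)


def look_up(cache: List[CacheLine], address: int) -> Optional[CacheLine]:
    """Compare only a portion of the address with the tag of each cache line."""
    wanted = partial_tag(address)
    for line in cache:
        if line.tag == wanted:
            return line   # hit: the tag matches the partial address
    return None           # miss: no line carries this partial address
```

Because only `TAG_BITS` bits take part in the comparison, the look-up is cheaper than a full-address compare, at the cost that two addresses differing only in the ignored bits are indistinguishable to the cache.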
A thread is a part of a computer program that can execute independently of other parts of the computer program. The performance of a processor can be enhanced if multiple threads are executed concurrently on the processor. Concurrent execution of multiple threads is possible if the dependencies among the various instructions of the multiple threads are detected and properly managed.
FIG. 2 illustrates how many Intel® processors use a virtual memory environment to allow a large linear address space to be supported by a small amount of physical memory (e.g., random access memory). During a memory transfer, a processor generates a linear address 210. Linear address 210 comprises a directory field 220, a table field 225, and an offset field 230. The base of the page directory 235 is contained in control register CR3 240. Directory field 220 of linear address 210 provides an offset that is added to the value contained in control register CR3 240 to select an entry in the page directory. The selected page directory entry contains a page table base pointer 245. Table field 225 provides an offset that is combined with page table base pointer 245 to identify the page table entry, which holds the base of the page that contains physical address 255. Offset field 230 is combined with the page base from the page table entry to identify physical address 255.
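The translation steps of FIG. 2 can be sketched as follows. The field widths (10-bit directory, 10-bit table, 12-bit offset) are assumed for illustration and are not stated above; memory is modeled as a Python dictionary indexed by entry rather than by byte, which is a simplification. The names `split_linear` and `translate` are hypothetical.

```python
def split_linear(linear: int):
    """Split a 32-bit linear address into (directory, table, offset) fields.

    Assumed layout: bits 31-22 directory, bits 21-12 table, bits 11-0 offset.
    """
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def translate(cr3: int, memory: dict, linear: int) -> int:
    """Walk the page directory and page table to produce a physical address."""
    directory, table, offset = split_linear(linear)
    # The directory field offsets the page directory base held in CR3.
    page_table_base = memory[cr3 + directory]
    # The table field offsets the page table base pointer.
    page_base = memory[page_table_base + table]
    # The offset field selects the location within the page.
    return page_base + offset
```

For example, with a page directory at 0x1000 whose entry 1 points to a page table at 0x2000, and that table's entry 2 pointing to a page at 0x5000, the linear address with fields (1, 2, 0x34) translates to 0x5034.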
FIG. 3 illustrates a shortcoming associated with performing cache look-ups with only a partial main memory address. A processor (not shown) generates linear addresses 304 and 306 in response to load instructions LD0 and LD1. Portions 305 and 307 are the parts of each address that are used to perform a cache look-up. While portions 305 and 307 appear to be identical, they are only a portion of linear addresses 304 and 306, respectively. Linear addresses 304 and 306 map to two different physical addresses because they have different entries in their respective directory fields (320 and 325) and offset fields (330 and 335). An additional complication is introduced when, as in FIG. 3, a processor supports concurrent execution of multiple threads. Thread 0 and Thread 1 can have different values for the bases of their respective page directories (340 and 345). Thus, even if linear addresses 304 and 306 were the same, they could map to two different physical addresses.
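Both problems can be reproduced in a small sketch. All values here are invented for illustration: the 10/10/12 field split, the assumption that the look-up portion coincides with the table field, the page structure contents, and the names `lookup_bits`, `translate`, `THREAD0_CR3`, and `THREAD1_CR3` are hypothetical and are not taken from FIG. 3.

```python
def lookup_bits(linear: int) -> int:
    """Portion of the linear address used for the cache look-up (assumed: table field)."""
    return (linear >> 12) & 0x3FF


def translate(cr3: int, memory: dict, linear: int) -> int:
    """Two-level page walk with an assumed 10/10/12 field split."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    page_table_base = memory[cr3 + directory]
    page_base = memory[page_table_base + table]
    return page_base + offset


THREAD0_CR3 = 0x1000   # assumed page directory base for Thread 0
THREAD1_CR3 = 0x2000   # assumed page directory base for Thread 1

# Entry-indexed model of the page structures (illustrative contents).
memory = {
    THREAD0_CR3 + 1: 0x3000,   # Thread 0, directory entry 1 -> page table
    THREAD0_CR3 + 2: 0x4000,   # Thread 0, directory entry 2 -> page table
    0x3000 + 7: 0xA000,        # page table entry 7 -> page base
    0x4000 + 7: 0xB000,
    THREAD1_CR3 + 1: 0x5000,   # Thread 1, directory entry 1 -> page table
    0x5000 + 7: 0xC000,
}

# Two loads whose look-up portions match but whose directory and offset fields differ.
ld0 = (1 << 22) | (7 << 12) | 0x010
ld1 = (2 << 22) | (7 << 12) | 0x020

same_lookup_bits = lookup_bits(ld0) == lookup_bits(ld1)
phys_ld0_thread0 = translate(THREAD0_CR3, memory, ld0)
phys_ld1_thread0 = translate(THREAD0_CR3, memory, ld1)
phys_ld0_thread1 = translate(THREAD1_CR3, memory, ld0)
```

In this sketch the two loads present identical look-up portions yet resolve to different physical addresses within Thread 0, and the same linear address `ld0` resolves to different physical addresses in Thread 0 and Thread 1 because the page directory bases differ.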