This invention relates generally to a method for least recently used (LRU) compartment capture in a multiprocessor system, and more particularly to providing a method, system and computer program product for LRU compartment capture which reduces the number of latches necessary by performing a two pipe pass pipeline operation.
The performance of a multiprocessor computer system having a pipelined associative cache subsystem is driven by the cache access time for processor-requested data fetch operations. Conventionally, a fetch hit is when a processor-requested data fetch finds the data it needs in the cache, thereby saving the time required to access the main memory. A fetch miss is when the processor-requested data fetch finds that the data it needs is not in the cache, thereby requiring an additional delay to access the main memory or another cache. Thus, the cache access time for a fetch hit is related to the time required to pull the requested data out of the cache and return it to the requesting processor, and the cache access time for a fetch miss is related to the time required to pull the data from the main memory or another cache and return it to the requesting processor. In the case of a fetch miss, a cast-out operation of existing data in the cache may also be necessary in order to create space for the data pulled from the main memory. A directory LRU array is a conventional mechanism by which a cache retains frequently referenced data: each fetch that hits updates its located directory compartment in the directory LRU array as “Most Recently Used” (MRU). In the pipeline, it is advantageous to perform the address directory lookup operation and the cache access as early as possible and to place the directory LRU array at a later cycle, by which time the address directory lookup result is available. Further, regarding the fetch miss operation, the cast-out operation is not a performance-critical part of the fetch miss operation, such that the time it takes, as measured in the number of pipeline cycles, is not limited as long as the cast-out operation is completed prior to the returning of the requested data to the requesting processor.
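By way of illustration only, the MRU update and LRU selection described above may be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware organization of any particular directory LRU array: the class name `CongruenceClassLRU`, its methods, and the use of an ordered list to hold recency are all hypothetical, and an actual LRU array would typically hold an encoded recency state per congruence class.

```python
class CongruenceClassLRU:
    """Recency order for the compartments of one congruence class (illustrative)."""

    def __init__(self, num_compartments):
        # Assumption: order[0] is the LRU compartment, order[-1] is the MRU one.
        self.order = list(range(num_compartments))

    def mark_mru(self, compartment):
        # A fetch hit moves the hit compartment to the MRU end of the order.
        self.order.remove(compartment)
        self.order.append(compartment)

    def lru_compartment(self):
        # A fetch miss would select this compartment as the cast-out candidate.
        return self.order[0]


lru = CongruenceClassLRU(4)
lru.mark_mru(0)  # fetch hit in compartment 0
lru.mark_mru(2)  # fetch hit in compartment 2
victim = lru.lru_compartment()
```

In this sketch, compartments 0 and 2 have been referenced most recently, so the compartment that has gone longest without a hit is the one returned as the cast-out candidate.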
In a cache subsystem, especially one that is concurrently servicing multiple fetches from multiple processors, it is desirable to pipeline the fetch operations in a manner that minimizes contention for the directory and cache resources. Furthermore, it is desirable to minimize the cache access time for each of the data fetch operations.
Thus, currently, a single pipe pass pipeline is performed where all of the operations associated with a fetch hit are performed in a single pipe pass. FIG. 1 illustrates cycles performed in a conventional single pipe pass pipeline of a multiprocessor system, and FIG. 2 illustrates a conventional single pipe pass pipeline method including fetch hit and miss operations for a cache in a multiprocessor system. As shown in FIGS. 1 and 2, at operation 100, a processor sends a fetch request with a storage address of desired data into the pipeline and the processor data fetch request is received. From operation 100, the process moves to operation 110, where a first cycle C1 is performed in which an address directory lookup is initiated to a congruence class of a line address targeted by the fetch request. The address tag and ownership state information of each entry associated with each compartment in that congruence class are read. From operation 110, the process moves to operation 115, where a second cycle C2 is performed, in which the address tag and ownership state information read is used to determine if a fetch hit has occurred in the cache. That is, the processor fetch address is compared against each compartment entry value, and simultaneously compartment data in the cache for the specified congruence class is read in case a fetch hit occurs. From operation 115, the process moves to operation 120, where it is determined whether a fetch hit (i.e., a directory hit) has occurred. If it is determined that a fetch hit has occurred, the process moves to operations 125 and 130. If not, the process moves to operation 140. In operations 125 and 130, respectively, a third cycle C3 is performed where the directory LRU array is updated to indicate an MRU compartment based on the fetch hit results (see operation 125) and the directory hit is indicated as a cache late select (see operation 130).
The process then moves to operation 135 where, in a fourth cycle C4, the data is returned to the requesting processor. On the other hand, if a fetch miss occurs at operation 120, the process moves to operation 140, where in the third cycle C3, the directory LRU array is accessed to determine the LRU compartment, each directory compartment entry is staged down as one of the entries will have to be selected for the cast-out operation to make room for the data coming into the cache, and the cache access is cancelled. From operation 140, the process moves to operation 145, where in the fourth cycle C4, the LRU compartment is determined and used to select a staged directory address compartment to be output as the LRU address. From operation 145, the process moves to operation 150, where in a fifth cycle C5, the LRU address is loaded into a cast-out controller to proceed with an LRU cast-out operation, to thereby cast the LRU data out of the cache.
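The cycle-by-cycle flow of the conventional single pipe pass described above (cycles C1 through C5) may be traced with a short software sketch. It is offered only as an aid to reading the description and rests on assumptions: the function name `single_pipe_pass`, its arguments, the example tag values, and the representation of the LRU array as a list ordered from least to most recently used are hypothetical, and ownership state information is omitted for brevity.

```python
def single_pipe_pass(directory_tags, lru_order, fetch_tag):
    """Trace the conventional single pipe pass for one fetch (illustrative).

    directory_tags: address tag per compartment of the targeted congruence class.
    lru_order: compartment indices, least recently used first (assumed model).
    Returns (trace, lru_order), where trace is a list of (cycle, action) pairs.
    """
    trace = [("C1", "directory lookup: read tags of all compartments")]
    # C2: compare the fetch address against every compartment entry.
    hits = [c for c, tag in enumerate(directory_tags) if tag == fetch_tag]
    trace.append(("C2", "compare tags; read cache data in case of a hit"))
    if hits:
        hit = hits[0]
        # C3: mark the hit compartment MRU and indicate the cache late select.
        lru_order = [c for c in lru_order if c != hit] + [hit]
        trace.append(("C3", f"hit: compartment {hit} set MRU; cache late select"))
        trace.append(("C4", "return data to requesting processor"))
    else:
        # C3: access the LRU array; every tag stays staged; cache access cancelled.
        trace.append(("C3", "miss: access LRU array; cancel cache access"))
        victim = lru_order[0]
        # C4: the LRU compartment selects one staged tag as the LRU address.
        trace.append(("C4", f"select staged tag of compartment {victim}"))
        # C5: load the LRU address into the cast-out controller.
        trace.append(("C5", f"load cast-out controller with tag {directory_tags[victim]:#x}"))
    return trace, lru_order


tags = [0x1A, 0x2B, 0x3C, 0x4D]  # hypothetical tags of a 4-compartment class
hit_trace, order_after_hit = single_pipe_pass(tags, [0, 1, 2, 3], 0x2B)
miss_trace, order_after_miss = single_pipe_pass(tags, [0, 1, 2, 3], 0x99)
```

In the miss path of this sketch, the tags of all compartments must remain available until C4, when the LRU compartment finally selects one of them; this is the staging that the latches in the conventional method provide.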
In the conventional method described above, each entry of the directory is required to be staged down for three pipeline cycles (C2, C3 and C4) before a determination can be made from the LRU array as to which entry is the LRU entry and the cast-out controller can be loaded with the LRU entry. A cache directory utilizes a plurality of clock-controlled latches for timing of the cast-out operation. These latches represent latency boundaries between stages or cycles in the pipelined structure of the cache directory. In the conventional method, a large number of latches is required, where the total number of latches is equal to the address tag size multiplied by the number of compartments and by the number of cycles to be performed.
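The latch count stated above is a simple product, which the following sketch evaluates for one example. The function name `staging_latch_count` and the numeric values (a 24-bit address tag and 12 compartments) are hypothetical and chosen only to show the arithmetic; the three staged cycles correspond to C2, C3 and C4 of the conventional method.

```python
def staging_latch_count(tag_bits, compartments, staged_cycles):
    # Total latches = address tag size x number of compartments x number of cycles.
    return tag_bits * compartments * staged_cycles


# Hypothetical directory: 24-bit tags, 12 compartments, staged over C2, C3 and C4.
conventional = staging_latch_count(tag_bits=24, compartments=12, staged_cycles=3)
```

Because the count scales linearly with each factor, any reduction in the number of cycles over which every compartment entry must be staged reduces the latch count proportionally.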
It would be desirable to be able to efficiently perform a cast-out operation of data from the cache directory while reducing the number of latches necessary to perform the cast-out operation.