The prior art teaches data processing systems that contain a multilevel storage hierarchy with one or more caches, in which the cache at the lowest hierarchy level, L1, is directly accessible by (i.e. private to) a single CPU so that it can be located close to that CPU for fast access. Each cache contains lines of data whose line length (i.e. byte length) is convenient for that cache, so different caches may have different line lengths. The prior art also teaches a second-level (L2) cache whose line length may be a multiple of the line length of each entry in the lowest-level cache (L1).
In the prior art, mainframe CPUs often include an instruction unit as a source of requested addresses, a translation lookaside buffer (TLB), an L1 cache and directory as the lowest hierarchy level, and an L2 cache and directory as the next hierarchy level.
Cache efficiency is important to system performance. An important parameter for measuring cache efficiency is the average time duration from when a storage request address is available from the CPU instruction unit until the requested data is available to the instruction unit. This duration is usually measured in numbers of machine cycles. Cache efficiency increases as this parameter decreases.
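The efficiency parameter above is an expected value, so it can be estimated from hit ratios and per-event cycle costs. The sketch below is a simplified model under stated assumptions, not a method from the text: every request pays an L1 path time, a TLB miss adds a DAT time, an L1 miss adds a next-level fetch time, and TLB and L1 outcomes are treated as independent. The function name and all numeric inputs are hypothetical.

```python
def average_access_cycles(p_tlb_hit: float, p_l1_hit: float,
                          t_l1: float, t_dat: float, t_l1_miss: float) -> float:
    """Expected cycles from request-address availability to data availability.

    Assumed model (not from the source): every request pays t_l1; a TLB miss
    adds t_dat; an L1 miss adds t_l1_miss to fetch from the next level.
    TLB and L1 outcomes are treated as independent events.
    """
    return t_l1 + (1.0 - p_tlb_hit) * t_dat + (1.0 - p_l1_hit) * t_l1_miss


if __name__ == "__main__":
    # Hypothetical figures chosen only to illustrate the arithmetic.
    print(average_access_cycles(0.98, 0.95, t_l1=2, t_dat=30, t_l1_miss=20))
```

With these made-up inputs the model gives about 3.6 cycles, illustrating how even small miss ratios add noticeably to a 2-cycle hit path.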
Conventional systems may operate in the following manner. A storage address requested by the instruction unit may be real, absolute, or virtual. If it is virtual, the page containing the requested address may have been previously translated by the system's dynamic address translation (DAT) means, which put the page's real or absolute address into a TLB entry; that entry is now accessed by the requested address to obtain the translated address. A TLB miss is determined if no TLB entry contains the required translation; the requested virtual address is then translated by DAT, which puts the translation into the TLB, from which it may later be accessed. Thereafter the requested virtual address requires only a TLB lookup and compare to obtain the corresponding translated real/absolute address from the TLB, until the entry is replaced after a period of nonuse.
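The TLB behavior just described (hit by lookup and compare; miss resolved by DAT, with the result installed; entries replaced after nonuse) can be sketched as a small fully associative TLB with least-recently-used replacement. This is a minimal illustration, assuming 4 KB pages and a caller-supplied `dat` function standing in for the DAT mechanism; the class name, capacity, and LRU policy are hypothetical choices, not details from the text.

```python
from collections import OrderedDict
from typing import Callable

PAGE_SHIFT = 12  # assumption: 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    """Small fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, dat: Callable[[int], int], capacity: int = 64):
        self.dat = dat              # stands in for DAT: virtual page -> page frame
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page -> frame, least recently used first
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpage, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpage in self.entries:   # TLB hit: lookup and compare only
            self.entries.move_to_end(vpage)
        else:                       # TLB miss: translate by DAT, then install
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # replace the least recently used entry
            self.entries[vpage] = self.dat(vpage)
        return (self.entries[vpage] << PAGE_SHIFT) | offset
```

Only the first reference to a page pays for DAT; later references to any byte of the same page are satisfied by the TLB until the entry ages out.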
In a uniprocessor, DAT translates a virtual address to a real address, which is put into the TLB.
But if the CPU is in a multiprocessor, a prefix address is added to the translated real address to make it into an absolute address, and the virtual request's absolute address is then put into the TLB.
If the CPU requests a real address, no translation is done, but if the CPU is in a multiprocessor a prefix address is added to the requested real address to make it into an absolute address.
CPU requested real addresses have been handled in different ways by prior CPUs; some have put the real/absolute address in the TLB in the same manner as is done with virtual addresses, while others have used a bypass path around the TLB to the L1 cache for an access attempt in the cache, in order to avoid using TLB space for an address not requiring translation.
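The request paths above (virtual requests through the TLB, real requests around it, and prefixing in a multiprocessor) can be sketched as follows. The text describes the prefix as being "added"; architected System/370 prefixing instead exchanges real addresses 0 to 4095 with the 4 KB block at the prefix address, and the sketch follows that architected rule, assuming a nonzero, 4 KB-aligned prefix. In the described multiprocessor the prefixed absolute address is what gets stored in the TLB; for brevity this sketch applies prefixing after the lookup instead. The `translate` callable and function names are hypothetical.

```python
from typing import Callable, Optional, Tuple

PREFIX_AREA = 4096  # System/370 prefixing applies to a 4 KB block


def real_to_absolute(real: int, prefix: int) -> int:
    """Prefixing: swap real block 0 with the prefix block.

    Assumes prefix is a nonzero multiple of 4096.
    """
    if real < PREFIX_AREA:                       # real block 0 -> prefix block
        return prefix + real
    if prefix <= real < prefix + PREFIX_AREA:    # prefix block -> absolute block 0
        return real - prefix
    return real                                  # all other addresses unchanged


def resolve_request(addr: int, is_virtual: bool,
                    translate: Callable[[int], int],
                    prefix: Optional[int] = None) -> Tuple[int, bool]:
    """Return (address used for the cache access, whether the TLB was used)."""
    if is_virtual:
        real = translate(addr)   # TLB lookup, with DAT on a miss
        used_tlb = True
    else:
        real = addr              # real request takes the bypass path around the TLB
        used_tlb = False
    if prefix is not None:       # multiprocessor: make the address absolute
        real = real_to_absolute(real, prefix)
    return real, used_tlb
```

Routing real requests around the TLB, as in the bypass variant, keeps TLB entries free for addresses that actually need translation.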
The DAT operation in the IBM System/370 architecture uses a segment table descriptor (STD), comprised of a segment table origin (STO) and a segment table length (STL).
In systems using multiple address spaces, a STO is part of each requested virtual address, identifying the virtual address space that contains it. STOs (or STO identifiers) have previously been put in each TLB entry as part of the virtual address. The STO in the accessed TLB entry must be compared with the STO provided with each requested virtual address to find any TLB address translation. Thereafter only the translated address is used to access the requested data in the cache, and in main storage when needed. Some prior systems use a STO identifier table containing all recently used STOs and the corresponding assigned STO identifiers, which have fewer bits than the STO; the STO identifier is then put in the TLB instead of the STO, allowing a smaller TLB circuit array, since smaller arrays allow faster access.
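A STO identifier table of the kind described can be sketched as a small map from recently used STOs to short identifiers, with the TLB tag match requiring both the identifier and the virtual page to agree. The identifier width (`id_bits`), the LRU reuse policy, and the rule that reassigning an identifier requires purging TLB entries tagged with it are assumptions for illustration; the text does not specify them. All names are hypothetical.

```python
from collections import OrderedDict
from typing import Tuple


class StoIdTable:
    """Assigns short identifiers to recently used STOs (hypothetical model)."""

    def __init__(self, id_bits: int = 3):
        self.capacity = 1 << id_bits   # assumed identifier width
        self.ids = OrderedDict()       # STO -> STO identifier, least recently used first

    def lookup(self, sto: int) -> Tuple[int, bool]:
        """Return (STO identifier, reassigned).

        reassigned=True means the identifier previously belonged to another
        STO, so (under this sketch's assumption) TLB entries tagged with it
        must be purged before reuse.
        """
        if sto in self.ids:
            self.ids.move_to_end(sto)
            return self.ids[sto], False
        if len(self.ids) < self.capacity:
            new_id, reassigned = len(self.ids), False
        else:
            _, new_id = self.ids.popitem(last=False)  # take the LRU STO's identifier
            reassigned = True
        self.ids[sto] = new_id
        return new_id, reassigned


def tlb_entry_matches(entry: dict, sto_id: int, vpage: int) -> bool:
    """A TLB entry hits only if both the STO identifier and virtual page match."""
    return entry["sto_id"] == sto_id and entry["vpage"] == vpage
```

The TLB compare then involves a few identifier bits instead of a full STO, which is the array-size saving the text describes.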
In the conventional cache directory, a set-associative arrangement was provided, in which a row of the cache directory (called a "congruence class") was selected by each address provided by the instruction unit, whether real/absolute or virtual. Each row comprised a set of entries (called bins, each identified by a bin identifier) that were handled associatively, i.e. each congruence class was set associative. In this manner the directory row was selected before TLB address translation completed, so that a cache congruence class was already chosen by the time the TLB translated address became available, which sped up operation on the critical cache path in the CPU.
In the conventional cache, only translated addresses are put into the cache directory; that is, each used cache directory entry holds a real/absolute address representation. This real address is read out of each directory entry in the congruence class selected by each address requested by the instruction unit. The set of directory readout real addresses arrives at respective comparator circuits at about the same time as the TLB translated address, and a simultaneous comparison determines which, if any, of the addresses from the selected congruence class matches the translated requested address; this is the set-associative comparison for the cache.
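The two-step lookup described above (row selection from the untranslated address, then comparison of the row's real-address tags with the TLB output) can be sketched as follows. The geometry is an assumption: 128-byte lines and 32 congruence classes, so the class-index bits fall entirely inside a 4 KB page offset and are identical in the virtual and real addresses, which is what lets row selection start before translation. The 4-way associativity and all names are hypothetical.

```python
from typing import List, Optional

LINE_BYTES = 128    # assumption: line length
NUM_CLASSES = 32    # 32 * 128 = 4096, so index bits lie inside a 4 KB page offset
WAYS = 4            # assumption: 4 bins per congruence class


def congruence_class(addr: int) -> int:
    """Row index; uses only page-offset bits, so virtual and real agree."""
    return (addr // LINE_BYTES) % NUM_CLASSES


def line_tag(real_addr: int) -> int:
    """Real-address tag stored in a directory entry (the page frame here)."""
    return real_addr // (LINE_BYTES * NUM_CLASSES)


class L1Directory:
    def __init__(self) -> None:
        self.rows: List[List[Optional[int]]] = [[None] * WAYS for _ in range(NUM_CLASSES)]

    def install(self, real_addr: int, way: int) -> None:
        self.rows[congruence_class(real_addr)][way] = line_tag(real_addr)

    def lookup(self, requested_addr: int, translated_addr: int) -> Optional[int]:
        row = self.rows[congruence_class(requested_addr)]  # selected before translation
        tag = line_tag(translated_addr)                    # available after the TLB
        for way, entry in enumerate(row):  # hardware compares all bins in parallel
            if entry == tag:
                return way
        return None
```

Because the stored tags are real addresses, the final compare cannot begin until the TLB has produced the translated address.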
This prior operation required a TLB hit before an L1 cache hit could be obtained. If a TLB miss occurred, the L1 cache determination had to wait until the TLB miss was resolved by a DAT operation, and the L1 cache operation was restarted after the DAT operation had put the new translation into the TLB for the current CPU request. A TLB miss requires a dynamic address translation (DAT), which may require two accesses to translation tables in main storage and is therefore relatively slow.
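The two translation-table accesses mentioned above come from a two-level walk: a segment-table entry located via the STO, then a page-table entry. The sketch below assumes an address split modeled on 370-XA 31-bit addressing (1 MB segments, 4 KB pages: 11-bit segment index, 8-bit page index, 12-bit byte index), represents the tables as dictionaries, and simplifies the STL check to a plain count of segment-table entries. Table formats, error messages, and names are hypothetical; the counter simply makes the two storage accesses visible.

```python
from typing import Dict

# Assumed split, modeled on 370-XA 31-bit addressing: 1 MB segments, 4 KB pages.
SX_SHIFT, SX_MASK = 20, 0x7FF   # segment index: 11 bits
PX_SHIFT, PX_MASK = 12, 0xFF    # page index: 8 bits
BX_MASK = 0xFFF                 # byte index: 12 bits


def dat(vaddr: int, segment_table: Dict[int, Dict[int, int]], stl: int,
        stats: Dict[str, int]) -> int:
    """Two-level table walk; stl is simplified to a count of segment-table entries."""
    sx = (vaddr >> SX_SHIFT) & SX_MASK
    px = (vaddr >> PX_SHIFT) & PX_MASK
    bx = vaddr & BX_MASK
    if sx >= stl:
        raise LookupError("segment index beyond segment-table length")
    stats["storage_accesses"] += 1        # access 1: segment-table entry
    page_table = segment_table.get(sx)
    if page_table is None:
        raise LookupError("segment not valid")
    stats["storage_accesses"] += 1        # access 2: page-table entry
    frame = page_table.get(px)
    if frame is None:
        raise LookupError("page not valid")
    return (frame << PX_SHIFT) | bx
```

In the prior organization, both accesses must finish and the translation must reach the TLB before the L1 directory comparison for the current request can be retried.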
It is noted that known commercially used L1 cache directories do not contain virtual addresses. Their cache addresses are real/absolute addresses so they can be compared with TLB outputted real/absolute addresses. Virtual address values cannot be compared with real/absolute address values, since a virtual address may be translated into any real page address available in main storage.
Accordingly, the conventional L1 cache directory requires two serially occurring compare operations before a corresponding L1 directory address can be found to exist or not exist, i.e. L1 cache hit or miss. If an L1 hit occurs, the data (usually a double-word) is accessed in the L1 cache and it is sent to the CPU.
U.S. Pat. No. 4,495,575 has a single buffer corresponding to an L1 cache, which is not a private CPU cache because it is accessed by I/O channels as well as a CPU. Its cache directory entries each have "sum data" comprised of a space ID and a block address which are compared to a space ID and a block address in the virtual address in a register 46 received from the CPU or channel. Upon a buffer miss, an address conversion table 61 supplies a real address to MM 22 to obtain the data.
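The virtual-tagged buffer attributed above to U.S. Pat. No. 4,495,575 can be contrasted with the real-tagged directory in a short sketch: the directory key is the (space ID, block address) pair taken from the virtual address, and the address conversion table is consulted only on a buffer miss. This is an illustration under assumptions, not that patent's circuit: the buffer is modeled as fully associative with no replacement, and the class name and table formats are hypothetical.

```python
from typing import Dict, Tuple


class VirtualTaggedBuffer:
    """Buffer tagged by (space ID, block address); hypothetical model."""

    def __init__(self, block_bytes: int,
                 conversion_table: Dict[Tuple[int, int], int],
                 main_memory: Dict[int, bytes]):
        self.block_bytes = block_bytes
        self.entries: Dict[Tuple[int, int], bytes] = {}  # (space ID, block) -> data
        self.conversion_table = conversion_table  # (space ID, block) -> real block
        self.main_memory = main_memory            # real block -> data

    def read(self, space_id: int, vaddr: int) -> Tuple[bytes, bool]:
        """Return (data, hit). No translation is needed on a hit."""
        key = (space_id, vaddr // self.block_bytes)
        if key in self.entries:
            return self.entries[key], True
        real_block = self.conversion_table[key]   # translate only on a miss
        data = self.main_memory[real_block]
        self.entries[key] = data
        return data, False
```

A hit here needs no translation at all, which is the property the real-tagged directories described earlier lack.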
In all prior cache systems, an L1 cache miss requires the requested data to be accessed from the next higher level of the storage hierarchy, which in large systems has commonly been main storage.
If an L2 cache exists in the system, L2 is accessed instead of main storage and, if it contains the requested data, provides that data to both L1 and the CPU. If the L2 cache does not contain the requested data, main storage is accessed for it, and the time needed to determine the L2 cache miss is added to the overall access time for the requested data. A real/absolute address is conventionally used to access the L2 cache directory, which requires the output of the TLB when the CPU requests a virtual address.
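The L1, L2, and main storage sequence above, including the way an L2 miss adds its lookup time to the main storage access, can be sketched as a simple latency tally. The sketch assumes the address passed in is already the real/absolute TLB output, uses one line size for both levels, and models each directory as a set of line addresses; the function name, line size, and cycle counts are hypothetical.

```python
from typing import Set, Tuple

LINE_BYTES = 128  # assumption: one line size for both levels, for simplicity


def hierarchy_access(real_addr: int, l1: Set[int], l2: Set[int],
                     t_l1: int, t_l2: int, t_mem: int) -> Tuple[str, int]:
    """Return (level that supplied the data, cycles spent).

    real_addr is assumed to be the real/absolute address from the TLB;
    l1 and l2 hold real line addresses.
    """
    line = real_addr // LINE_BYTES
    cycles = t_l1
    if line in l1:
        return "L1", cycles
    cycles += t_l2        # L2 directory searched with the real/absolute address
    if line in l2:
        l1.add(line)      # data goes to both L1 and the CPU
        return "L2", cycles
    cycles += t_mem       # the L2 miss time is already part of the total
    l2.add(line)
    l1.add(line)
    return "main storage", cycles
```

For example, with hypothetical costs of 2, 10, and 50 cycles, an L2 hit costs 12 cycles and a request that misses both levels costs 62.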
In all prior caches, a TLB miss may occur independently of an L1 cache directory miss. Fortunately, most CPU requests (over 90%) hit in both the TLB and the cache, which is why TLBs and caches exist.
A basic requirement of L2 caches is that they must be large to be effective, e.g. several times larger than the L1 cache. Hence L2 is likely to contain data from many more pages of main storage than L1 does. However, a fundamental problem may exist in that the TLB is usually not large enough to contain all the page translations representing the data in L2. As a result, even though a requested line of data may exist in the L2 cache, its TLB entry may have been replaced before the current request is made; a TLB miss then results, and in such prior systems the related DAT operation must be completed for the TLB before the L2 cache can be accessed to obtain data that is already there.
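The mismatch between TLB coverage and L2 contents can be made concrete with a little arithmetic. The sizes in the example below (a 256-entry TLB, 4 KB pages, a 4 MB L2 with 128-byte lines) are hypothetical and not from the text; the point is that even in the best case, where L2 lines pack whole pages, L2 spans more pages than the TLB can map.

```python
def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Bytes of storage the TLB can map at once."""
    return entries * page_bytes


def min_pages_spanned(l2_bytes: int, page_bytes: int) -> int:
    """Fewest distinct pages a full L2 can hold data from (lines pack whole pages)."""
    return -(-l2_bytes // page_bytes)  # ceiling division


def max_pages_spanned(l2_bytes: int, line_bytes: int) -> int:
    """Most distinct pages a full L2 can hold data from (every line from a different page)."""
    return l2_bytes // line_bytes


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    print(tlb_reach_bytes(256, 4096))                # 1 MB of reach
    print(min_pages_spanned(4 * 1024 * 1024, 4096))  # at least 1024 pages in L2
    print(max_pages_spanned(4 * 1024 * 1024, 128))   # up to 32768 pages in L2
```

With these sizes the TLB maps 256 pages, while a full L2 holds data from at least 1024 and up to 32768 distinct pages, so an L2-resident line frequently has no TLB entry.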
In prior U.S. Pat. No. 4,464,712, the page-translating TLB entries correspond to page-size lines in the L2 cache, which has an L2 cache directory separate from the TLB (i.e. DLAT). Absolute addresses outputted by the TLB upon each TLB entry replacement locate, and control the settings of, replacement-candidate flag bits R in the L2 entries, which govern the LRU replacement selection of line entries in the L2 cache directory. This requires a TLB/L2 relationship in which the L2 line size equals the TLB-controlled page size (e.g. 4096 bytes).
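One plausible reading of the R-flag scheme above can be sketched as follows, with the caveat that the exact flag semantics are an assumption here, not taken from that patent: when a TLB entry is replaced, the L2 line for the page losing its translation is flagged as a replacement candidate (R set), the line for the newly translated page is unflagged, and victim selection prefers the least recently used flagged line before falling back to plain LRU. The L2 line size equals the page size, so one absolute page address identifies one L2 line. Class and method names are hypothetical.

```python
from collections import OrderedDict


class PageL2:
    """L2 whose line size equals the page size; entries kept in LRU order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # absolute page address -> R flag, LRU first

    def touch(self, page: int) -> None:
        """Reference a page's line, allocating it (and evicting) if needed."""
        if page not in self.lines and len(self.lines) >= self.capacity:
            self.evict()
        self.lines[page] = self.lines.get(page, False)
        self.lines.move_to_end(page)  # now the most recently used line

    def on_tlb_replacement(self, old_page: int, new_page: int) -> None:
        # Assumption: the page losing its TLB entry becomes a replacement candidate.
        if old_page in self.lines:
            self.lines[old_page] = True
        if new_page in self.lines:
            self.lines[new_page] = False

    def evict(self) -> int:
        # Prefer the least recently used R-flagged line; otherwise plain LRU.
        victim = next((p for p, r in self.lines.items() if r), None)
        if victim is None:
            victim = next(iter(self.lines))
        del self.lines[victim]
        return victim
```

Under this reading, a recently used line can still be chosen ahead of older lines once its page has lost its TLB entry, which is how the TLB replacement outputs steer L2 replacement.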