In any data processing system, the processor (CPU) is known to be much faster than its memory. Therefore, to allow the CPU to access data as quickly and smoothly as possible, its storage is usually organized as a hierarchy of heterogeneous devices: multiple levels of caches, main memory, drums, random access buffered DASDs (direct access storage devices) and regular DASDs. Logically, any memory access from the CPU has to search down the hierarchy until the needed data is found at one level; the data must then be loaded into all upper levels. This feeding of data to the CPU on a demand basis is the simplest and most basic way of implementing a memory hierarchy.
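The demand-basis search described above can be illustrated with a minimal sketch. It assumes a simplified model that is not part of any cited system: each level is a plain dictionary with no capacity limit, and the names `demand_fetch`, `levels` and `backing_store` are hypothetical.

```python
def demand_fetch(levels, backing_store, address):
    """Search the hierarchy top-down, then copy the data into all upper levels.

    levels: list of dicts, fastest level first (hypothetical, unbounded).
    backing_store: dict standing in for the lowest level, assumed to hold
                   every address.
    Returns (data, depth), where depth is the index of the level in which
    the data was found (len(levels) means the backing store).
    """
    for depth, level in enumerate(levels):
        if address in level:
            data = level[address]
            break
    else:
        # Not found in any upper level: fall through to the lowest level.
        depth = len(levels)
        data = backing_store[address]

    # Load the data into every level above the one where it was found.
    for upper in levels[:depth]:
        upper[address] = data
    return data, depth


levels = [{}, {}]              # e.g. a cache and a main memory
backing_store = {0x10: "a"}    # e.g. a DASD
first = demand_fetch(levels, backing_store, 0x10)   # found in the backing store
second = demand_fetch(levels, backing_store, 0x10)  # now found in the top level
```

In this sketch the first access walks the whole hierarchy and the second is satisfied by the top level, which is the behavior a demand-only scheme relies on.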
U.S. Pat. No. 3,670,309 to G. M. Amdahl et al sets forth a Storage Control System in a data processing system with two levels of storage, a cache and a main memory. This allows the CPU in the system to simultaneously interrogate the directories of both storages.
U.S. Pat. No. 4,290,103 to A. Hattori sets forth a management technique of caches in a multiprocessor system. Each CPU in the system has a local cache which shares a second level (L2) cache and main memory. Flags or tags are used in the L2 cache to indicate the cache lines in L2 blocks. The purpose is to avoid the unnecessary invalidation of cache lines when an L2 block containing them is to be replaced from L2.
U.S. Pat. No. 3,670,307 to Arnold et al sets forth a system using two-level storage. A tag storage for the high-speed storage and a directory storage for the main storage are used. Desired data is retrieved from main storage and placed in high-speed storage. The tag indexing is updated. The tags contain a bit indicating that the corresponding address in high-speed storage has been fetched.
U.S. Pat. No. 4,344,130 to Fung et al discloses a block move instruction to execute a data transfer function in a microprocessor.
U.S. Pat. No. 3,839,704 to Spencer discloses a computing system which utilizes a limited number of blocks of data in a backing store, which is part of a smaller buffer store. A dummy request is produced to transfer the block to the buffer store.
Most of the patents described above deal with concurrent access to multiple levels of storage from the CPU(s) in the system, or with managing the consistency of data movement between different levels of storage. Few are concerned with pre-loading data between levels of storage in a memory hierarchy before the CPU(s) actually need the data.
U.S. Pat. No. 3,292,153 to R. S. Barton et al sets forth a memory system which includes a low-speed memory and a high-speed cache. The memory system has an advance scan or "program lookahead" feature that prefetches the next few instructions from the memory to the cache while the CPU is busy executing an instruction and the memory system is not busy.
U.S. Pat. No. 3,898,624 to R. J. Tobias sets forth an algorithm in a data processing system to prefetch the next sequential line from the main memory to a high-speed cache and to replace an existing line in the cache during prefetching.
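As a rough illustration of next-sequential-line prefetching in general, the following sketch fetches line n on demand and then also installs line n + 1, replacing an existing line when the cache is full. It is not the algorithm of the cited patent: the class name `NextLineCache`, the least-recently-used replacement policy and the capacity are all assumptions made for the example.

```python
from collections import OrderedDict


class NextLineCache:
    """Toy cache that prefetches the next sequential line on every access."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed to be at least 2
        self.lines = OrderedDict()    # line number -> True, least recent first
        self.misses = 0

    def _install(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)      # refresh recency
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # replace the least recent line (assumed policy)
        self.lines[line] = True

    def access(self, line):
        """Return True on a hit, False on a miss."""
        hit = line in self.lines
        if not hit:
            self.misses += 1
        self._install(line)        # demand fetch (or recency refresh)
        self._install(line + 1)    # prefetch the next sequential line
        return hit


cache = NextLineCache(capacity=4)
results = [cache.access(n) for n in range(5)]   # a purely sequential scan
```

For a purely sequential scan such as this one, only the first access misses; every later line has already been brought in by the preceding access.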
According to the present invention, a prefetching mechanism is set forth for a memory hierarchy which includes at least two levels of storage. Let L1 and L2 be any two consecutive levels of storage, with L1 being a high-speed, low-capacity memory and L2 being a low-speed, high-capacity memory. The units of L2 and L1 are blocks and sub-blocks respectively, with each block containing several sub-blocks at consecutive addresses. The mechanism is based on two facts: the residence time of a block in L2 is in general much longer than that of any of its sub-blocks in L1, and the CPU tends to reference groups of data in close proximity to one another, a property generally known as high spatial locality. The residence time of an L2 block should therefore be long enough to determine how its sub-blocks are repeatedly fetched to L1 and then replaced. The present invention teaches a very simple but effective way to record the history of sub-block usage in the above sense, such that sub-blocks can be prefetched to L1 before they are actually needed by the CPU.
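The general idea of recording sub-block usage per L2 block can be sketched as follows. This is only one possible illustration under stated assumptions, not the recording scheme taught by the invention: the class `HistoryPrefetcher`, the constant `SUB_BLOCKS_PER_BLOCK`, the first-in first-out replacement in L1, and the choice to remember every sub-block that was fetched to L1 and later replaced are all hypothetical. The sketch also assumes that every L2 block stays resident for the whole run and that the L1 capacity is at least one block's worth of sub-blocks.

```python
SUB_BLOCKS_PER_BLOCK = 4  # assumed number of sub-blocks in each L2 block


class HistoryPrefetcher:
    """Toy L1 that prefetches sub-blocks recorded in their L2 block's history."""

    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        self.l1 = []        # resident sub-block addresses, oldest first (FIFO, assumed)
        self.history = {}   # L2 block number -> sub-blocks once fetched to L1, then replaced

    def _load(self, sub):
        if sub in self.l1:
            return
        if len(self.l1) >= self.l1_capacity:
            victim = self.l1.pop(0)
            # Record in the owning L2 block that this sub-block was used.
            block = victim // SUB_BLOCKS_PER_BLOCK
            self.history.setdefault(block, set()).add(victim)
        self.l1.append(sub)

    def access(self, sub):
        """Return True on an L1 hit, False on a miss."""
        if sub in self.l1:
            return True
        block = sub // SUB_BLOCKS_PER_BLOCK
        # Sub-blocks of the same L2 block that the history says were used before.
        to_prefetch = sorted(self.history.get(block, set()) - {sub})
        self._load(sub)               # demand fetch
        for other in to_prefetch:     # history-driven prefetch
            self._load(other)
        return False


p = HistoryPrefetcher(l1_capacity=4)
warmup = [p.access(s) for s in (0, 1, 8, 9, 10, 11)]  # 0 and 1 get replaced
miss_on_0 = p.access(0)   # demand miss, which also prefetches sub-block 1
hit_on_1 = p.access(1)    # already in L1 thanks to the recorded history
```

In this run, sub-blocks 0 and 1 of the same L2 block are used, replaced, and recorded; when sub-block 0 is referenced again, sub-block 1 is brought back with it, so the following reference to sub-block 1 hits in L1.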