The present application claims the benefit of the earlier filing date of PCT application PCT/DE99/03730 filed Nov. 24, 1999, which claims priority from German application 19854 505.3 filed Nov. 25, 1998.
1. Field of the Invention
The present invention relates to a cache memory device having an "intelligent prefetching" functionality.
The invention relates to a cache memory device according to the preamble of claim 1.
2. Discussion of the Background
Modern standard processors, such as those manufactured by Intel®, which may have one or more internal and/or external cache memory devices, usually permit only the design of processor systems with up to four individual processors. These individual processors are connected via a common system bus to one or more primary storage components, which function as main memory. The individual processors are also connected to an input/output system. Such a system represents a relatively small processor system. Large processor systems, however, which are needed for large servers, for example, must contain many more than four individual processors. The large number of processors ensures that such a system will be very powerful.
To construct large servers having a large processor system, such as a processor system containing 16, 32, 64, or 128 processors, additional cache memory devices can be provided, each having connections for 4 processors. If a plurality of such additional cache memory devices, each with 4 connected processors, are coherently coupled with one another, an extremely powerful processor system is achieved.
If the individual processors already have internal and/or external cache memory devices, for example, an on-chip level 1 cache memory device and an off-chip level 2 cache memory device, the additional cache memory device would in this case be a so-called level 3 cache memory device, with which the large processor system would be achieved.
European Patent 0762288 A2 discloses a cache memory device for one or more processors. It concerns processor systems provided with one or more cache memory devices and one or more primary storage components, in which information bits needed for the execution of processes are made available from the one or more primary storage components by the one or more cache memory devices. The information bits are made available from the one or more primary storage components in response to requests from the one or more processors, at least in part as the results of look-ahead read cycles (prefetching) associated with limited look-ahead regions. The cache memory device (ECS) has a functionality whereby, during read cycles performed in response to the one or more processors, each of which has a limited look-ahead region, it controls look-ahead read cycles that extend beyond those limited look-ahead regions. The information bits additionally determined in this way are made available for at least a limited time alongside those information bits obtained from read cycles in response to the one or more processors.
The object of the present invention, based on processor systems of the type discussed above, is to provide technical improvements by which the power of such processor systems is further increased.
This object is achieved by providing a cache memory device having the features described below.
Accordingly, by comparison with conventional cache memory devices, the inventive cache memory device additionally has a functionality that can also be described concisely as "intelligent prefetching". If such a cache memory device is used in large processor systems, for example as a level 3 cache memory device, the power of the entire processor system can be decisively increased.
The prefetching method is indeed known for processors, but not for cache memory devices. According to the prefetching method, the processors, for the purpose of executing processes, fetch not only the currently needed information bits, which may be data and/or code information bits, but, by looking ahead, also those information bits that have a greater probability of being needed next in an execution process. Code information bits are information bits that belong to the program being executed. Data information bits are information bits on which the program being executed performs its work.
Prefetch accesses are speculative accesses made on a hunch. They are speculative because the processors will not necessarily access, as the next data or code, the information bits that were read by looking ahead. The accesses depend on the program flow that controls the process currently being executed. For example, jump instructions in the process to be executed are points of uncertainty, which may cause information bits entirely different from those read by looking ahead to become suddenly important. Nevertheless, whenever the information bits read by looking ahead are used, prefetching results in a performance increase of the overall system.
Ultimately, prefetch accesses are accesses to so-called information blocks in main memory. All information bits stored in main memory are stored in the form of such information blocks. Logically associated information blocks are stored sequentially as far as possible. It can therefore be assumed, with a relatively good success rate, that after a needed information block, the information block stored as its next neighbor in main memory will be the next to be needed. If at least this information block is already available in advance in a cache memory device, it can be used immediately if needed, thus obviating a time-consuming read process for this information block. In this way latency times are shortened.
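The next-neighbor assumption described above can be illustrated with a minimal sketch. The class name `NextBlockCache`, its `read` method, and the dictionary standing in for main memory are hypothetical and chosen only for illustration; they are not taken from the application.

```python
class NextBlockCache:
    """Toy cache that fetches block n+1 whenever block n is read (hypothetical)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # maps block number -> block contents
        self.blocks = {}                # blocks currently held in the cache
        self.hits = 0
        self.misses = 0

    def _fill(self, block_no):
        # Copy a block from main memory if it exists and is not yet cached.
        if block_no in self.main_memory and block_no not in self.blocks:
            self.blocks[block_no] = self.main_memory[block_no]

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            self._fill(block_no)
        # Speculatively fetch the next neighbor in main memory.
        self._fill(block_no + 1)
        return self.blocks[block_no]


memory = {n: f"block-{n}" for n in range(8)}
cache = NextBlockCache(memory)
for n in range(4):  # sequential access pattern 0, 1, 2, 3
    cache.read(n)
```

With this sequential access pattern only the first read misses; the remaining three reads find their block already present, which is the latency saving the paragraph describes. A jump to a distant block would miss again, as expected for a speculative scheme.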
The look-ahead read cycles in response to a request from a processor are confined to a limited look-ahead region. The inventive cache memory device offers the possibility of actively looking ahead beyond this limited region. This means that information bits which, up to a predetermined distance, are stored as next neighbors of the information bits fetched in response to the processor system are made available as additional information bits. They comprise at least those information bits that are stored in working memory as immediate next neighbors of the information bits made available in response to a processor request, although they can also comprise information bits that are stored further away. By application of appropriate strategies, those information bits can be stored not merely at sequentially ordered sites, but also at distributed sites. In this way even more information bits read by looking ahead are available to the processors, thus increasing the hit rate of read accesses of the processors. At the same time, the load on the processors is not further increased, since the cache memory device itself controls the execution of the additional read cycles.
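As a hedged illustration of looking ahead beyond the limited region, the following sketch computes which additional blocks a cache might fetch on its own initiative. The function name `extended_lookahead` and the parameters `distance` and `stride` are assumptions introduced here; in particular, a fixed stride is only one conceivable strategy for reaching distributed sites and is not prescribed by the text.

```python
def extended_lookahead(last_block_read, distance, stride=1):
    """Return the block numbers fetched beyond the processor's own look-ahead.

    last_block_read: last block fetched in response to the processor,
                     including its limited look-ahead region
    distance:        how many further blocks the cache fetches (assumed parameter)
    stride:          spacing between fetched blocks; 1 means immediate neighbors
    """
    return [last_block_read + i * stride for i in range(1, distance + 1)]


sequential = extended_lookahead(11, distance=3)
distributed = extended_lookahead(11, distance=3, stride=4)
```

Here `sequential` covers the three immediate next neighbors of block 11, while `distributed` covers three blocks spaced four apart. Because the cache computes and issues these reads itself, no additional work falls on the processors.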
According to another aspect of the present invention, the inventive cache memory device, before making new information bits available, checks whether the information bits are already available in the cache memory device.
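The availability check can be sketched as a simple filter applied before any read cycle is issued. The function name `blocks_to_fetch` and the use of a set of block numbers to represent the cache contents are illustrative assumptions only.

```python
def blocks_to_fetch(candidates, cached_blocks):
    """Keep only candidates not already held, preserving order and dropping duplicates."""
    seen = set(cached_blocks)
    needed = []
    for block_no in candidates:
        if block_no not in seen:
            needed.append(block_no)
            seen.add(block_no)
    return needed


needed = blocks_to_fetch([12, 13, 14, 13], cached_blocks={13})
```

Only blocks 12 and 14 remain in `needed`, so no read cycle to main memory is spent on block 13, which is already present.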
To prevent information bits additionally made available by the cache memory device from overwriting information bits already made available in the cache memory device and possibly needed with higher probability, the cache memory device is provided with its own special storage component for the information bits additionally made available. In this special storage component, the information bits additionally made available are temporarily stored optionally for a limited or unlimited time, until they are overwritten by new information bits additionally made available, or are read out for use in response to a processor request that has since been received. In the case of a time limitation, they can also be kept in temporary storage until either they are completely transferred into the cache memory device or are deleted once again.
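A minimal sketch of such a separate storage component follows. The names `PrefetchBuffer`, `store`, `take`, and `read`, the fixed capacity, and the oldest-first overwrite policy are assumptions made for illustration; the time limitation mentioned above is not modeled.

```python
from collections import OrderedDict


class PrefetchBuffer:
    """Separate store for speculatively fetched blocks (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block number -> contents, oldest first

    def store(self, block_no, contents):
        # Overwrite the oldest speculative entry when the buffer is full.
        if block_no in self.entries:
            self.entries.move_to_end(block_no)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[block_no] = contents

    def take(self, block_no):
        # Read out (and remove) a block that a processor has now requested.
        return self.entries.pop(block_no, None)


def read(block_no, cache, buffer, main_memory):
    """Serve a request from the cache, then the buffer, then main memory."""
    if block_no in cache:
        return cache[block_no]
    contents = buffer.take(block_no)
    if contents is None:
        contents = main_memory[block_no]  # slow path
    cache[block_no] = contents            # requested data enters the cache proper
    return contents


main_memory = {n: f"block-{n}" for n in range(8)}
cache = {}
buffer = PrefetchBuffer(capacity=2)
for n in (1, 2, 3):          # storing block 3 overwrites block 1
    buffer.store(n, main_memory[n])
result = read(2, cache, buffer, main_memory)
```

Speculative blocks only ever displace other speculative blocks, so the contents of the cache proper are never overwritten by prefetched data. A block moves into the cache only once a processor actually requests it.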
On the whole, it is ensured with the inventive cache memory device that access times to working memory are greatly reduced for accesses to data and/or code information.