One or more aspects of the present invention relate in general to the field of hierarchical cache structures, and in particular to handling a hierarchical cache structure.
A cache memory, or cache, is a high-speed memory positioned between a processor and main storage to hold recently accessed main storage data. Whenever data in storage is accessed, it is first determined whether or not the data is in the cache and, if so, the data is accessed from the cache. If the data is not in the cache, the data is obtained from the main storage and is also stored in the cache, usually replacing other data which had been stored in the cache memory. Usually a cache hierarchy is implemented, where multiple levels of cache exist between the processor and main storage. As one gets farther away from the processor, each cache gets larger, slower and cheaper per byte. The cache closest to the processor is called the first level cache, the next-closest cache is called the second level cache, the one after that is called the third level cache, and so on.
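Purely by way of illustration, the lookup sequence described above may be sketched as follows. The sketch assumes a fully associative cache with least-recently-used replacement; the class name `Cache`, the capacities and the access sequence are hypothetical and are not taken from any particular implementation.

```python
from collections import OrderedDict


class Cache:
    """Tiny fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing      # next level: another Cache, or a dict as main storage
        self.lines = OrderedDict()  # address -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: serve the data from this level and mark the line as recently used.
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: obtain the data from the next level (or from main storage).
        self.misses += 1
        if isinstance(self.backing, Cache):
            data = self.backing.read(address)
        else:
            data = self.backing[address]
        # Store the data here as well, replacing the least recently used line if full.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[address] = data
        return data


main_storage = {address: f"data@{address}" for address in range(16)}
l2 = Cache(capacity=4, backing=main_storage)  # larger and farther from the processor
l1 = Cache(capacity=2, backing=l2)            # smaller and closest to the processor

for address in [0, 1, 0, 2, 0, 3, 1]:
    l1.read(address)

print("L1 hits/misses:", l1.hits, l1.misses)
print("L2 hits/misses:", l2.hits, l2.misses)
```

In this sketch the second level cache only sees the requests that miss in the first level cache, which mirrors the hierarchy described above.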
One processor may have multiple first level caches, such as one first level cache for data and/or operands and one first level cache for instructions. That means that the first level cache is split into a first level data cache and a first level instruction cache. A unified second level cache may be connected to multiple first level caches, where the first level caches are either for the same processor or for multiple processors in a multi-processor system. Additionally, the second level cache is a superset of the first level cache, i.e. all cache-line data of the first level cache is also in the second level cache.
This first structure has the advantage that the first level instruction cache and the first level data cache share the second level cache dynamically, each taking a portion determined by the sequence of its miss requests to the second level cache. However, the first level cache that causes fewer misses will see fewer hits in the second level cache. For example, if the processor works on a long stream of data, the data will displace most of the instruction lines from the second level cache, causing the first level instruction cache to miss and those requests to miss in the second level cache as well.
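The displacement effect of this first structure may be illustrated by the following minimal sketch. It assumes least-recently-used replacement, assumes that the second level cache does not observe first level hits, and assumes that a line evicted from the second level cache is also invalidated in the first level caches to preserve the superset property. The class name `InclusiveHierarchy`, the line labels and the sizes are hypothetical.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Split L1 (instruction "I" / data "D") in front of one unified, inclusive L2."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = {"I": OrderedDict(), "D": OrderedDict()}
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.l2_misses = {"I": 0, "D": 0}

    @staticmethod
    def _touch(cache, line, capacity):
        """Insert or refresh a line in LRU order; return the evicted line, if any."""
        victim = None
        if line in cache:
            cache.move_to_end(line)
        else:
            if len(cache) >= capacity:
                victim, _ = cache.popitem(last=False)
            cache[line] = True
        return victim

    def access(self, kind, line):
        # An L1 hit is invisible to the L2, so the L2 LRU order is not refreshed.
        if line in self.l1[kind]:
            self.l1[kind].move_to_end(line)
            return "L1 hit"
        result = "L2 hit" if line in self.l2 else "L2 miss"
        if result == "L2 miss":
            self.l2_misses[kind] += 1
        victim = self._touch(self.l2, line, self.l2_lines)
        if victim is not None:
            # Assumed inclusion rule: a line leaving the L2 must leave both L1 caches.
            for l1_cache in self.l1.values():
                l1_cache.pop(victim, None)
        self._touch(self.l1[kind], line, self.l1_lines)
        return result


hierarchy = InclusiveHierarchy(l1_lines=4, l2_lines=8)
code = ["i0", "i1", "i2", "i3"]

for line in code:                      # first pass loads the instruction lines
    hierarchy.access("I", line)
warm = [hierarchy.access("I", line) for line in code]

for n in range(100):                   # long stream of data through the unified L2
    hierarchy.access("D", f"d{n}")
after_stream = [hierarchy.access("I", line) for line in code]

print(warm)
print(after_stream)
print(hierarchy.l2_misses)
```

Under these assumptions, the instruction lines hit in the first level instruction cache before the data stream and miss in both levels afterwards, even though the instruction working set itself never changed.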
Further, the second level cache may also be split into a second level data cache and a second level instruction cache, wherein the first level instruction cache is connected to the second level instruction cache and the first level data cache is connected to the second level data cache. A unified third level cache may be connected to multiple second level caches.
This second structure has the advantage that the first level instruction cache and the first level data cache each have a fixed share of second level cache capacity and do not run into the problem of the above described first structure. However, cache-lines that contain both instructions and data modified by the program are repeatedly transferred back and forth ("ping-pong") between the second level instruction cache and the second level data cache.
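The back-and-forth movement in this second structure may be illustrated by the following sketch for a single cache-line. It rests on a simplifying assumption that is not stated above: a store requires the line to reside in the second level data cache, an instruction fetch requires the current line to reside in the second level instruction cache, and the line resides in only one of the two at a time. The function name `count_transfers` is hypothetical.

```python
def count_transfers(accesses):
    """Count how often one cache-line moves between the L2 instruction and L2 data cache.

    accesses: sequence of "fetch" (instruction fetch) or "store" (data write),
    all directed at the same cache-line.
    """
    holder = None   # which second level cache currently holds the line
    transfers = 0
    for access in accesses:
        needed = "L2I" if access == "fetch" else "L2D"
        if holder is not None and holder != needed:
            transfers += 1   # the line has to move to the other second level cache
        holder = needed
    return transfers


# Instructions and modified data interleaved in the same line: one move per access.
interleaved = count_transfers(["fetch", "store"] * 4)

# The same number of accesses, but grouped by type: a single move.
grouped = count_transfers(["fetch"] * 4 + ["store"] * 4)

print(interleaved, grouped)
```

Under this assumed rule the interleaved pattern moves the line on nearly every access, whereas the grouped pattern moves it once.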
Patent Application Publication US 2007/0156969 A1, which is hereby incorporated herein by reference in its entirety, discloses a method of synchronizing an instruction cache and a data cache by coupling them, thereby saving processor time that would otherwise be spent synchronizing the instruction cache and the data cache. The disclosed method performs a direct memory access (DMA) operation in a virtualized environment to obtain a page from a memory and to store the page in a data cache. Because such DMA operations occur without help of a processor, the system needs to maintain coherency between the instruction cache and the data cache. To do this, signals on a bus, such as so-called snoop cycles, are used to invalidate cacheable pages that a direct memory access modifies. Thus, in a virtual machine monitor environment, a guest operating system that issues direct memory access read operations expects the instruction and data caches to be synchronized when the operation is completed, as in a native system operation. However, when performed by emulation, such direct memory access read operations cache data into a data cache but not into an instruction cache.
In the Patent Publication U.S. Pat. No. 8,015,362 B2, which is hereby incorporated herein by reference in its entirety, a method and system for handling cache coherency for self-modifying code is disclosed. The disclosed method for handling cache coherency for self-modifying code comprises allocating a tag in a data cache for a store operation, and sending the tag with an exclusive fetch for the cache-line to coherency logic. An invalidation request is sent within a minimum amount of time to an instruction cache, preferably only if it has fetched the cache-line and has not been invalidated since, which request includes an address to be invalidated, the tag and an indicator specifying the cache-line is for a program store compare operation. The method further includes comparing the request address against stored addresses of pre-fetched instructions, and in response to a match, sending a match indicator and the tag to a load store unit, within a maximum amount of time. The match indicator is timed, relative to exclusive data return, such that the load store unit can discard pre-fetched instructions following execution of the store operation that stores a cache-line subject to an exclusive data return, and for which the match is indicated.
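The message flow summarized above may be loosely sketched as follows. The sketch is an illustrative paraphrase only and is not the implementation of the cited patent: the minimum and maximum timing constraints are not modeled, and all names (`InvalidationRequest`, `InstructionCache`, `store`, `discard_prefetched`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InvalidationRequest:
    address: int   # cache-line address to be invalidated
    tag: int       # tag allocated in the data cache for the store operation
    psc: bool      # indicates the line is for a program store compare operation


class InstructionCache:
    def __init__(self):
        self.held_lines = set()        # lines fetched and not invalidated since
        self.prefetched_lines = set()  # line addresses of pre-fetched instructions

    def fetch(self, address):
        self.held_lines.add(address)
        self.prefetched_lines.add(address)

    def invalidate(self, request):
        """Invalidate the line; return (match indicator, tag) for the load store unit."""
        self.held_lines.discard(request.address)
        return request.address in self.prefetched_lines, request.tag

    def discard_prefetched(self, address):
        # Stands in for the load store unit discarding pre-fetched instructions.
        self.prefetched_lines.discard(address)


def store(address, tag, icache):
    """Exclusive fetch for a store; the coherency logic filters the invalidation."""
    if address not in icache.held_lines:
        return False, tag              # no invalidation request needs to be sent
    match, tag = icache.invalidate(InvalidationRequest(address, tag, psc=True))
    if match:
        icache.discard_prefetched(address)
    return match, tag


icache = InstructionCache()
icache.fetch(0x100)

first = store(0x100, tag=7, icache=icache)   # line was pre-fetched: match
other = store(0x200, tag=8, icache=icache)   # line never fetched: no request sent
again = store(0x100, tag=9, icache=icache)   # already invalidated: no request sent

print(first, other, again)
```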
Both documents deal with ways to synchronize data caches and instruction caches and to keep the two first level caches coherent.