The present invention relates to cache memories in the area of computer technology. It relates in particular to a method and respective system for accessing a cache memory in a computer system, wherein the cache memory is accessed by a plurality of competing store and/or fetch requests via a number of commonly used input address registers, and wherein those store/fetch requests are processed by a processing pipe.
In such systems a cache segment is a part of a cache memory, which is separately addressable. This allows as many cache accesses to be active at a given cycle as there are segments available.
Most computer applications benefit from running on a fast computer system. The processing speed of a computer system depends on several technical prerequisites, of which clock frequency is an important one. Another crucial prerequisite is the structure of the storage hierarchy, which includes fast cache memories, and the way data stored in the cache memories is accessed.
In current high performance computer systems, cache accesses are executed under the control of a processing pipe. The pipe is a sequence of processing steps, one per clock cycle, strung together one after another. In each step, called a cycle in the following, certain operations are performed, e.g., writing data into the cache memory (store) or reading data from the cache memory (fetch).
An interleave organization is a particularly efficient way of cache segmentation. It partitions the cache memory into columns orthogonal to the cache's line structure. Thus, each cache line touches all interleaves. Each interleave is separately addressable. A request passing through a processing pipe starts at one interleave and proceeds to the next interleave in the next cycle until all data has been processed. For example, a line fetch request starts with the interleave holding the line segment that is needed first, and proceeds to the next interleave until the complete line is read. If the cache line has 128 bytes and data is transferred to and from the cache in segments of 16 bytes, then each interleave stores 16 bytes of a cache line, beginning with interleave 0 and line segment 0. With 8 interleaves, each cache line covers all interleaves once; with only 4 interleaves, it covers them twice.
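The segment-to-interleave mapping of this example can be illustrated with a minimal sketch. The function names are hypothetical, and both the round-robin (modulo) mapping and the wrap-around to segment 0 after the last segment are assumptions made for illustration; the text only states that a fetch proceeds to the next interleave each cycle until the line is complete.

```python
from typing import List

LINE_BYTES = 128     # cache line size from the example in the text
SEGMENT_BYTES = 16   # transfer width from the example in the text


def interleave_of(segment: int, num_interleaves: int) -> int:
    """Interleave holding a line segment (assumed round-robin mapping)."""
    return segment % num_interleaves


def line_fetch_order(first_segment: int, num_interleaves: int) -> List[int]:
    """Interleaves visited, one per cycle, by a line fetch that starts at
    first_segment and (by assumption) wraps around to segment 0."""
    segments = LINE_BYTES // SEGMENT_BYTES  # 8 segments per line
    return [
        interleave_of((first_segment + k) % segments, num_interleaves)
        for k in range(segments)
    ]
```

With 8 interleaves, line_fetch_order(5, 8) visits each interleave exactly once, starting at interleave 5. With 4 interleaves, line_fetch_order(0, 4) visits each interleave twice.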
The predominant characteristic of a pipelined cache access is that each cache access takes place in a fixed pipe cycle and that each request entering that pipe cycle necessarily performs its cache access. This cache access scheme requires that cache usage is recorded in a table where each cell in the table represents a certain interleave at a certain cycle. In the following, this table is referred to as “interleave model”. An example is shown in FIG. 1.
Each cache request passing through the processing pipe must have checked the interleave model before it is allowed to proceed to the actual cache access cycle. In case of an interleave conflict, the request is either rejected from the pipe or the pipe must stall until the request passes the interleave check. When the request passes the interleave check, it must reserve the interleaves for the cycles it will use, by putting corresponding reservation information into the interleave model (see FIG. 1). The update of the interleave model must be done early in the pipe cycle succeeding the checking cycle such that subsequent requests “see” the new state of the interleave model when they check the interleave model.
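The check-then-reserve discipline described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the table is assumed to be indexed by "cycles from now" and shifted by one row per clock cycle, and a request is assumed to use one interleave per cycle in ascending order. The actual layout of the interleave model is the one shown in FIG. 1.

```python
from typing import List, Tuple


class InterleaveModel:
    """Table of cells; busy[c][i] is True if interleave i is reserved
    c cycles from now (assumed layout, for illustration only)."""

    def __init__(self, num_interleaves: int, horizon: int) -> None:
        self.n = num_interleaves
        self.busy = [[False] * num_interleaves for _ in range(horizon)]

    def _cells(self, start: int, first_cycle: int, length: int) -> List[Tuple[int, int]]:
        # One interleave per cycle, wrapping around the interleaves.
        return [(first_cycle + k, (start + k) % self.n) for k in range(length)]

    def check(self, start: int, first_cycle: int, length: int) -> bool:
        """True if every cell the request needs is still free."""
        return all(not self.busy[c][i] for c, i in self._cells(start, first_cycle, length))

    def reserve(self, start: int, first_cycle: int, length: int) -> None:
        """Record the reservation so that later requests see it."""
        for c, i in self._cells(start, first_cycle, length):
            self.busy[c][i] = True

    def tick(self) -> None:
        """Advance one clock cycle: drop the current row, add an empty one."""
        self.busy.pop(0)
        self.busy.append([False] * self.n)
```

A request calls check() first and, only if the check passes, reserve(). Calling reserve() before the next request checks corresponds to the requirement that the model be updated early in the cycle succeeding the checking cycle.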
Access to the pipe facility is serialized via an arbitration scheme, which selects one request at a time. Arbitration happens in the first pipe cycle C0, which is therefore called the priority cycle. For performance reasons, fetch requests typically have higher priority than store requests. Because stores to the cache first have to read data from the store buffer before actually writing them to the cache, stores typically occur in a later pipe cycle than fetches. Thus the requests of lower priority, typically stores, access the cache in a later pipe cycle than the higher priority requests, typically fetches.
Current implementations of the environment described above check the interleave model within the pipe. The L2-cache pipeline of the IBM zSeries 900 for example does the checking at the end of the first pipe cycle, the priority cycle.
As a consequence, each time a request has highest priority and its interleaves turn out to be unavailable at the end of the priority cycle, access to the second pipe cycle C1 is denied. This results in pipe bubbles and hence in non-optimized pipe usage. In particular, the low priority store requests are liable to be delayed unnecessarily. This may cause the store buffers to fill up, which in turn has a severe impact on system performance.
Therefore, it would be desirable to check the interleave model up front before a request competes for pipe access. This was the approach in the first prior art IBM 9021 systems. There, requests first checked the interleave model, and only those requests which found all the interleaves they needed available were allowed to further compete for pipe access. This arbitration scheme, however, created another severe performance problem:
Since store requests, as explained before, access the cache in a later pipe cycle than fetches, a sequence of stores to one and the same cache interleave could block higher priority fetch requests for quite a while. Assume, for example, that a tight program loop, such as a repeated counter update, generates a sequence of stores to one and the same interleave, say i, one per cycle. Let d be the difference between the cache access cycles of a store and a fetch: d=b−a, where a fetch accesses the cache in pipe cycle a, and a store in pipe cycle b. In the example illustrated in FIGS. 2 through 4, cycle a is C2, cycle b is C4, and d=2. The detailed description thereof is given later below.
When a fetch request targeting interleave i−k as its starting interleave runs into such a sequence, it will find interleave i occupied by the store that entered the pipe d−k cycles ahead, where 0<k<d. Hence, a series of fetches, though they have higher priority, may have to wait until the store sequence completes, which has an immediate impact on system performance.
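The blocking effect can be made concrete with a small sketch. It assumes a simplified timing model that the text does not spell out: a fetch entering the pipe at cycle t and starting k interleaves before interleave i touches interleave i at cycle t+a+k, a store entering at cycle s writes interleave i at cycle s+b with b=a+d, and each access occupies the interleave for exactly one cycle. The function names are hypothetical.

```python
from typing import Iterable


def fetch_meets_store(entry: int, k: int, a: int, d: int,
                      store_entries: Iterable[int]) -> bool:
    """True if a fetch entering the pipe at cycle `entry`, starting k
    interleaves before interleave i, reaches i in the same cycle as a store."""
    b = a + d                   # store access cycle, from d = b - a
    fetch_at_i = entry + a + k  # absolute cycle in which the fetch touches i
    return any(s + b == fetch_at_i for s in store_entries)


def first_free_entry(wanted: int, k: int, a: int, d: int,
                     store_entries: Iterable[int]) -> int:
    """Earliest entry cycle >= wanted at which the fetch passes the check."""
    stores = set(store_entries)
    entry = wanted
    while fetch_meets_store(entry, k, a, d, stores):
        entry += 1
    return entry
```

Under these assumptions, with stores entering in cycles 0 through 9, d=2 and k=1, a fetch that wants to enter in cycle 3 collides with the store that entered one cycle ahead, and first_free_entry(3, 1, 2, 2, range(10)) returns 11: the fetch waits until the store sequence has ended. The result depends only on d and k, not on the absolute value of a.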
It is thus an object of the present invention to increase the performance of accessing a cache memory in a computer system, wherein the cache memory is split up in at least two segments and wherein the cache memory is accessed by a plurality of (competing) input address registers and cache memory requests that are processed by a processing pipe.
This objective of the invention is achieved by the features stated in the enclosed independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims. Reference should now be made to the appended claims.
The present invention discloses a method and respective system for accessing a cache memory in a computer system, wherein the cache memory is split up in at least two segments, wherein the cache memory is accessed by a plurality of competing cache memory requests (fetch, store, etc.) via a number of commonly used input address registers, wherein a cache segment model is utilized for reflecting the cache use by said competing requests, wherein cache memory requests are processed by a processing pipe and wherein each cache request, before entering the processing pipe, is checked as to whether the segments of the cache memory are available at the cycle in which it needs them. Said method and system further comprise the following steps:
a) marking a segment model cell as busy with storing, if a store-request targeting a cache segment corresponding to said model cell has received pipe access;
b) blocking off from pipe access a fetch-request targeting a segment model cell which is marked busy with a store operation; and
c) blocking off any store-request from pipe access, if at least one fetch-request, which was blocked off from pipe access according to step b), is waiting for pipe access.
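Steps a) through c) can be sketched as a small arbiter. This is a simplified illustration, not the claimed implementation: all names are hypothetical, the segment model is reduced to one store-busy flag per interleave instead of one cell per interleave and cycle, fetch-to-fetch conflicts are ignored, and a fetch rejected in a given cycle is assumed to block stores from that same cycle on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    kind: str                     # "fetch" or "store"
    interleave: int               # target segment (interleave) index
    store_rejected: bool = False  # set when a fetch is blocked by step b)


class PrePipeArbiter:
    def __init__(self, num_interleaves: int) -> None:
        self.store_busy = [False] * num_interleaves

    def select(self, waiting: List[Request]) -> Optional[Request]:
        """Pick at most one waiting request for pipe access this cycle."""
        # Fetches have priority and are examined first.
        for req in waiting:
            if req.kind != "fetch":
                continue
            if self.store_busy[req.interleave]:
                req.store_rejected = True   # step b): blocked by a store
            else:
                req.store_rejected = False
                return req
        # Step c): no store may enter while a store-rejected fetch waits.
        if any(r.kind == "fetch" and r.store_rejected for r in waiting):
            return None
        for req in waiting:
            if req.kind == "store" and not self.store_busy[req.interleave]:
                self.store_busy[req.interleave] = True  # step a)
                return req
        return None

    def store_done(self, interleave: int) -> None:
        """Clear the store-busy mark once the store has left the cache."""
        self.store_busy[interleave] = False
```

In use, a fetch that finds its interleave busy with a store is flagged, the store sequence is interrupted on the next arbitration, and the fetch gets pipe access as soon as the pending store completes.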
On the one hand, this method enables stores that compete with fetches to get cache access; on the other hand, it ensures fast cache access for a fetch that was once rejected. The benefit of this pre-pipe interleave model checking in comparison to the prior art in-pipe checking rises to more than 10 percent when the store rate is large enough.
Advantageously, the present invention uses an interleave organization of the cache segmentation. Ideally, that is, when request addresses are suitably arranged, a cache structured in interleaves provides for a cache access in every new cycle of the processing pipe. Thus this way of segmentation matches the concept of a processing pipe well and therefore shows excellent performance.
A further advantageous feature of the invention comprises the step of marking a segment model cell as busy with storing, by setting a store-busy-bit (SB). It is used to inform fetch requests whether a cache conflict was caused by a store or by another fetch request.
Another advantageous feature according to the invention comprises the step of marking a fetch-request, which is rejected according to step b) of the method as described above, by setting a store-reject-bit (SR). It is used to check whether any of the fetch requests waiting for cache access had been rejected before because of a conflicting store already being in progress.
It is also a preferred feature of the invention to set a store-blocking-bit (BS) for blocking off a store-request from pipe access as soon as at least one fetch-request, which was blocked off according to step b) of the method as described above, is waiting for pipe access. It is used to interrupt store sequences as soon as a higher priority fetch request has been blocked off from pipe access by a preceding store.