The present invention relates to microprocessor design and more particularly to microprocessors with memory attached accelerators.
A so-called memory attached accelerator typically comprises a co-processor that is added to a processor core of a microprocessor in order to perform special tasks.
Prior art machines have a micro-architecture in which the co-processor is integrated into the processor core and runs at core frequency, which is significantly lower than in the machines currently under development. In prior art machines it is therefore possible to share the processor core's Instruction-cache (I-cache) and Instruction-Translation Lookaside Buffer (I-TLB) with the co-processor for dictionary fetches with only a small impact on throughput and latency.
A current processor core having a co-processor integrated into the processor core is e.g. the IBM eServer z990 microprocessor, known e.g. from Slegel et al.: ‘The IBM eServer z990 microprocessor’; IBM J. Res. & Dev. Vol. 48; No. 3/4; May/Jul. 2004; pp 295-309, or from Rayns et al.: ‘IBM eServer zSeries (z990) Cryptography Implementation’; IBM Redbooks; 2004; ISBN 0738490369.
Since processor cores in current machines run at a significantly higher frequency than those of previous machines, co-processors in current developments are no longer integrated into the processor core but are treated as separate units within the micro-architecture that run slower, e.g. at half the frequency of the processor cores. Thus microprocessors currently under development are assigned a co-processor for data compression and cryptography that is physically located on the processor chip but outside the individual processor cores. Such a co-processor needs to fetch dictionary entries by means of virtual storage references.
Thus a memory attached accelerator is under development having a micro-architecture with at least one co-processor separated from at least one core processor. The co-processor directly uses the instructions of the core processor and directly accesses a main storage by means of virtual addresses of the core processor. Said co-processor comprises a Translation Lookaside Buffer (TLB) in order to use virtual addresses of the core processor to directly access said main storage.
In previous machines, where the co-processor was still integrated into the processor core, the dictionary accesses could be performed via the I-cache and I-TLB. In contrast, in an up-to-date processor core, like e.g. in the IBM eServer z990 microprocessor, this can cause excessive access latencies. Thus the co-processor of such an up-to-date processor core has dedicated storage, e.g. a dedicated cache infrastructure. This also includes the dedicated TLB mentioned above for the virtual-to-absolute address translations, since the co-processor accesses storage by virtual addresses.
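The role of such a dedicated TLB can be illustrated by a minimal sketch. The class name `SimpleTLB`, the page size of 4096 bytes and the dictionary-based storage are assumptions made purely for illustration and are not taken from any actual machine; the sketch only shows how a virtual address is split into a page number and an offset and mapped to an absolute address on a TLB hit.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class SimpleTLB:
    """Hypothetical TLB mapping virtual page numbers to absolute frame numbers."""

    def __init__(self):
        self.entries = {}  # virtual page number -> absolute frame number

    def insert(self, virtual_page, absolute_frame):
        # Record a completed translation so later accesses can reuse it.
        self.entries[virtual_page] = absolute_frame

    def translate(self, virtual_address):
        # Split the virtual address into page number and offset within the page.
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self.entries.get(page)
        if frame is None:
            return None  # TLB miss: a full address translation would be needed
        return frame * PAGE_SIZE + offset
```

In this sketch a miss is merely signalled by `None`; how a miss is resolved is outside the scope of the illustration.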
This gives rise to the following problem. Such a TLB preferably consists of four compartments or zones that can be assigned in a flexible manner, so that more than one compartment or zone at a time, e.g. two, can or must be replaced at the same time. This requires a corresponding adaptation of the least recently used (LRU) algorithm, according to which always exactly one entry, namely the oldest cache entry, is replaced by exactly one entry, namely the youngest.
In other words, common LRU algorithms are based on the rule of replacing the oldest entry first, which is usually the least recently or least frequently used entry. During regular updates they replace exclusively this one entry.
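The common LRU rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the replacement logic of any particular machine: the class name `LruSet`, the four compartments and the age list `order` are hypothetical, and tags are assumed to be non-`None` values.

```python
class LruSet:
    """Hypothetical set of compartments managed by a common LRU rule."""

    def __init__(self, compartments=4):
        # Age order of the compartments: index 0 is the oldest, the last is the youngest.
        self.order = list(range(compartments))
        self.tags = [None] * compartments

    def touch(self, compartment):
        # Mark a compartment as the most recently used one.
        self.order.remove(compartment)
        self.order.append(compartment)

    def lookup(self, tag):
        # On a hit the compartment becomes the youngest; on a miss nothing changes.
        if tag in self.tags:
            compartment = self.tags.index(tag)
            self.touch(compartment)
            return compartment
        return None

    def replace(self, tag):
        # Common LRU: exactly one victim per update, always the oldest compartment.
        victim = self.order[0]
        self.tags[victim] = tag
        self.touch(victim)
        return victim
```

Each call to `replace` selects a single victim and turns it into the youngest entry, which mirrors the one-entry-per-update behaviour of common LRU algorithms.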
When more than one entry is to be replaced at a time, applying the common LRU algorithm is less effective.