To more efficiently access data, many processors move information from a main memory, which can be slow to access, to a memory “cache” which allows for faster access. In addition, modern processors schedule instructions “out of order” and execute multiple instructions per cycle to achieve high performance. Some instructions, however, need to access information stored in the memory cache. For example, about one-third of all micro-instructions executed may be load/store instructions which access the memory cache. In order to achieve high performance by executing multiple instructions in a single cycle, the system should therefore permit more than one concurrent memory cache access in a cycle.
There are several ways to accomplish this goal. A truly “multi-ported” cache, i.e. one that supports multiple simultaneous accesses, fulfills this goal, but this is a complex solution that can be costly to implement in terms of area, power and speed.
Another known solution is a “multi-banked” cache. In this scheme, the memory cache is split into several independently addressed banks, and each bank supports one access per cycle. FIG. 1 illustrates a system having such a multi-bank memory cache. The system includes a first memory cache bank 240 and a second memory cache bank 340. A scheduling unit 100 schedules instructions to one of two pipelines. For example, the scheduling unit 100 may schedule an instruction to a first pipeline such that the instruction is processed by an Address Generation Unit (AGU) 210 and ultimately by a cache access unit 230. The scheduling unit 100 may instead schedule an instruction to a second pipeline such that the instruction is processed by another AGU 310 and ultimately by another cache access unit 330.
This implementation reduces the complexity and cost of the cache while still allowing more than one memory access in a single cycle, but it is less than ideal because concurrent accesses are possible only when they target different banks. As a result, an instruction being processed by the first pipeline that needs to access information in the second memory cache bank 340 may not be able to execute.
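The restriction described above can be illustrated with a minimal Python sketch. The line-interleaved address-to-bank mapping, the 64-byte line size, and the names `bank_of` and `can_access_concurrently` are assumptions made for illustration only; the text does not specify how an address selects a bank.

```python
LINE_BYTES = 64  # assumed cache-line size (hypothetical, not from the text)
NUM_BANKS = 2    # two banks, as in FIG. 1


def bank_of(address: int) -> int:
    """Return the bank index for an address under an assumed
    line-interleaved mapping (consecutive lines alternate banks)."""
    return (address // LINE_BYTES) % NUM_BANKS


def can_access_concurrently(addr_a: int, addr_b: int) -> bool:
    """Each bank supports one access per cycle, so two accesses can
    share a cycle only if they target different banks."""
    return bank_of(addr_a) != bank_of(addr_b)
```

Under these assumptions, accesses to addresses 0 and 64 fall in different banks and may proceed in the same cycle, whereas accesses to addresses 0 and 128 both fall in bank 0 and conflict.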
To solve that problem, each instruction pipeline may use a “cross-bar” to access information in the other memory cache bank. In that case, however, a setup latency 220, 320 may be incurred while the pipeline accesses information in the other memory cache bank, and this delay slows the operation of the pipeline.
If the memory cache bank associated with each load instruction were known, the processor could schedule load instructions so as to maximize the utilization of the banks 240, 340 and approximate true multi-porting. However, in current processors this is not done because the scheduling precedes the bank determination.
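The potential benefit of knowing the bank before scheduling can be sketched with a simplified Python model. This is not a description of any actual scheduler: the function names are hypothetical, the model ignores data dependencies and the cross-bar, and it assumes that a same-bank pair issued without bank knowledge is simply serialized over two cycles.

```python
from collections import deque


def schedule_bank_aware(load_banks):
    """Group loads into cycles with at most one load per bank per cycle.

    load_banks lists the (assumed known) bank index, 0 or 1, of each
    load in program order. Returns a list of cycles, each holding the
    indices of the loads issued in that cycle.
    """
    queues = {0: deque(), 1: deque()}
    for index, bank in enumerate(load_banks):
        queues[bank].append(index)
    cycles = []
    while queues[0] or queues[1]:
        # Take the oldest pending load from each non-empty bank queue.
        cycle = [queues[bank].popleft() for bank in (0, 1) if queues[bank]]
        cycles.append(cycle)
    return cycles


def schedule_bank_oblivious(load_banks):
    """Pair loads strictly in program order; when the next load targets
    the same bank, only one load is issued in that cycle (assumed model)."""
    cycles = []
    i = 0
    while i < len(load_banks):
        if i + 1 < len(load_banks) and load_banks[i] != load_banks[i + 1]:
            cycles.append([i, i + 1])
            i += 2
        else:
            cycles.append([i])
            i += 1
    return cycles
```

For four loads targeting banks 0, 0, 1, 1 in that order, the bank-aware model issues them in two cycles, while the in-order model needs three because the first two loads conflict.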