The present invention relates to processors. More particularly, the present invention relates to memory cache bank prediction in a processor.
To more efficiently access data, many processors move information from a main memory, which can be slow to access, to a memory "cache" which allows for faster access. In addition, modern processors schedule instructions "out of order" and execute multiple instructions per cycle to achieve high performance. Some instructions, however, need to access information stored in the memory cache. For example, about one-third of all micro-instructions executed may be load/store instructions which access the memory cache. In order to achieve high performance by executing multiple instructions in a single cycle, the system should therefore permit more than one concurrent memory cache access in a cycle.
There are several ways to accomplish this goal. A truly "multi-ported" cache, i.e. one that supports multiple simultaneous accesses, fulfills this goal, but this is a complex solution that can be costly to implement in terms of area, power and speed.
Another known solution is a "multi-banked" cache. In this scheme, the memory cache is split into several independently addressed banks, and each bank supports one access per cycle. FIG. 1 illustrates a system having such a multi-bank memory cache. The system includes a first memory cache bank 240 and a second memory cache bank 340. A scheduling unit 100 schedules instructions to one of two pipelines. For example, the scheduling unit 100 may schedule an instruction to a first pipeline such that the instruction is processed by an Address Generation Unit (AGU) 210 and ultimately by a cache access unit 230. The scheduling unit 100 may instead schedule an instruction to a second pipeline such that the instruction is processed by another AGU 310 and ultimately by another cache access unit 330.
This implementation is less than ideal because only accesses to different banks can proceed concurrently. The restriction is accepted in order to reduce the complexity and cost of the cache, while still allowing more than one memory access in a single cycle. As a result, an instruction being processed by the first pipeline that needs to access information in the second memory cache bank 340 may not be able to execute.
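By way of illustration only, the bank restriction described above may be sketched as follows. The sketch assumes that banks are interleaved on cache-line boundaries and that a cache line is 64 bytes. Neither assumption is taken from the description above, and the names `LINE_SIZE`, `NUM_BANKS`, `bank_of` and `can_issue_together` are hypothetical.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes (not stated in the text)
NUM_BANKS = 2   # two banks, corresponding to banks 240 and 340 of FIG. 1


def bank_of(address: int) -> int:
    """Return the bank index for an address, assuming line-interleaved banks."""
    return (address // LINE_SIZE) % NUM_BANKS


def can_issue_together(addr_a: int, addr_b: int) -> bool:
    """Two accesses may share a cycle only if they map to different banks,
    because each bank supports one access per cycle."""
    return bank_of(addr_a) != bank_of(addr_b)
```

Under these assumptions, addresses 0 and 64 fall in different banks and may be accessed in the same cycle, whereas addresses 0 and 128 fall in the same bank and may not.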
To solve that problem, each instruction pipeline may use a "cross-bar" to access information in the other memory cache bank. A set-up latency 220, 320 may, however, be incurred while the pipeline accesses information in the other memory cache bank, and this delay slows the operation of the pipeline.
If the memory cache bank associated with each load instruction were known, the processor could schedule load instructions so as to maximize the utilization of the banks 240, 340 and approximate true multi-porting. However, in current processors this is not done because the scheduling precedes the bank determination.
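The benefit of knowing the bank in advance may be illustrated with a minimal sketch. It assumes, hypothetically, that each pending load is already tagged with its bank and that the loads are independent of one another. The function `schedule_by_bank` is an illustrative name and does not describe any actual scheduling unit.

```python
from collections import deque


def schedule_by_bank(loads, num_banks=2):
    """Group loads into cycles so that each cycle uses each bank at most once.

    loads: list of (name, bank) pairs in program order.
    Returns a list of cycles, each a list of load names issued together.
    Dependencies between loads are ignored in this sketch.
    """
    # One queue of pending loads per bank, preserving program order.
    queues = [deque() for _ in range(num_banks)]
    for name, bank in loads:
        queues[bank].append(name)

    cycles = []
    while any(queues):
        # Issue the oldest pending load of every non-empty bank queue.
        cycles.append([q.popleft() for q in queues if q])
    return cycles
```

For example, four loads whose banks are 0, 0, 1 and 1 in program order would be issued in two cycles, each cycle pairing one load per bank, rather than stalling on the first pair, which targets the same bank.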
In accordance with an embodiment of the present invention, a memory cache bank prediction unit is provided for use in a processor having a plurality of memory cache banks. The memory cache bank prediction unit has an input port that receives an instruction. The memory cache bank prediction unit also has an evaluation unit, coupled to the input port, that predicts which of the plurality of memory cache banks is associated with the instruction.
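Purely as an illustration of the interface described above, an evaluation unit might be modeled as a small table indexed by instruction address that remembers the bank each instruction last accessed. This "last-bank" policy is an assumption made for the sketch and is not asserted to be the prediction method of the embodiment. The class `BankPredictor`, its methods and its table size are hypothetical.

```python
class BankPredictor:
    """Toy last-bank predictor: predicts that an instruction will access
    the same bank it accessed the previous time it executed."""

    def __init__(self, num_banks=2, table_size=1024):
        self.num_banks = num_banks
        self.table_size = table_size
        # Every entry starts out predicting bank 0.
        self.table = [0] * table_size

    def _index(self, pc):
        # Hypothetical hash: low-order bits of the instruction address.
        return pc % self.table_size

    def predict(self, pc):
        """Return the predicted bank for the instruction at address pc."""
        return self.table[self._index(pc)]

    def update(self, pc, actual_bank):
        """Record the bank actually accessed, once the address is known."""
        self.table[self._index(pc)] = actual_bank % self.num_banks
```

In such a model, the scheduler would call `predict` before scheduling and `update` after the address has been generated. Instructions whose addresses alias to the same table entry share a prediction.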