Accessing instructions that are stored in a data store such as a memory and providing them to a data processor is a problem that has been addressed in a variety of ways. Access to memory can be expensive, as measured by a variety of metrics. Dedicated local memory is fast, provides high bandwidth, is power efficient and is readily available, but is costly in area. By contrast, on-die shared and/or arbitrated memory consumes more power and may not always be available or able to satisfy peak bandwidth requirements, but is cheaper to implement. Off-chip memory is the cheapest, but suffers the largest penalties in power consumption, access latency and available bandwidth. Any limitation in memory access performance, whether due to latency or to bandwidth, also reduces processor efficiency, as the processor stalls when a required instruction is not available. All three issues with shared memory access, namely latency, bandwidth and power consumption, can be mitigated by providing an intermediate data store. In some systems, a program instruction cache is provided so that the instructions can be accessed from within the cache. This provides fast access to the instructions, but has the disadvantage of being a reactive mechanism: it makes autonomous decisions on which instructions to store based solely on the history of the instructions or instruction addresses requested by the processor. To mitigate this disadvantage, caches are often equipped with complex prediction logic whose goal is to maximise the probability that a requested instruction is held in the local store. As a consequence, caches have high power consumption. This can be a particular disadvantage where the program is long and many instructions are stored.
An alternative is to buffer the instructions in a FIFO buffer prior to use. This is cheaper than a cache but less flexible. This lack of flexibility means that instructions have to be moved more often, which costs power and can also cause the processor to stall when an instruction is not available at the appropriate time.
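By way of illustration only, the difference in instruction movement between the two approaches can be sketched with a simple model. The sketch assumes a plain FIFO that discards each instruction once it has been consumed, and a small cache with least-recently-used replacement. The function names, the four-instruction loop and the cache capacity are hypothetical and chosen purely for the example; they do not describe any particular device.

```python
from collections import OrderedDict


def fifo_fetches(trace):
    """Count memory fetches for a plain FIFO (assumed model).

    Each instruction is streamed through the buffer once and discarded
    after use, so every executed instruction costs one fetch.
    """
    return len(trace)


def lru_cache_fetches(trace, capacity):
    """Count memory fetches for a cache with LRU replacement (assumed model)."""
    cache = OrderedDict()
    fetches = 0
    for addr in trace:
        if addr in cache:
            # Hit: mark the entry as most recently used.
            cache.move_to_end(addr)
        else:
            # Miss: fetch from memory and evict the oldest entry if full.
            fetches += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return fetches


# Hypothetical workload: a loop of 4 instructions executed 10 times.
trace = list(range(4)) * 10

fifo_count = fifo_fetches(trace)                    # 40 fetches
cache_count = lru_cache_fetches(trace, capacity=8)  # 4 fetches
print(fifo_count, cache_count)
```

Under these assumptions the FIFO moves every instruction of the loop on every iteration, whereas the cache fetches each instruction once, provided the loop fits within its capacity.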
Many of the above solutions are specific to particular architectures, and thus new solutions need to be designed for new systems.