Modern microprocessors comprise many resources, such as registers and memory, for which various requesters contend. For example, a typical microprocessor has access to several layers of memory, including local instruction and data caches, lower-level backup caches (B-caches), main memory, storage devices such as disk drives, and storage available over a network. Generally, these layers form a memory hierarchy. At one extreme, the local caches are very fast but tend to be small. At the other extreme, main memory, disk storage, and network storage tend to have very large capacities, but their access times are orders of magnitude longer than those of the local caches.
To reduce the impact of the long latency between the processor and main memory or other long-latency storage, hardware prefetching techniques are used. The CPU predicts which memory blocks it will use next and requests them from the memory hierarchy before they are actually needed.
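To make the idea concrete, here is a minimal sketch of one simple prefetching policy, a next-line (sequential) prefetcher. It is not the prediction scheme of any particular processor. The sketch assumes 64-byte blocks and a small fully associative LRU cache, and the names `BLOCK_SIZE`, `Cache`, and `run` are hypothetical. On every access to block `b`, the prefetcher also requests block `b + 1` if it is not already cached.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per cache block (assumed for illustration)


class Cache:
    """Hypothetical tiny fully associative LRU cache holding block numbers."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()

    def contains(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # refresh LRU position on a hit
            return True
        return False

    def fill(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def run(addresses, capacity_blocks=8, prefetch=False):
    """Return (demand_misses, prefetch_requests) for a byte-address trace."""
    cache = Cache(capacity_blocks)
    demand_misses = 0
    prefetch_requests = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if not cache.contains(block):
            demand_misses += 1
            cache.fill(block)  # demand fetch from the next memory level
        if prefetch:
            predicted = block + 1  # guess that the next sequential block is needed
            if predicted not in cache.blocks:
                prefetch_requests += 1
                cache.fill(predicted)  # speculative fetch before it is needed
    return demand_misses, prefetch_requests


sequential = list(range(0, 64 * 16, 8))   # walks 16 consecutive blocks
strided = list(range(0, 128 * 16, 128))   # touches every other block
print(run(sequential), run(sequential, prefetch=True))
print(run(strided), run(strided, prefetch=True))
```

On the sequential trace, the prediction is correct and demand misses drop from 16 to 1. On the strided trace, every prediction is wrong: demand misses stay at 16, and the 16 prefetch requests fetch blocks that are never used.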
When instructions are prefetched, the CPU must guess which way each branch instruction in the instruction stream will go. Because the outcome of a branch cannot be predicted with certainty, instructions prefetched along the predicted path are said to be speculative. Unfortunately, these prefetches consume bandwidth to the lower-level caches, main memory, and so on. This reduces the bandwidth available for memory transactions that are needed immediately, such as demand misses and cache-victim processing.
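The bandwidth cost can be illustrated with a toy queueing model. The following sketch assumes a single memory channel that serves requests one at a time, in arrival order, with a fixed service time and no prioritization. The function name `demand_queueing_delay` and the cycle counts are hypothetical. The function reports how many cycles demand requests spend waiting behind earlier requests, including speculative prefetches.

```python
def demand_queueing_delay(requests, service_cycles=1):
    """Total cycles demand requests wait on one FIFO memory channel.

    requests: iterable of (arrival_cycle, kind), kind in {"demand", "prefetch"}.
    Requests are served in arrival order; sorted() is stable, so ties keep
    their list order.
    """
    channel_free_at = 0
    total_wait = 0
    for arrival, kind in sorted(requests, key=lambda r: r[0]):
        start = max(arrival, channel_free_at)  # wait until the channel is idle
        if kind == "demand":
            total_wait += start - arrival
        channel_free_at = start + service_cycles
    return total_wait


with_prefetch = [(0, "prefetch")] * 4 + [(0, "demand")]
demand_only = [(0, "demand")]
print(demand_queueing_delay(with_prefetch, service_cycles=10))
print(demand_queueing_delay(demand_only, service_cycles=10))
```

With an assumed 10-cycle service time, a demand miss queued behind four prefetches waits 40 cycles, compared with 0 cycles when no prefetches are queued. The model deliberately ignores the benefit of useful prefetches; it isolates only the bandwidth cost described above.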