Modern data processors achieve much of their high performance by dispatching and executing more than one instruction at the same time, a technique known as "superscalar" instruction execution. This design strategy implies that such data processors must be able to fetch more than one instruction from their memory storage subsystem each clock cycle. Typically, high-performance data processors buffer their instructions in an integrated memory cache for quick access.
Memory caches are often designed to output a group of sequential bytes when provided with the address of any byte in the group. These byte groups are referred to as "cache lines." This cache architecture is a compromise among data bandwidth, source address tagging overhead, and other design factors. For instance, a four-instruction-dispatch data processor may have a memory cache organized as a series of sixty-four-byte cache lines. Each cache line could then contain sixteen thirty-two-bit (four-byte) instructions. Such a cache supplies an entire cache line whenever the data processor requires any single instruction in that line. However, a four-instruction-dispatch data processor requires only the four instructions beginning at the provided address each clock cycle, and the first requested instruction may or may not be the first instruction in the output cache line. Therefore, such a data processor requires additional circuitry to select the four requested instructions from within each output cache line. This additional circuitry slows the process of providing instructions for execution and often occupies a large circuit area, raising the cost of the data processor.
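The selection problem described above can be modeled in software as a minimal sketch, assuming the example parameters (sixty-four-byte lines, four-byte instructions, four-instruction dispatch). The cache returns all sixteen instruction words of the line; the low-order address bits then pick which four of them are dispatched. The names `line_base` and `select_instructions` are hypothetical, and the sketch assumes that selection stops at the end of the line (a request near the end of a line yields fewer than four instructions), since the text does not say how line-crossing requests are handled.

```python
from __future__ import annotations

LINE_BYTES = 64                            # cache line size from the example
INSN_BYTES = 4                             # thirty-two-bit instructions
INSNS_PER_LINE = LINE_BYTES // INSN_BYTES  # sixteen instructions per line
DISPATCH_WIDTH = 4                         # four-instruction-dispatch processor


def line_base(addr: int) -> int:
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_BYTES - 1)


def select_instructions(line: list[int], addr: int) -> list[int]:
    """Pick up to DISPATCH_WIDTH instructions starting at addr.

    line: the sixteen instruction words the cache outputs for addr's line.
    Hypothetical model of the selection circuitry; assumes the selection
    does not continue into the next cache line.
    """
    if len(line) != INSNS_PER_LINE:
        raise ValueError("expected a full cache line")
    if addr % INSN_BYTES:
        raise ValueError("instruction address must be word aligned")
    # Low-order address bits give the slot of the first requested instruction.
    slot = (addr % LINE_BYTES) // INSN_BYTES
    return line[slot:slot + DISPATCH_WIDTH]


if __name__ == "__main__":
    line = [0x1000 + i for i in range(INSNS_PER_LINE)]  # dummy instruction words
    print([hex(w) for w in select_instructions(line, 0x2028)])
```

In hardware the slice corresponds to a multiplexer or rotator driven by address bits 2 through 5, which is the extra circuitry the text identifies as adding delay and area.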