Cores of higher-performance general-purpose processors and DSPs have an instruction fetch interface, or “front end,” in which instructions are speculatively prefetched from memory and placed inside an instruction prefetch buffer (or simply a “prefetch buffer”) internal to the core to await decoding. In general-purpose processors, the prefetch buffer is almost always implemented as a first-in first-out (FIFO) buffer in which newly fetched instructions are stored, or “pushed,” into a head of the FIFO and later removed, or “popped,” from a tail of the FIFO as they begin execution.
However, DSPs are called upon to execute loops more often than general-purpose processors are and therefore tend to be optimized for that purpose. (A loop is defined for purposes of this disclosure as a body of instructions that is executed multiple times, for example, to implement a digital filter.) Because instructions are popped from a FIFO as they begin execution, a FIFO-based prefetch buffer has to fetch the same instructions from memory again on every iteration of a loop, which wastes power. For this reason, the prefetch buffer of a DSP is rarely if ever implemented as a FIFO.
A more energy-efficient solution is to implement the prefetch buffer as a direct-mapped cache. In a direct-mapped cache having N locations, a set of contiguous instructions in memory (known as a cacheline) may be placed in only one of the N locations, as determined by its memory address. This cacheline remains resident in the cache until it is overwritten with another cacheline that maps or “aliases” to the same location as the resident cacheline.
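By way of illustration only, the mapping of a cacheline to its single permitted location can be sketched as follows. The sketch assumes a conventional modulo mapping, and the names LINE_BYTES, NUM_LINES, cache_index and aliases, as well as the line size of 16 bytes and the buffer size of eight lines, are hypothetical values chosen for the example rather than features of any particular processor.

```python
# Hypothetical parameters chosen only for illustration.
LINE_BYTES = 16  # assumed size of one cacheline in bytes
NUM_LINES = 8    # N, the assumed number of locations in the prefetch buffer


def cache_index(address: int) -> int:
    """Return the one buffer location that the cacheline holding `address` may occupy."""
    line_number = address // LINE_BYTES  # which cacheline of memory holds the address
    return line_number % NUM_LINES       # conventional direct-mapped (modulo) placement


def aliases(addr_a: int, addr_b: int) -> bool:
    """Return True when two addresses lie in different cachelines that share a location."""
    different_lines = addr_a // LINE_BYTES != addr_b // LINE_BYTES
    return different_lines and cache_index(addr_a) == cache_index(addr_b)
```

Under these assumed values, the cachelines starting at byte addresses 0 and 128 alias to location 0, so fetching one overwrites the other, whereas the cachelines starting at addresses 0 and 16 occupy different locations and can be resident at the same time.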
A loop is said to be “small” when its body fits inside the prefetch buffer. When the loop body is larger than the prefetch buffer, cachelines need to be replaced during execution of the loop. For instance, if the prefetch buffer has eight lines and the loop body requires 12 lines, four lines of the prefetch buffer are each shared by two cachelines of the loop body and need to be replaced on every iteration of the loop, while the other four lines remain resident. Repeated exchanging of cachelines is called “thrashing” and wastes power. When the loop body is at least twice the size of the prefetch buffer, every cacheline is replaced on every iteration, the energy efficiency advantage inherent in the direct-mapped cache disappears, and the cache begins to function like a FIFO buffer. For this reason, prefetch buffers are typically sized such that loops expected to be encountered in routine operation are small and therefore can be executed without replacing cachelines.
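The effect of loop size on thrashing can be examined with a simple model. The following is a minimal sketch, not a description of any actual hardware: it assumes a loop body occupying consecutive cachelines, a modulo placement of cachelines in the buffer, and no prefetching beyond the loop body. The function name fetches_per_iteration and its parameters are hypothetical.

```python
def fetches_per_iteration(loop_lines: int, buffer_lines: int, iterations: int = 4) -> int:
    """Count the cachelines fetched from memory during the final loop iteration.

    Models a direct-mapped prefetch buffer with `buffer_lines` locations and a
    loop body of `loop_lines` consecutive cachelines (assumed modulo placement).
    With two or more iterations the returned count reflects steady-state behavior.
    """
    resident = [None] * buffer_lines  # cacheline currently held in each location
    fetches = 0
    for _ in range(iterations):
        fetches = 0  # restart the count so only the final iteration is reported
        for line in range(loop_lines):
            slot = line % buffer_lines   # the single location this cacheline may occupy
            if resident[slot] != line:   # miss: an aliasing cacheline is resident
                resident[slot] = line    # replace it with the cacheline now needed
                fetches += 1
    return fetches


if __name__ == "__main__":
    for size in (8, 12, 16):
        print(size, fetches_per_iteration(size, 8))
```

Under these assumptions, a loop body of eight lines in an eight-line buffer requires no fetches once the loop is resident, a 12-line loop body requires eight fetches per iteration (two for each of the four shared locations), and a 16-line loop body requires 16 fetches per iteration, the same as if every instruction were discarded after execution as in a FIFO.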