Typically, a processor is capable of handling more than one instruction at a time. To that end, the instructions are fetched into a cache. By arranging the instructions in the cache in a particular manner, the processor may perform software pipelining to overlap loop iterations. A software-pipelined loop iteration is partitioned into stages, each of which includes one or more instructions.
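As an illustration only, the overlap of loop iterations described above may be sketched by hand in C. The three-stage split (load, compute, store), the function names `scale_plain` and `scale_pipelined`, and the arithmetic performed are assumptions chosen for the example; in practice the compiler, not the programmer, usually produces such a schedule.

```c
#include <stddef.h>

/* Reference loop: each iteration runs its three stages back to back. */
void scale_plain(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int x = in[i];        /* stage 1: load    */
        int y = x * 3 + 1;    /* stage 2: compute */
        out[i] = y;           /* stage 3: store   */
    }
}

/* Hand-pipelined sketch: one pass through the kernel stores iteration i,
 * computes iteration i+1, and loads iteration i+2. */
void scale_pipelined(const int *in, int *out, size_t n) {
    if (n < 2) {              /* too short to fill the pipeline */
        scale_plain(in, out, n);
        return;
    }

    /* Prologue: fill the pipeline. */
    int loaded = in[0];               /* load iteration 0    */
    int computed = loaded * 3 + 1;    /* compute iteration 0 */
    loaded = in[1];                   /* load iteration 1    */

    /* Kernel: three different iterations are in flight at once. */
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = computed;            /* store iteration i       */
        computed = loaded * 3 + 1;    /* compute iteration i + 1 */
        loaded = in[i + 2];           /* load iteration i + 2    */
    }

    /* Epilogue: drain the two iterations still in flight. */
    out[n - 2] = computed;
    out[n - 1] = loaded * 3 + 1;
}
```

Both functions produce the same output; the pipelined form merely exposes independent work from neighboring iterations within a single pass through the kernel, which a processor capable of handling several instructions at a time may then overlap.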
Software-pipelined loops with indirect loads are typically scheduled by retrieving the data for the load instructions from a memory into a cache prior to processing the load instructions (i.e., pre-fetch). If the data for the load instructions is not pre-fetched into the cache, the processor may stall while waiting for the data to be fetched from the memory into the cache. As a result, performance of the loop is reduced. Alternatively, the indirect loads may be software pipelined without pre-fetch, with the expectation that the data is already in the cache. If the data is in fact in the cache, pre-fetching is unnecessary: the additional instructions for the pre-fetches of data, the address calculations, and the loads from the index array increase the cycles per iteration of the loop, which in turn reduces performance of the loop. Typically, the compiler must determine before execution whether to pre-fetch the data of the indirect loads into the cache.
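The trade-off may be sketched in C as follows. This is a minimal illustration under stated assumptions, not a description of any particular compiler's output: the function names and the look-ahead value `PREFETCH_DISTANCE` are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin used here to stand in for a pre-fetch instruction.

```c
#include <stddef.h>

/* Hypothetical look-ahead, in iterations; a real value would be tuned. */
#define PREFETCH_DISTANCE 8

/* Indirect load without pre-fetch: data[index[i]] may miss the cache,
 * in which case the processor may stall until the data arrives. */
long sum_indirect(const int *data, const int *index, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[index[i]];
    return sum;
}

/* Indirect load with pre-fetch: each iteration carries extra work
 * (a load from the index array, an address calculation, and the
 * pre-fetch itself). That work is pure overhead if the data was
 * already in the cache. */
long sum_indirect_prefetch(const int *data, const int *index, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            int future = index[i + PREFETCH_DISTANCE]; /* extra index load */
            __builtin_prefetch(&data[future], 0, 1);   /* address calc + pre-fetch */
        }
        sum += data[index[i]];
    }
    return sum;
}
```

Both functions return the same sum; they differ only in the number of instructions per iteration and in how likely a cache miss is to stall the loop.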
Therefore, a need exists to perform software pipelining without pre-fetching data for an instruction, regardless of whether the data is in the memory or in the cache.