A Reduced Instruction Set Computer (RISC) achieves improved throughput by having the Central Processing Unit (CPU) read data from, temporarily hold data in, and write data to registers rather than memory addresses. Registers can be placed physically closer to the CPU than a memory can, and they operate faster as well. The only permitted interactions with the memory are LOAD (data from the memory into a register) and STORE (data from a register into the memory). It is this dramatically reduced number of instructions that gives RISC its name.
Many applications, particularly digital signal processing (DSP), require intensive access to the memory. A typical instruction, such as MAC (multiply and accumulate, i.e., multiply two numbers together and add the product to a third number), can require several memory accesses. The practice in the prior RISC art has been to load the multiplicand into a first register (one clock cycle), load the multiplier into a second register (a second clock cycle), and then multiply the two numbers together, add the product to the contents of a third register, and place the sum back into the third register (a third clock cycle). The entire multiply-and-accumulate calculation thus takes only as long as fetching a single number from the memory. This disparity has fueled enhancements to the bare RISC architecture.
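The three-cycle sequence described above can be illustrated with a minimal sketch. It assumes one clock cycle per LOAD and one per MAC, as stated in the text. The function name `mac_load_store`, the list used as memory, and the cycle counters are hypothetical and serve only to show why the MAC apparatus is busy one cycle in three.

```python
def mac_load_store(memory, a_addrs, b_addrs):
    """Model the load, load, multiply-accumulate sequence.

    Assumes (per the text) one cycle per LOAD and one per MAC.
    Returns the accumulated sum, total cycles, and cycles in which
    the MAC apparatus did useful work.
    """
    acc = 0          # third register: holds the running sum
    cycles = 0       # total clock cycles consumed
    mac_cycles = 0   # cycles in which the MAC apparatus was busy
    for a, b in zip(a_addrs, b_addrs):
        r1 = memory[a]        # LOAD multiplicand into first register
        cycles += 1
        r2 = memory[b]        # LOAD multiplier into second register
        cycles += 1
        acc = acc + r1 * r2   # multiply, add to third register, write back
        cycles += 1
        mac_cycles += 1
    return acc, cycles, mac_cycles


# Illustrative data only: dot product of [1, 2, 3] and [4, 5, 6].
memory = [1, 2, 3, 4, 5, 6]
acc, cycles, mac_cycles = mac_load_store(memory, [0, 1, 2], [3, 4, 5])
print(acc, cycles, mac_cycles)  # 32 9 3
```

In this model the MAC apparatus is active in 3 of 9 cycles, i.e., one third of the time.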
Pre-fetching can keep the MAC apparatus busy all the time, rather than only a third of the time, if the address from which the data is to be loaded is known several clock cycles in advance. If long runs of consecutive addresses are to be accessed, it takes relatively few cycles to specify the initial address, the spacing between accesses (every address, every other address, every third address, and so on), and similar parameters. If multiple short runs are to be accessed, however, the overhead involved in setting up each run can become prohibitive. A processor with multiple execution units can keep the MAC apparatus busy, but multiple units (or, worse, multiple processors) are proportionally more expensive. A computational unit other than a MAC has a similar problem. The present invention provides a low cost solution to the problem of keeping the computational unit busy.
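The trade-off between long and short runs can be sketched numerically. The text says only that setting up a run takes "relatively few cycles", so the figure of 3 setup cycles per run used below is an assumption. The helper names `run_addresses` and `overhead_fraction` are likewise hypothetical.

```python
def run_addresses(start, step, count):
    """Addresses of one run: an initial address, a spacing, and a length."""
    return [start + i * step for i in range(count)]


def overhead_fraction(run_lengths, setup_cycles):
    """Fraction of all cycles spent setting up runs.

    Assumes one useful cycle per accessed address and a fixed
    setup cost per run (an assumed figure, not from the text).
    """
    useful = sum(run_lengths)
    setup = setup_cycles * len(run_lengths)
    return setup / (setup + useful)


SETUP_CYCLES = 3  # assumed cost of specifying one run

# Every other address, starting at 100, four accesses.
print(run_addresses(100, 2, 4))  # [100, 102, 104, 106]

# The same 64 accesses as one long run versus sixteen short runs.
long_run = overhead_fraction([64], SETUP_CYCLES)
short_runs = overhead_fraction([4] * 16, SETUP_CYCLES)
print(long_run < short_runs)  # True
```

Under these assumed figures, setup accounts for 3 of 67 cycles for the single long run but 48 of 112 cycles for the sixteen short runs, which is the sense in which the per-run overhead can become prohibitive.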