Various processor designs include coprocessors intended to accelerate execution of a given set of processing tasks. Some such coprocessors achieve good performance per unit area on processing tasks typically executed by a digital signal processor (DSP), such as scaling, filtering, transformation, and sum of absolute differences. However, as the complexity of digital signal processing algorithms increases, processing tasks often require numerous processing passes through a coprocessor, compromising power efficiency. Furthermore, the access patterns required by DSP algorithms are becoming less regular, negatively impacting the overall processing efficiency of coprocessors designed to accommodate more regular access patterns. Consequently, processor and coprocessor architectures that provide improved processing, power, and/or area efficiency are desirable.