Embedded processors, such as those used in wireless applications, may include a digital signal processor, a microcontroller and memory on a single chip. In wireless applications, processing speed is critical because of the need to maintain synchronization with the timing of the wireless system. Low-cost embedded processor systems face unique performance challenges, one of which is the need to use low-cost, slow memory while maintaining high throughput.
In the example of wireless applications, a digital signal processor (DSP) is often employed for computation-intensive tasks. In such a system, low-cost, off-chip flash memory forms the bulk storage capacity. However, the flash memory access time is much longer than the minimum cycle time of the DSP. To achieve high performance, the DSP should execute from local memory, which is much faster than the off-chip flash memory.
Embedded processor systems may implement the local memory with some form of fill-on-demand cache memory control instead of, or in addition to, simple RAM. Simple RAM requires another processor or a direct memory access (DMA) controller to load code and/or data into the local memory prior to or after the time the processor requires the code and/or data.
When the DSP encounters a cache miss, the cache hardware must fill a cache line from the slower memory in the memory hierarchy. This fill-on-demand aspect of the cache often means that the DSP is stalled while all or part of the cache line is filled.
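The stall cost of fill-on-demand can be illustrated with a minimal sketch. It assumes a hypothetical direct-mapped cache and illustrative parameters that do not come from the system described here: 4 lines of 8 words, a 1-cycle hit, and 10 processor cycles per word fetched from the slower memory. The class name `DirectMappedCache` and the helper `run` are likewise invented for illustration. The model assumes the processor stalls until the entire line is filled, which is the worst of the cases described above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class DirectMappedCache:
    """Toy model of a fill-on-demand, direct-mapped cache (all values assumed)."""
    num_lines: int = 4              # assumed number of cache lines
    line_words: int = 8             # assumed words per cache line
    hit_cycles: int = 1             # assumed cost of an access that hits
    fill_cycles_per_word: int = 10  # assumed slow-memory cost per word filled
    tags: List[Optional[int]] = field(default_factory=list)
    hits: int = 0
    misses: int = 0
    stall_cycles: int = 0

    def __post_init__(self) -> None:
        # Every line starts out invalid (no tag stored).
        self.tags = [None] * self.num_lines

    def access(self, word_addr: int) -> int:
        """Return the cycles spent on one access, including any fill stall."""
        line_addr = word_addr // self.line_words
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return self.hit_cycles
        # Miss: the processor stalls while the whole line is filled on demand.
        self.misses += 1
        stall = self.line_words * self.fill_cycles_per_word
        self.stall_cycles += stall
        self.tags[index] = tag
        return stall + self.hit_cycles


def run(cache: DirectMappedCache, addresses: Iterable[int]) -> int:
    """Total cycles needed to access every address in order."""
    return sum(cache.access(addr) for addr in addresses)


if __name__ == "__main__":
    cache = DirectMappedCache()
    cold = run(cache, range(32))   # first pass: every line must be filled
    warm = run(cache, range(32))   # second pass: everything already cached
    print(f"cold pass: {cold} cycles, {cache.stall_cycles} of them stalled")
    print(f"warm pass: {warm} cycles")
```

Under these assumed parameters, a first sequential pass over 32 words misses once per line and spends 320 of its 352 cycles stalled, whereas a second pass over the same words takes 32 cycles. The gap between the two passes is the throughput lost to filling lines only when they are demanded.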
Accordingly, there is a need for methods and apparatus for improving the throughput of cache-based embedded processors.