A digital signal computer, or digital signal processor (DSP), is a special purpose computer that is designed to optimize performance for digital signal processing applications, such as, for example, fast Fourier transforms, digital filters, image processing, signal processing in wireless systems and speech recognition. Digital signal processor applications are typically characterized by real-time operation, high interrupt rates and intensive numeric computations. In addition, digital signal processor applications tend to be intensive in memory access operations and require the input and output of large quantities of data. Digital signal processor architectures are typically optimized for performing such computations efficiently.
Microcontrollers, by contrast, involve the handling of data but typically do not require extensive computation. Architectures that are optimized for DSP computations typically do not operate efficiently as microcontrollers, and microcontrollers typically do not perform well as digital signal processors. Nonetheless, applications frequently require both digital signal processor and microcontroller functionality.
The characteristics of microcontroller data access patterns include temporal and spatial locality, which is ideally found in a cache. Specifically, the latency of memory operations is important, and common instruction sequences, such as load-compare-branch, need to be executed with a short latency. Otherwise, the branch misprediction penalty is large. Pointer chasing, where a load is performed to a register and the load is subsequently used to form an address for another load (commonly referred to as load-to-load interlock or pointer chasing), also needs to be executed with a short latency. This is because the second load, whose address is dependent on the first load, stalls for a longer time. In an in-order processor, a stall stops the entire machine without any useful work being done. Therefore, a microcontroller demands a short pipeline memory architecture.
Digital signal processors perform repetitive computations on large data sets. These large data sets may be accessed only once in the form of a load-compute-store sequence where the load and store are executed many times and are to different addresses. Temporal locality doesn't apply to these data sets, since data is not being re-accessed. Spatial locality applies in a limited sense in that data access patterns tend to be non-sequential stride based. These features make caches non-optimal for DSP applications, since caches have the undesirable overhead of cache fills and copybacks. In a cache fill, the memory operation which produced a cache miss stalls the entire processor, waits for the data to come from memory and then the fill data is written to memory. In a typical example, four cycles may be required to write back 32 bytes of data, during which time that particular bank of memory is not available to the processor. A similar situation applies to copybacks. If data is rarely reused, i.e., poor temporal locality, then there is no advantage in bringing a line of memory into the cache in view of sparse spatial locality.
In one prior art approach, the cache is provided with SRAM capability. If the cache is programmed as SRAM, then there is no refill and copyback overhead. However, the SRAM size is very small compared to the large data set typically used in DSP computations. The burden of managing overlays, the swapping in and out of data from a larger SRAM using DMA, must be done by software. Getting to this work correctly in performance sensitive applications may be very difficult.
Digital signal processor designs may be optimized with respect to different operating parameters, such as computation speed, power consumption and ease of programming, depending on intended applications. Furthermore, digital signal processors may be designed for 16-bit words, 32-bit words, or other word sizes. A 32-bit architecture that uses a long instruction word and wide data buses and which achieves high operating speed is disclosed in U.S. Pat. No 5,954,811, issued Sep. 21, 1999 to Garde. The disclosed digital signal processor includes three memory banks, each having a capacity of 2 megabits, connected by respective data buses to dual computation blocks. Notwithstanding very high performance, the disclosed processor does not provide an optimum solution for all applications.
Memory latency is frequently a limiting factor in achieving enhanced processor performance. Because digital signal processor computations tend to be intensive in memory access operations, memory systems are critical components of high performance digital signal processors. Accordingly, there is a need for further innovations in memory systems for digital signal processors.