A digital signal computer or digital signal processor (DSP) is a special-purpose computer designed to optimize performance for digital signal processing applications such as fast Fourier transforms, digital filters, image processing, and speech recognition. DSP applications are characterized by real-time operation, high interrupt rates, and intensive numeric computations. In addition, DSP applications tend to be intensive in memory access operations and to require the input and output of large quantities of data. Thus, designs of DSPs may be quite different from those of general-purpose processors.
One approach that has been used in the architecture of DSPs is the Harvard architecture, which utilizes separate, independent program and data memories so that the two memories may be accessed simultaneously. This permits an instruction and data to be accessed in the same clock cycle. Frequently, the program occupies less memory space than the data. To achieve full memory utilization, a modified Harvard architecture utilizes the program memory for storing both instructions and data. Typically, the program and data memories are interconnected to the core processor by separate program and data buses.
When both instructions and data are stored in the program memory, conflicts may arise in the fetching of instructions. Further, in the case of the Harvard architecture, an instruction fetch and a data access can take place in the same clock cycle, which can lead to a conflict on the program memory bus. In this scenario, an instruction that would otherwise be fetched in a single clock cycle may stall for a cycle because of the conflict. This happens when the fetch phase of the instruction coincides, on the program memory bus, with the memory access phase of a preceding load or store instruction. Such instructions are stored in a conflict cache so that the next time the same instructions are encountered, they can be fetched from the conflict cache, avoiding the instruction fetch stall. In addition to the conflict cache, a traditional instruction cache is also required for fetching instructions from the external main memory. As a result, two different cache architectures are required.
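The stall-then-hit behavior of a conflict cache can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function name `run_pass`, the trace format, and the use of an unbounded Python `set` as the conflict cache are all hypothetical and do not correspond to any particular hardware design.

```python
def run_pass(trace, conflict_cache):
    """Return the number of fetch stall cycles for one pass over `trace`.

    trace: list of (pc, bus_busy) pairs. `bus_busy` is True when a preceding
           load/store uses the program memory bus in the same cycle in which
           the instruction at `pc` is fetched (hypothetical trace format).
    conflict_cache: set of instruction addresses already captured
           (assumed unbounded here; real hardware would have a fixed size).
    """
    stalls = 0
    for pc, bus_busy in trace:
        if not bus_busy:
            continue              # bus is free: fetch completes in one cycle
        if pc in conflict_cache:
            continue              # served from the conflict cache: no stall
        stalls += 1               # bus conflict: fetch stalls for one cycle
        conflict_cache.add(pc)    # capture the instruction for next time
    return stalls


# A loop body of four instructions; two of the fetches collide with data accesses.
loop_body = [(0x100, False), (0x104, True), (0x108, False), (0x10C, True)]

cache = set()
first_pass = run_pass(loop_body, cache)   # 2 stalls: both conflicting fetches miss
second_pass = run_pass(loop_body, cache)  # 0 stalls: both are now in the conflict cache
```

In this model only the first iteration of the loop pays the stall penalty; later iterations fetch the conflicting instructions from the conflict cache.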
Further, conventional instruction cache architectures exploit the locality of code to maximize cache hits. Most cache architectures suffer from performance degradation due to cache thrashing, i.e., loading an instruction into the cache and then evicting it while it is still needed, before it can be used by the computer system. Cache thrashing is undesirable because it reduces the performance gains that the cache would otherwise provide.
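Cache thrashing can be reproduced with a toy direct-mapped cache model. The sketch below is illustrative only; the function name `count_misses`, the 8-line cache, and the 4-byte line size are assumptions chosen to keep the arithmetic easy to follow.

```python
def count_misses(addresses, num_lines=8, line_size=4):
    """Count misses in a direct-mapped cache for a sequence of fetch addresses."""
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing the address
        index = block % num_lines      # the single line this block may occupy
        tag = block // num_lines       # identifies which block is in that line
        if tags[index] != tag:
            misses += 1                # miss: evict whatever was in the line
            tags[index] = tag
    return misses


# 0x00 and 0x20 map to the same line (index 0), so each fetch evicts the other.
thrashing = count_misses([0x00, 0x20] * 10)     # 20 misses out of 20 fetches

# 0x00 and 0x04 map to different lines, so only the first two fetches miss.
no_conflict = count_misses([0x00, 0x04] * 10)   # 2 misses out of 20 fetches
```

Both traces touch only two instructions, yet the first misses on every fetch because each instruction is evicted before it is needed again.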
Conventional techniques reduce cache thrashing by increasing the cache size, increasing the cache associativity, adding a victim cache, and so on. However, these techniques come with overheads such as extra hardware, increased cache hit access time, and/or higher software overhead. Another conventional technique identifies frequently executed instructions through code profiling and locks them in the cache through software to minimize cache thrashing. However, this technique adds overhead: the user must profile the code, and extra instructions must be added to the code to lock the cache. Further, this can make the code very cumbersome.
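The effect of increased associativity, and its limits, can be seen in a small set-associative model with least-recently-used (LRU) replacement. This is a hedged sketch: the function name `count_misses_set_assoc`, the LRU policy, and the cache dimensions are assumptions made for illustration, not details taken from any specific conventional design.

```python
def count_misses_set_assoc(addresses, num_sets, ways, line_size=4):
    """Count misses in a set-associative cache with LRU replacement."""
    sets = [[] for _ in range(num_sets)]   # each list holds tags, LRU first
    misses = 0
    for addr in addresses:
        block = addr // line_size
        entries = sets[block % num_sets]
        tag = block // num_sets
        if tag in entries:
            entries.remove(tag)            # hit: re-appended below as most recent
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)             # set is full: evict least recently used
        entries.append(tag)
    return misses


two_addr = [0x00, 0x20] * 10
three_addr = [0x00, 0x20, 0x40] * 10

# Same total capacity (8 lines) organized two ways.
direct_mapped = count_misses_set_assoc(two_addr, num_sets=8, ways=1)   # 20 misses
two_way = count_misses_set_assoc(two_addr, num_sets=4, ways=2)         # 2 misses

# A third conflicting address defeats the 2-way cache again.
two_way_three = count_misses_set_assoc(three_addr, num_sets=4, ways=2)  # 30 misses
```

Doubling the associativity removes the thrashing for two conflicting addresses, but the problem returns as soon as the number of conflicting addresses exceeds the number of ways. In hardware, each extra way also requires additional tag comparison logic, which is consistent with the overheads noted above.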