In signal processing systems, fetching instructions and data from memory is often slow compared with the operating frequency of the master device that initiated the fetch. Consequently, if the system performs a large number of such fetch operations, overall system performance can fall significantly. As central processing unit (CPU) clock rates increase, and technologies such as multi-core become more prevalent, system-on-chip (SoC) processing performance is increasingly limited by memory bandwidth, because memory access speeds have improved across technologies at a much slower rate than CPU clock speeds.
It is known to implement prefetching schemes, whereby data and/or instructions are fetched before the master device issues a fetch request for them. As a result, the performance impact of accessing relatively slow memory elements may be reduced. Known prefetching schemes store the fetched information in buffers, and replace buffer contents with newly fetched information according to a replacement strategy, for example on a least recently used (LRU) basis.
A problem with such known replacement strategies is that, whilst they are adequate for substantially linear program flow and for tight loops within the program flow, they do not take into account long-span changes of program flow, such as jumps or calls to distant regions of code. Accordingly, for application code in which long-span changes of program flow make up a relatively large proportion of the program flow, such known replacement strategies are not an effective means of buffering information.
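To make the last point concrete, the following is a minimal Python sketch, not the method of this document: a toy buffer that prefetches the next sequential line on every fetch and replaces entries on an LRU basis. The class `LRUPrefetchBuffer`, the function `simulate`, the line-granular addressing, the capacity of eight lines and the three access traces are all hypothetical assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUPrefetchBuffer:
    """Hypothetical toy model: next-line prefetch with LRU replacement.

    Addresses are modelled as line numbers; one buffer entry holds one line.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # line -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def _fill(self, line):
        # Evict the least recently used line when the buffer is full.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = None

    def fetch(self, line):
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.misses += 1  # models a slow access to main memory
            self._fill(line)
        # Prefetch the next sequential line ahead of the master's request.
        if line + 1 not in self.lines:
            self._fill(line + 1)


def simulate(trace, capacity=8):
    """Run an access trace through the buffer; return (hits, misses)."""
    buf = LRUPrefetchBuffer(capacity)
    for line in trace:
        buf.fetch(line)
    return buf.hits, buf.misses


# Hypothetical traces: straight-line code, a tight loop, and a loop whose
# body jumps between four distant code regions (long-span changes of flow).
linear = list(range(100))
tight_loop = [0, 1, 2, 3] * 10
long_span = [0, 1, 1000, 1001, 2000, 2001, 3000, 3001] * 10

for name, trace in [("linear", linear), ("tight loop", tight_loop),
                    ("long span", long_span)]:
    hits, misses = simulate(trace)
    print(f"{name}: hits={hits}, misses={misses}")
```

Under these assumptions, the linear trace and the tight loop each miss only once, on the very first fetch, whereas the long-span trace misses on every jump target, 40 of its 80 fetches. Each jump leaves behind a prefetched line that is never used, and those unused lines push the jump targets out of the LRU buffer before they are revisited, even though the eight lines actually executed would fit in the buffer on their own.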