In many fields and applications, a control processor (e.g., a central processing unit (CPU)) shares a memory with multiple devices via a memory controller. The CPU may, for example, handle interrupts, manage other functional resources and interact with users. Because these tasks must be performed in a timely manner, the execution speed of the CPU is a substantial factor with respect to the overall system performance. Memory latency, in turn, is a substantial factor with respect to the execution speed. Unlike media processors, for example, which access memory in long data streams, the CPU tends to access short streams of sequential addresses. It is difficult to build a shared memory system that satisfies these different types of requests. Thus, the memory latency of the CPU may be long (e.g., tens of cycles) even if the memory bandwidth is high.
One solution to the memory latency problem employs the technique of prefetching. Prefetching may include, for example, loading particular data into storage close to the CPU in anticipation that the CPU may use the data in the near future. However, the coverage and accuracy of a particular prefetching scheme can vary with different programs and applications. In addition, the effectiveness of a particular prefetching scheme can vary even with respect to the memory region being accessed by the CPU. In fact, there are some circumstances in which a system would perform better if a particular prefetching scheme were turned off. Conventional prefetching schemes and controls, however, generally cannot be changed in real time (i.e., on the fly) to accommodate dynamic environments.
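By way of illustration only, the dependence of coverage and accuracy on the access pattern can be sketched with a toy model. The sketch below assumes a hypothetical one-line-ahead (next-line) prefetcher with a single-entry prefetch buffer; the names `simulate_next_line_prefetch` and `Stats`, the trace values, and the definitions of coverage (fraction of accesses served from the buffer) and accuracy (fraction of issued prefetches that were used) are assumptions made for this example and do not describe any particular conventional system.

```python
from collections import namedtuple

# Hypothetical result record: all names here are illustrative assumptions.
Stats = namedtuple("Stats", "coverage accuracy issued")


def simulate_next_line_prefetch(line_trace, enabled=True):
    """Replay a trace of memory line numbers through a toy next-line prefetcher.

    The model keeps a single-entry prefetch buffer. After every access to
    line N, an enabled prefetcher loads line N + 1 into the buffer.
    """
    buffered_line = None  # line currently held in the prefetch buffer
    hits = 0              # accesses served from the prefetch buffer
    issued = 0            # prefetch requests sent to memory
    for line in line_trace:
        if line == buffered_line:
            hits += 1
        if enabled:
            buffered_line = line + 1
            issued += 1
    coverage = hits / len(line_trace) if line_trace else 0.0
    accuracy = hits / issued if issued else 0.0
    return Stats(coverage, accuracy, issued)


# Three assumed access patterns.
long_stream = list(range(100))                 # one long sequential stream
short_streams = [0, 1, 50, 51, 7, 8, 90, 91]   # short sequential bursts
scattered = [0, 40, 13, 77]                    # no sequential locality

for name, trace in [("long", long_stream),
                    ("short", short_streams),
                    ("scattered", scattered)]:
    print(name, simulate_next_line_prefetch(trace))
```

In this toy model, nearly every access in the long stream is served from the prefetch buffer, only half of the accesses in the short bursts are, and none of the prefetches issued for the scattered trace is ever used. In the last case the prefetcher consumes memory bandwidth without reducing latency, which is the kind of circumstance in which turning the scheme off would be preferable.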
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.