In many fields and applications, a control processor (e.g., a central processing unit (CPU)) shares a memory with multiple devices via a memory controller. The CPU may, for example, handle interrupts, manage other functional resources and interact with users. Because these tasks must be performed in a timely manner, the execution speed of the CPU is a substantial factor with respect to the overall system performance. Memory latency, in turn, is a substantial factor with respect to the execution speed. Unlike media processors, for example, which access memory in long data streams, the CPU may tend to access short streams of sequential addresses. It is difficult to build a shared memory system that satisfies these different types of requests. Thus, the memory latency of the CPU may be long (e.g., tens of cycles) even if the memory bandwidth is high.
One solution to the memory latency problem employs the technique of prefetching. Prefetching may include, for example, loading particular data to storage close to the CPU in anticipation that the CPU may use the data in the near future.
In one conventional system, the CPU includes a level two (L2) cache. Such an approach may be costly and may negatively impact CPU performance. The L2 cache typically accommodates large line sizes and, as a result, may be quite large and may necessitate a large cache bandwidth. When the CPU accesses line x, the L2 cache control may prefetch the next sequential cache line x+1 into the L2 cache. Fetching more cache lines into the L2 cache may also increase capacity and bandwidth requirements. Furthermore, conventional L2 caches tend to keep cache lines that have already been accessed by the CPU under the assumption that the CPU may access the same cache lines in the future. Thus, conventional L2 caches are necessarily large and typically take up the most space in the CPU.
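The next-line prefetching behavior described above can be illustrated with a minimal sketch. The class name, the method names and the unbounded storage are assumptions made purely for illustration and do not describe any particular L2 cache design.

```python
class NextLinePrefetchCache:
    """Toy model (hypothetical): accessing line x also prefetches line x + 1."""

    def __init__(self):
        # Assumption: unbounded storage, so accessed lines are never evicted.
        self.lines = set()
        self.memory_fetches = 0  # lines loaded from the shared memory

    def _fetch(self, line):
        # Load a line from memory only if it is not already held.
        if line not in self.lines:
            self.lines.add(line)
            self.memory_fetches += 1

    def access(self, line):
        """Return True on a hit and False on a miss."""
        hit = line in self.lines
        self._fetch(line)      # demand fetch (no-op on a hit)
        self._fetch(line + 1)  # next-line prefetch of x + 1
        return hit


cache = NextLinePrefetchCache()
results = [cache.access(x) for x in (10, 11, 12, 40, 10)]
print(results, cache.memory_fetches, sorted(cache.lines))
```

In this sketch, the short sequential run 10, 11, 12 hits after the first miss, and the later return to line 10 hits only because the accessed line was retained. Six lines are fetched and held for five accesses, which suggests how prefetching combined with retention may drive up capacity and bandwidth requirements.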
Another conventional system employs a stream buffer next to the caches of the CPU. Such a system may be complicated and slow. Each stream buffer is a first-in-first-out (FIFO) storage of a fixed number of cache lines that holds a stream of CPU data with sequentially increasing addresses. Typically, multiple stream buffers are used. For each access, the stream buffers are adapted to perform a search of all buffers and all entries of each buffer. Additionally, the stream buffers are adapted to shift the buffer entries to maintain the FIFO structure. However, these adaptations may limit the overall capacity of the stream buffer and may increase its access time. Moreover, the stream buffer may not inherently be able to store data streams that stride in reverse order.
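The stream buffer arrangement described above can likewise be sketched as a toy model. The round-robin allocation on a miss, the refill of the tail on a hit and all names below are assumptions introduced for illustration; only the FIFO organization, the search of every entry of every buffer and the shifting of entries are taken from the description.

```python
from collections import deque


class StreamBuffers:
    """Toy model (hypothetical) of several fixed-depth FIFO stream buffers."""

    def __init__(self, num_buffers=2, depth=4):
        self.depth = depth
        self.buffers = [deque() for _ in range(num_buffers)]
        self.next_victim = 0  # assumption: round-robin reallocation on a miss
        self.comparisons = 0  # address comparisons performed by the search

    def access(self, line):
        """Return True if the line is found in any buffer, else False."""
        # Search all buffers and all entries of each buffer.
        for buf in self.buffers:
            for pos, held in enumerate(buf):
                self.comparisons += 1
                if held == line:
                    # Shift out the entries up to and including the hit and
                    # refill the tail with the next ascending line numbers.
                    for _ in range(pos + 1):
                        tail = buf[-1]
                        buf.popleft()
                        buf.append(tail + 1)
                    return True
        # Miss: restart one buffer with the lines following the missed line.
        victim = self.buffers[self.next_victim]
        self.next_victim = (self.next_victim + 1) % len(self.buffers)
        victim.clear()
        victim.extend(range(line + 1, line + 1 + self.depth))
        return False


sb = StreamBuffers(num_buffers=2, depth=4)
ascending = [sb.access(x) for x in (100, 101, 102, 103)]
descending = [sb.access(x) for x in (200, 199, 198, 197)]
print(ascending, descending, sb.comparisons)
```

Under these assumptions, the ascending stream hits on every access after the first miss, while the descending stream misses on every access because each buffer is only ever filled with increasing line numbers. The comparison counter grows with both the number of buffers and the depth of each buffer, which illustrates why the full search may constrain capacity and access time.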
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.