In computer architecture applications, processors often use caches and other local memory to access data during execution. A processor executes instructions more efficiently when, for example, the data it accesses is stored locally in a cache; when the data is not available locally, the processor may stall while the access is serviced from slower memory. The problem is compounded when multiple caches (often having differing line sizes and timing requirements) of multiple processors are used together in a multiprocessor system. Processor stalls often occur, for example, when different processors attempt to access the same memory resources. Thus, improved techniques for reducing the stalls associated with processors sharing memory resources are desirable.
The problems noted above are solved in large part by a local memory arbiter that minimizes the latency of memory accesses in a system having multiple processors. The disclosed memory arbiter improves overall system performance by managing the memory requests from each processor individually before those requests are sent to a central memory arbiter, which handles the requests from the multiple processors for the shared resources. The local memory arbiter buffers the memory requests from a local processor, analyzes the buffered memory requests, and optimizes them by reordering commands according to a rule set and by performing write merging and prefetch squashing under certain conditions.
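As an illustration only, the buffering, reordering, write merging, and prefetch squashing described above might be modeled in software along the following lines. The text does not specify the rule set or the conditions under which merging and squashing occur, so everything in this sketch is an assumption: the names `Request`, `optimize`, and `LINE_BYTES` are hypothetical, writes are merged only when they target the same line with no intervening read of that line, the only reordering rule is that demand accesses go ahead of prefetches, and a prefetch is squashed when a buffered read or earlier prefetch already fetches its line.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # hypothetical line size; not taken from the text


@dataclass
class Request:
    kind: str                                  # "read", "write", or "prefetch"
    addr: int                                  # byte address of the access
    data: dict = field(default_factory=dict)   # byte address -> value (writes only)

    @property
    def line(self):
        return self.addr // LINE_BYTES


def optimize(buffered):
    """Return the requests to forward to the central arbiter (assumed rules)."""
    # Step 1: write merging. Fold a write into the most recent buffered
    # demand access to the same line, but only if that access is a write.
    merged = []
    for req in buffered:
        if req.kind == "write":
            prev = next((r for r in reversed(merged)
                         if r.line == req.line and r.kind != "prefetch"), None)
            if prev is not None and prev.kind == "write":
                prev.data.update(req.data)     # later bytes overwrite earlier ones
                continue
        # Copy so the caller's buffered requests are never modified.
        merged.append(Request(req.kind, req.addr, dict(req.data)))

    # Step 2: reordering. Assumed rule: demand accesses before prefetches.
    # sorted() is stable, so order within each group is preserved.
    ordered = sorted(merged, key=lambda r: r.kind == "prefetch")

    # Step 3: prefetch squashing. Drop a prefetch whose line is already
    # being fetched by a read or an earlier prefetch.
    out, fetched = [], set()
    for req in ordered:
        if req.kind == "prefetch" and req.line in fetched:
            continue
        if req.kind in ("read", "prefetch"):
            fetched.add(req.line)
        out.append(req)
    return out
```

Deferring only prefetches is a deliberately conservative choice for the sketch: a prefetch has no architectural side effects, so moving it behind demand accesses cannot change program results, whereas letting a read pass an earlier write to the same line would require hazard checks that the text does not describe.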