Systems on a chip (SOCs) comprise many system clients, each contending for memory resources. Clients include multi-threaded/multi-issue central processing units (CPUs), high-end graphics processing units (GPUs), video encode/decode engines, audio subsystems, and the like. A design challenge for SOCs is to ensure that each client's memory requirements are satisfied while also producing the most power-efficient design. One design consideration is memory latency, that is, the time a client waits for requested data from memory. Memory latency affects overall system performance. Many CPU cycles may be wasted waiting for a data or instruction fetch, which in turn wastes power and lowers processor utilization. Some designs address this challenge by increasing the processor clock rate, increasing cache sizes, adding a processor, and/or using a predictive scheme. However, these approaches increase power consumption, silicon area, and/or the overall cost of the system.