A processor typically employs a cache to store data that is expected to be accessed by an instruction pipeline in the near future. As the processor executes program threads, the entries of the cache are filled from other levels of a memory hierarchy responsive to memory access requests. The memory access requests can be of two different types: demand requests, representing requests for data (e.g., instructions) known to be needed by an executing program thread, and prefetch requests, representing requests for data that is predicted by a prefetcher to be needed by the executing program thread in the near future. The memory access requests are sent to a cache controller, which determines whether each memory access request can be satisfied at the cache or must be satisfied from another level of the memory hierarchy, in which case an entry of the cache is filled with the requested data.
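The hit-or-fill decision described above can be illustrated with a minimal sketch. It is not taken from any particular processor design: the class name `CacheController`, the `access` method, and the use of a dictionary to stand in for the next level of the memory hierarchy are all hypothetical, and the sketch assumes an unbounded cache with no eviction.

```python
class CacheController:
    """Toy cache controller (hypothetical; unbounded, no eviction)."""

    def __init__(self, next_level):
        # next_level: dict mapping address -> data, standing in for
        # another level of the memory hierarchy (an assumption).
        self.next_level = next_level
        self.entries = {}    # address -> cached data
        self.filled_by = {}  # address -> request type that caused the fill

    def access(self, address, kind):
        """Service one request; kind is "demand" or "prefetch"."""
        if kind not in ("demand", "prefetch"):
            raise ValueError("unknown request type: " + kind)
        if address in self.entries:
            # The request can be satisfied at the cache.
            return "hit", self.entries[address]
        # Otherwise satisfy it from the next level and fill an entry.
        data = self.next_level[address]
        self.entries[address] = data
        self.filled_by[address] = kind
        return "miss", data


controller = CacheController(next_level={0x40: "insn_a", 0x80: "insn_b"})
first = controller.access(0x40, "prefetch")   # fills the entry
second = controller.access(0x40, "demand")    # now served from the cache
```

In this sketch a prefetch request that arrives before the corresponding demand request turns the later demand access into a hit, which is the benefit the prefetcher is intended to provide.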
In some processors, multiple demand and prefetch requests can be pending concurrently, each available to be sent to the cache controller for processing. However, the cache typically has a limited number of ports to receive memory access requests, such that the cache controller cannot concurrently service all of the pending demand and prefetch requests. Therefore, a processor can employ an arbiter that controls which memory access request is provided to each port of the cache controller during each access cycle. The arbiter typically employs a simple arbitration scheme to select among the pending memory access requests, such as a round-robin scheme whereby, in each access cycle, a memory access request for a different executing program thread is provided to the cache. However, such arbitration schemes may not make efficient use of the cache, reducing instruction throughput and consuming unnecessary power.
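A round-robin arbiter of the kind described above can be sketched as follows. This is an illustrative model only: the names `Request`, `RoundRobinArbiter`, `submit`, and `select`, the per-thread queues, and the single-port default are assumptions made for the example rather than details of any actual arbiter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    thread_id: int
    address: int
    kind: str  # "demand" or "prefetch"


class RoundRobinArbiter:
    """Toy per-thread round-robin arbiter (hypothetical model)."""

    def __init__(self, num_threads, num_ports=1):
        self.queues = [deque() for _ in range(num_threads)]
        self.num_ports = num_ports
        self.next_thread = 0  # thread that gets first chance next cycle

    def submit(self, request):
        self.queues[request.thread_id].append(request)

    def select(self):
        """Return up to num_ports requests for one access cycle."""
        granted = []
        n = len(self.queues)
        start = self.next_thread
        for offset in range(n):
            if len(granted) == self.num_ports:
                break
            tid = (start + offset) % n
            if self.queues[tid]:
                granted.append(self.queues[tid].popleft())
                # Rotate priority past the thread that was just served.
                self.next_thread = (tid + 1) % n
        return granted


arbiter = RoundRobinArbiter(num_threads=2, num_ports=1)
arbiter.submit(Request(0, 0x10, "demand"))
arbiter.submit(Request(0, 0x20, "prefetch"))
arbiter.submit(Request(1, 0x30, "demand"))
cycles = [arbiter.select() for _ in range(4)]
```

With one port and two threads, the sketch alternates between threads on successive access cycles. It selects purely by thread order and does not consider whether a pending request is a demand request or a prefetch request.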
The use of the same reference symbols in different drawings indicates similar or identical items.