A computer processor system may include one or more processor units for performing processing operations. Each of these processor units may request access to memory (e.g. to read or write data) as part of executing instructions to perform one or more processing operations. Each memory access request may specify a memory address identifying a region of memory to be accessed. In this context, a processor unit could for example be a processor, a processor core, a multi-core processor, or some other type of unit capable of executing instructions to perform one or more processing operations, such as a digital signal processor (DSP). The computer processor system could for example be a central processing unit (CPU) or a graphics processing unit (GPU).
It is common for computer processor systems to be arranged so that multiple processing operations can be performed in parallel. For example, some processor units are capable of executing multiple threads in parallel. In other examples, a computer processor system may include multiple processor units operating in parallel, each of which may execute a single thread, or multiple threads in parallel. As a consequence, a computer processor system may generate a number of memory access requests; in some systems, multiple memory access requests may be generated in a single clock cycle. In other cases, multiple memory access requests may be generated over one or more clock cycles.
To reduce the latency in the operation of the computer processor system, multiple memory accesses may be made in parallel (e.g., a specified number of memory addresses across one or more blocks of memory may be accessed in parallel). Parallelising the memory accesses may be particularly convenient when the access requests reference memory addresses within a block, or blocks, of memory not local to the processor units. For example, if the one or more processor units were implemented as part of a system-on-chip (SoC), one or more blocks of memory that can be accessed by the processor unit(s) may be located off-chip, for example to reduce the size of the chip.
The number of memory accesses that can be made in parallel may be restricted to a specified maximum value. This value may for example be limited by data bandwidth. For instance, if the processor unit(s) form part of a SoC, the rate at which data can be communicated on and off the chip may be limited by the data bandwidth limit of the memory bus used to transfer data to/from memory.
In some cases, the number of pending memory access requests may exceed the maximum number of memory accesses that can be made in parallel. Furthermore, some of the pending memory access requests may not be unique; for example, the pending memory access requests may contain multiple requests to access the same memory address. Under these circumstances, a set of parallel memory access requests may contain multiple requests for the same memory address, so that part of the limited parallel access capacity is spent on redundant accesses, resulting in an inefficient memory access scheme.
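By way of illustration only, the inefficiency described above can be sketched in a few lines of Python. The sketch assumes a hypothetical limit of four parallel accesses (`MAX_PARALLEL`) and a made-up list of pending addresses; the function names `naive_batches` and `wasted_slots` are likewise illustrative and do not correspond to any particular hardware arrangement. Pending requests are simply grouped in arrival order, and the number of slots in each group occupied by a repeated address is counted.

```python
# Illustrative sketch only: the addresses, the limit and the helper names
# are assumptions chosen for this example.

MAX_PARALLEL = 4  # assumed maximum number of accesses that can be made in parallel


def naive_batches(pending, max_parallel):
    """Group pending addresses, in arrival order, into sets of at most max_parallel."""
    return [pending[i:i + max_parallel]
            for i in range(0, len(pending), max_parallel)]


def wasted_slots(batch):
    """Count the slots in a batch taken up by an address already present in that batch."""
    return len(batch) - len(set(batch))


# Eight pending requests, several of which target the same address.
pending = [0x100, 0x100, 0x104, 0x100, 0x108, 0x104, 0x10C, 0x108]

batches = naive_batches(pending, MAX_PARALLEL)
for batch in batches:
    print([hex(addr) for addr in batch], "redundant slots:", wasted_slots(batch))

print("sets issued:", len(batches), "| unique addresses:", len(set(pending)))
```

In this example the eight pending requests occupy two sets of four parallel accesses, even though only four distinct addresses are referenced, a number that would fit within a single set.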