Current general multi-core computer architectures include multiple cores connected to a shared memory controller. As shown in FIG. 1, in a general multi-core architecture 100, N processor cores 102 share a memory that includes M memory banks 110. Requests from each core 102 are first sent to the memory controller 112, which arbitrates among them using the core arbitration unit 104 and, in turn, issues requests to the memory. The memory controller 112 is divided into three main parts: the core arbitration unit 104, bank queues 106, and an access scheduler 108. Because the memory controller 112 has parallel access to all M memory banks 110, a separate bank queue 106 is used for the requests directed to each individual bank 110. These bank queues 106 are served every memory clock cycle, and the acknowledgement, with data in the case of a read, is sent back to the requesting core 102.
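By way of illustration only, the controller organization described above can be pictured with a small software model. The sketch below is not part of the architecture 100 itself. The class name MemoryControllerModel, the address-to-bank mapping (address modulo the number of banks), and the arbitration order (ascending core index) are all assumptions made for the example, since the description does not specify them.

```python
from collections import deque


class MemoryControllerModel:
    """Toy model: core arbitration -> per-bank queues -> one access per bank per cycle."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        # One queue per memory bank (the bank queues 106 of FIG. 1).
        self.bank_queues = [deque() for _ in range(num_banks)]

    def bank_of(self, address):
        # Assumption: low-order interleaving. The real mapping is not specified.
        return address % self.num_banks

    def submit(self, requests):
        """Accept (core_id, address) requests and place each in its bank queue.

        Assumption: arbitration simply orders requests by ascending core_id.
        """
        for core_id, address in sorted(requests):
            self.bank_queues[self.bank_of(address)].append((core_id, address))

    def tick(self):
        """Model one memory clock cycle: serve the head of every non-empty queue."""
        served = []
        for queue in self.bank_queues:
            if queue:
                served.append(queue.popleft())
        return served


if __name__ == "__main__":
    controller = MemoryControllerModel(num_banks=4)
    # Cores 0 and 1 target bank 1; core 2 targets bank 2.
    controller.submit([(1, 5), (0, 1), (2, 2)])
    print(controller.tick())  # requests served in the first cycle
    print(controller.tick())  # requests served in the second cycle
```

In this model, requests to different banks are served in the same cycle, while two requests to the same bank are served in consecutive cycles.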
The cores 102 may be central processing unit (CPU)/digital signal processing (DSP) cores, hardware accelerators, or any master processor that can initiate read/write access to the memory. In the scenario where multiple cores 102 request access to memory locations that belong to the same bank 110, the memory controller 112 places these requests in the corresponding bank queue 106. This contention between cores 102 to access the same bank 110 is known as a bank conflict. Bank conflicts mean that the requests are served sequentially and that some of the cores 102 have to wait longer for their requests to execute. As the number of bank conflicts increases, the latency of memory accesses to the bank 110 increases, thereby increasing the latency of the entire system 100. Therefore, a new method, system, and architecture for improved memory access with decreased latency are desirable.
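As a rough illustration of how bank conflicts lengthen access latency, the following sketch computes the cycle in which each core's request would be served. It assumes, hypothetically, that each bank serves one request per memory clock cycle, that requests arrive together and are queued in the order given, and that the bank is selected as the address modulo the number of banks. The function name completion_cycles is introduced only for this example.

```python
from collections import defaultdict


def completion_cycles(requests, num_banks):
    """Return {core_id: cycle in which its request is served}, counting from 1.

    requests is a list of (core_id, address) pairs, one request per core,
    in the order in which they are queued.
    """
    queued_per_bank = defaultdict(int)  # requests already queued at each bank
    done = {}
    for core_id, address in requests:
        bank = address % num_banks  # assumed mapping, not taken from the source
        queued_per_bank[bank] += 1
        # A request waits behind every earlier request to the same bank.
        done[core_id] = queued_per_bank[bank]
    return done


if __name__ == "__main__":
    # No conflict: four cores access four different banks.
    print(completion_cycles([(0, 0), (1, 1), (2, 2), (3, 3)], num_banks=4))
    # Conflict: cores 0, 1, and 2 all access bank 0.
    print(completion_cycles([(0, 0), (1, 4), (2, 8), (3, 3)], num_banks=4))
```

Under these assumptions, the conflict-free case completes every request in the first cycle, whereas in the conflicting case the third core to reach bank 0 waits until the third cycle.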