When an SoC has multiple DRAM interfaces for accessing multiple DRAMs in parallel at differing addresses, each DRAM interface is commonly referred to as a memory “channel”. In the traditional approach, the channels are not interleaved, so the application software and all hardware blocks that generate traffic must spread their traffic evenly across the channels to balance the loading. Past systems also used address generators that split a thread into multiple requests, with each request sent to its own memory channel. This forced the software and the system functional blocks to be aware of the organization and structure of the memory system when generating initiator requests. In some prior supercomputer systems, the system forced a memory channel to be divided up at the size of a burst-length request. In some prior art, requests from a processor to perform memory operations are expanded into individual memory addresses by one or more address generators (AGs). To supply adequate parallelism, each AG is capable of generating multiple addresses per cycle to the multiple segments of a divided-up memory channel. The memory channel performs the requested accesses and returns read data to a reorder buffer (RB) associated with the originating AG. The reorder buffer collects and reorders replies from the memory channels so that they can be presented to the initiator core.
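The address-generator expansion described above can be illustrated with a minimal sketch. It is not taken from any particular prior system: the burst size, the segment count, the modulo mapping of bursts to segments, and the names `BURST_BYTES`, `NUM_SEGMENTS`, `SubRequest`, and `expand_request` are all assumptions chosen only for illustration.

```python
from typing import List, NamedTuple

BURST_BYTES = 64    # assumed burst size in bytes (hypothetical value)
NUM_SEGMENTS = 4    # assumed number of channel segments (hypothetical value)


class SubRequest(NamedTuple):
    seq: int       # position of this burst in the original request order
    segment: int   # segment assumed to service this burst
    address: int   # burst-aligned byte address


def expand_request(base: int, length: int) -> List[SubRequest]:
    """Expand one request into one sub-request per burst it touches."""
    if length <= 0:
        return []
    first = base // BURST_BYTES
    last = (base + length - 1) // BURST_BYTES
    # Assumed mapping: consecutive bursts rotate across the segments.
    return [
        SubRequest(seq=i, segment=burst % NUM_SEGMENTS,
                   address=burst * BURST_BYTES)
        for i, burst in enumerate(range(first, last + 1))
    ]
```

Under these assumptions, a 256-byte request starting at address 0 expands into four sub-requests, one per segment, while an unaligned 64-byte request starting at address 32 straddles two bursts and therefore produces two sub-requests. The `seq` field is what a reorder buffer would later use to restore the original order.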
In the traditional approach, the traffic may be split deep in the memory subsystem, in central routing units, which increases traffic and routing congestion, increases design and verification complexity, eliminates topology freedom, and increases latencies. The resulting centralized point can act as a bandwidth choke point and a routing congestion point, and can lengthen propagation paths, which lowers the achievable frequency and increases switching power consumption. Also, some systems use reorder buffers to maintain an expected execution order of transactions in the system.
In the typical approach, area-consuming reorder buffering is used at the point where the traffic is merged, to hold response data that arrives too early from a target.
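The following is a minimal sketch of the kind of reorder buffering described here, assuming each response carries a sequence number that gives its expected order. The class `ReorderBuffer`, its method `accept`, and the `peak_occupancy` counter are hypothetical names introduced for illustration; they do not describe any specific prior implementation.

```python
from typing import Dict, List


class ReorderBuffer:
    """Holds responses that arrive early and releases them in sequence."""

    def __init__(self) -> None:
        self._pending: Dict[int, str] = {}  # early responses keyed by sequence number
        self._next_seq = 0                  # next sequence number to release
        self.peak_occupancy = 0             # largest number of entries held at once

    def accept(self, seq: int, data: str) -> List[str]:
        """Store one response; return every response now releasable in order."""
        self._pending[seq] = data
        # Occupancy is sampled after insertion, so it counts the new entry.
        self.peak_occupancy = max(self.peak_occupancy, len(self._pending))
        released: List[str] = []
        while self._next_seq in self._pending:
            released.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return released
```

In this sketch, if responses 2 and 1 arrive before response 0, both must be stored until response 0 arrives, at which point all three are released together. The peak occupancy grows with how far out of order the targets respond, which is one way to see why such buffering consumes area.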