1. Field of the Invention
The present application relates generally to a computer system and more particularly to a system that forwards data traffic.
2. Related Art
FIG. 1 shows a prior art single-processor system that includes a processor that issues memory access requests, a storage unit to store data, and a controller to interface with the storage unit. The processor provides the addresses within the storage unit from which the applicable data should be fetched (e.g., these addresses are represented by “A0”, “A1”, “A2”, and “A3”). The controller returns the data fetched from those addresses (e.g., the data is represented by “D0”, “D1”, “D2”, and “D3”). Even though the latency to access the storage unit varies depending on, for example, the level or memory type at which the data is stored (e.g., the data from address “A2” may be fetched much earlier than the data from address “A1”), the controller orders the sequence such that data is returned to the processor in the same order in which the processor issued the memory access requests (e.g., since “A0” was the first memory access request issued, “D0” is the first data returned to the processor). This is done to ensure that each item of data reaches its intended destination.
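The in-order return behavior of the controller of FIG. 1 can be illustrated with a minimal software sketch. The sketch is illustrative only and is not taken from the figure: the function name `return_in_issue_order`, the representation of addresses and data as strings, and the assumption that each outstanding address is unique are all hypothetical choices made for the example.

```python
def return_in_issue_order(issued_addresses, completions):
    """Return fetched data in the order the requests were issued.

    issued_addresses: addresses in the order the processor issued them.
    completions: (address, data) pairs in the order the storage unit
                 finished them, which may differ from the issue order.
    Assumption (hypothetical): each outstanding address is unique.
    """
    fetched = {}      # address -> data that is fetched but not yet returned
    next_index = 0    # position of the oldest request not yet returned
    returned = []     # data in the order it is handed back to the processor

    for address, data in completions:
        fetched[address] = data
        # Release data only while the oldest outstanding request is ready,
        # so a fast fetch (e.g., "A2") waits behind a slow one (e.g., "A1").
        while (next_index < len(issued_addresses)
               and issued_addresses[next_index] in fetched):
            returned.append(fetched.pop(issued_addresses[next_index]))
            next_index += 1
    return returned
```

For example, if requests “A0” through “A3” are issued in that order and the storage unit completes “A2” first, the sketch holds “D2” until “D0” and “D1” have been returned.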
FIG. 2 shows a prior art multi-processor system that includes multiple processors 1 to N, an arbiter, the controller, and the storage unit. Compared to the single-processor system, the multi-processor system allows processors to share data stored within the storage unit and better utilizes the storage unit. The arbiter processes multiple memory access requests from the multiple processors and maintains records of the processor making each request and the sequence in which the memory access requests were issued. When the data is available from the storage unit, the arbiter checks the records in order to dispatch the data to the processor that requested it. The arbiter may also provide for out-of-order memory access request execution, in which case the arbiter has to perform more record keeping in order to track the original order of the memory access requests so that the data can be provided to the appropriate processor in the correct order. Out-of-order execution means that the arbiter, for example, may schedule a second memory access request ahead of a first memory access request because the second has a higher priority, even though the first memory access request was issued earlier by the processor.
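The record keeping performed by the centralized arbiter of FIG. 2 can likewise be sketched in software. This is a simplified illustration under stated assumptions, not the arbiter's actual design: the class name `CentralArbiter`, its methods, the integer priority scheme, and the per-processor ordering queues are hypothetical.

```python
import heapq
from collections import defaultdict, deque


class CentralArbiter:
    """Hypothetical sketch of a centralized arbiter with out-of-order scheduling."""

    def __init__(self):
        self._next_seq = 0                  # global issue sequence number
        self._ready = []                    # heap of (-priority, seq, address)
        self._owner = {}                    # record: seq -> requesting processor
        self._order = defaultdict(deque)    # record: processor -> seqs in issue order
        self._done = {}                     # seq -> data fetched but not yet dispatched

    def request(self, processor, address, priority=0):
        """Record a memory access request and queue it for scheduling."""
        seq = self._next_seq
        self._next_seq += 1
        self._owner[seq] = processor
        self._order[processor].append(seq)
        heapq.heappush(self._ready, (-priority, seq, address))
        return seq

    def schedule(self):
        """Pick the highest-priority request; ties go to the earliest issued."""
        _, seq, address = heapq.heappop(self._ready)
        return seq, address

    def complete(self, seq, data):
        """Accept fetched data and return the (processor, data) pairs that can
        now be dispatched in that processor's original issue order."""
        self._done[seq] = data
        processor = self._owner.pop(seq)
        order = self._order[processor]
        dispatched = []
        while order and order[0] in self._done:
            dispatched.append((processor, self._done.pop(order.popleft())))
        return dispatched
```

In this sketch, a higher-priority request issued later is scheduled first, yet its data is held until the earlier request from the same processor completes, which is the additional bookkeeping the out-of-order case requires.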
This centralized arbiter is not scalable (e.g., this arbiter is application specific, so if it is designed for a single-channel memory system then it cannot be used in a 2-channel memory system). In addition, the centralized arbiter scheme faces more challenges when it supports multi-channel subsystems. The single-channel memory system has only one memory space that can be accessed at any given time. The multi-channel memory system has multiple memory spaces, and each of these memory spaces is accessed independently of the others. Because the centralized arbiter faces both multiple request sources and multiple data channels (e.g., multiple memory channels), the centralized arbiter scheme is much more complex and may result in a large chip size and/or a circuit timing penalty.
For the foregoing reasons, it is desirable to have a distributed multi-processor out-of-order execution scheme to access a storage unit and forward data traffic.