1. Field of the Invention
Embodiments of the present invention relate generally to parallel processing, and, more specifically, to a method and apparatus for equalizing a bandwidth impedance mismatch between a client and an interface.
2. Description of the Related Art
A modern computer system may include a processor, known as a parallel processing unit (PPU), that executes many operations in parallel. PPUs generally include one or more engines (or clients) that perform operations such as memory management, graphics display, instruction fetching, encryption, and other operations.
Clients often write data to and/or read data from system memory, peer parallel processor (PP) memory (memory associated with peer PPUs) and/or local PP memory. In doing so, clients issue write and/or read transactions that target system memory, peer PP memory, and/or local PP memory via a crossbar (x-bar). The x-bar is coupled to system memory and peer PP memory via a system interface and to local PP memory via a local interface. The system interface transports transactions at a certain rate, referred to as the “system transaction rate,” while the local interface transports transactions at another rate, referred to as the “local transaction rate.” Typically, the system transaction rate is much lower than the local transaction rate.
Some clients may issue write transactions to system memory and/or peer PP memory at a rate that is much higher than the system transaction rate. When such a client issues write transactions faster than the system interface can transport them, the excess write transactions can accumulate within the system interface and then spill into the x-bar. Because transactions that target local PP memory also pass through the x-bar, they become queued behind these accumulated transactions, and the x-bar can then transport them only at the system transaction rate. Since the system transaction rate is much lower than the local transaction rate, the accumulated transactions within the x-bar effectively reduce the local transaction rate to the system transaction rate. This situation causes problems for certain clients.
Specifically, some clients require transactions to be transported to local PP memory at the local transaction rate. When the local transaction rate is reduced to the system transaction rate, those clients may be stalled and the throughput of the PPU may be reduced.
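The rate reduction described above can be illustrated with a toy cycle-level model. This is a minimal sketch under stated assumptions, not the apparatus described here: the x-bar is modeled as a single in-order FIFO shared by system-bound and local-bound transactions, and every name and parameter (`simulate`, `XBAR_CAP`, `SYS_PERIOD`, and so on) is hypothetical and chosen only for illustration. A local client issues one transaction per cycle; optionally, a second client issues one system write per cycle, far faster than the modeled system interface drains (one transaction every `SYS_PERIOD` cycles).

```python
from collections import deque

# Hypothetical parameters chosen for illustration only; none come from the source.
XBAR_CAP = 8          # entries the shared x-bar FIFO can hold
XBAR_PER_CYCLE = 2    # transactions the x-bar can forward per cycle
LOCAL_PER_CYCLE = 2   # local interface accepts up to 2 transactions per cycle
SYS_CAP = 4           # entries buffered inside the system interface
SYS_PERIOD = 8        # system interface drains one transaction every 8 cycles


def simulate(cycles, system_writer_active):
    """Return local transactions delivered per cycle (hypothetical model)."""
    xbar = deque()
    sys_buf = 0
    local_delivered = 0
    for cycle in range(cycles):
        # 1. The system interface drains at the slow system transaction rate.
        if cycle % SYS_PERIOD == 0 and sys_buf > 0:
            sys_buf -= 1
        # 2. The x-bar forwards strictly in order, so a blocked head
        #    transaction stalls everything queued behind it.
        forwarded = 0
        local_this_cycle = 0
        while xbar and forwarded < XBAR_PER_CYCLE:
            if xbar[0] == "sys":
                if sys_buf == SYS_CAP:
                    break  # system interface full: head-of-line blocking
                sys_buf += 1
            else:
                if local_this_cycle == LOCAL_PER_CYCLE:
                    break  # local interface already saturated this cycle
                local_this_cycle += 1
                local_delivered += 1
            xbar.popleft()
            forwarded += 1
        # 3. Clients inject new transactions only if the x-bar has room.
        if system_writer_active and len(xbar) < XBAR_CAP:
            xbar.append("sys")
        if len(xbar) < XBAR_CAP:
            xbar.append("local")
    return local_delivered / cycles


print(simulate(1000, system_writer_active=False))  # 0.999
print(simulate(1000, system_writer_active=True))   # 0.128
```

In this toy model, the local client alone is served almost every cycle, but once the heavy system writer fills the system interface, local throughput collapses to roughly 1/`SYS_PERIOD`, i.e., the modeled system transaction rate.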
Accordingly, what is needed is a technique to avoid the accumulation of write transactions within the system interface when clients issue write transactions at a rate that exceeds the system transaction rate.