1. Field of the Invention
The present invention generally relates to exchanging data on a bus between multiple devices using a plurality of virtual channels and, more particularly, to dynamic adjustment of credits used to allocate bandwidth of the bus to the virtual channels.
2. Description of the Related Art
Modern computer systems typically contain several devices in communication with each other across a system bus. A computer system may contain a central processing unit (CPU), a graphics processing unit (GPU), and a memory controller in communication with each other across the system bus. The CPU may contain one or more integrated processor cores, some type of embedded memory, such as a cache shared between the processor cores, and peripheral interfaces, such as an external bus interface, on a single chip to form a complete (or nearly complete) system on a chip (SOC). The external bus interface is often used to pass data in packets over an external bus between these systems and the other devices in the computer system. The external bus interface is typically shared between the processor cores of the CPU, which may pass data to and from the interface over an internal bus as streams of data, commonly referred to as virtual channels.
The GPU may send and receive data to/from the CPU using similar virtual channels. Data received by the GPU may be stored in a receive buffer before being processed by the GPU processor core(s). Receive buffer space on the GPU may be allocated among each of the virtual channels receiving data from the CPU. A virtual channel may be allocated more or less buffer space depending on the expected workload for that virtual channel. However, if too many packets are sent across a virtual channel, the receive buffer for that virtual channel may be filled up and overflow, causing packets for that virtual channel to be dropped.
To ensure that the CPU does not send too many data packets on any one virtual channel, which may cause a receive buffer overflow, a credit-based flow control protocol may be utilized, whereby a receiving device communicates a flow control credit limit (FCCL) to the transmitting device. One such credit-based flow control protocol is described in the Infiniband™ Architecture Specification, Vol. 1, Release 1.1 (subchapter 7.9), incorporated herein by reference in its entirety. According to this protocol, the receiver may calculate the FCCL as the sum of the amount of receive buffer space remaining and an adjusted packets received (APR) parameter. Due to lost packets or packets received with bad checksums, neither of which results in the consumption of buffer space, APR may not match the total packets sent (TPS) by the transmitter.
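The receiver-side calculation described above can be sketched as follows. This is a minimal illustration rather than the wire format of the referenced specification: the names `Receiver`, `buffer_capacity`, `buffer_used`, and `apr` are hypothetical, credits are assumed to be counted in whole packets, and counter wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical receiver state for a single virtual channel."""
    buffer_capacity: int   # total receive-buffer slots (assumed unit: packets)
    buffer_used: int = 0   # slots currently occupied
    apr: int = 0           # adjusted packets received

    def receive(self, checksum_ok: bool = True) -> bool:
        """Accept one packet; return True if it was buffered."""
        if not checksum_ok:
            return False   # bad checksum: no buffer space consumed, APR unchanged
        if self.buffer_used >= self.buffer_capacity:
            return False   # buffer full: the packet is dropped
        self.buffer_used += 1
        self.apr += 1
        return True

    def fccl(self) -> int:
        """Flow control credit limit = remaining buffer space + APR."""
        return (self.buffer_capacity - self.buffer_used) + self.apr


rx = Receiver(buffer_capacity=4)
rx.receive()                    # good packet: occupies one slot, APR becomes 1
rx.receive(checksum_ok=False)   # corrupted packet: ignored by both counters
print(rx.fccl())                # 3 free slots + APR of 1
```

A packet lost in transit never reaches `receive` at all, so it likewise leaves both counters unchanged, which is how APR can fall behind the transmitter's TPS.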
The transmitting device may use the FCCL to calculate a conservative estimate of the amount of buffer space available on the receiver by subtracting the total packets sent (TPS) from the FCCL. This estimate may be considered conservative because the total packets sent (TPS) may be greater than the adjusted packets received (APR) which was used to calculate the FCCL, resulting in a free space estimate that errs on the low side. In any case, this estimated value is used to ensure the receive buffer does not overflow. As long as this estimated available buffer space is greater than zero, the transmitting device may continue to send packets. If this estimated buffer space is not greater than zero, the transmitting device may wait until it receives a control packet from the receiving device with an FCCL that results in an estimated buffer space that is greater than zero (as the receiving device processes packets from the receive buffer, the free space increases and FCCL will grow).
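The transmitter-side check can be illustrated with a short sketch. The function names `estimated_free_space` and `may_send` and the numbers in the scenario are assumptions chosen for illustration; the scenario assumes a four-packet receive buffer, two packets sent, and one of them lost in transit.

```python
def estimated_free_space(fccl: int, tps: int) -> int:
    """Transmitter's conservative estimate of free receive-buffer space."""
    return fccl - tps


def may_send(fccl: int, tps: int) -> bool:
    """The transmitter may send only while the estimate is greater than zero."""
    return estimated_free_space(fccl, tps) > 0


# Assumed scenario: the receiver buffered one of the two packets sent.
capacity = 4   # receive-buffer slots (assumed unit: packets)
tps = 2        # total packets sent by the transmitter
apr = 1        # adjusted packets received (one packet was lost)
used = 1       # slots occupied in the receive buffer

fccl = (capacity - used) + apr               # value reported by the receiver
actual_free = capacity - used                # true free space on the receiver
estimate = estimated_free_space(fccl, tps)   # what the transmitter believes

print(estimate, actual_free)   # the estimate is lower than the true free space
print(may_send(fccl, tps))     # sending is still permitted
```

In this scenario the estimate is one packet lower than the true free space, which is exactly the number of packets lost (TPS minus APR).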
To synchronize the adjusted packets received (APR, maintained on the receiver) with the actual total packets sent (TPS), the transmitter may periodically transmit a control packet to the receiver that contains TPS. This synchronization should serve to reclaim some of the buffer space effectively lost as a result of lost packets (by overwriting APR with TPS, the FCCL calculated by the receiver will increase). Control packets are typically sent over separate virtual channels, such that they do not result in consumption of buffer space for the corresponding virtual channel used for data packets. Other non-data packets used to synchronize the communication link between the transmitter and receiver may also be sent, which do not affect the receive buffer.
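The effect of this synchronization can be shown with a small worked example. The helper `fccl` and the numbers are hypothetical; the scenario again assumes a four-packet buffer in which one of two transmitted packets was lost.

```python
def fccl(capacity: int, used: int, apr: int) -> int:
    """Receiver's credit limit: remaining buffer space plus APR."""
    return (capacity - used) + apr


capacity, used = 4, 1   # receive-buffer size and occupancy (assumed, in packets)
apr, tps = 1, 2         # receiver's APR lags the transmitter's TPS by one lost packet

# Transmitter's free-space estimate before the control packet arrives.
estimate_before = fccl(capacity, used, apr) - tps

# The control packet carries TPS; the receiver overwrites APR with it.
apr = tps

# Transmitter's free-space estimate from the next FCCL it receives.
estimate_after = fccl(capacity, used, apr) - tps

print(estimate_before, estimate_after)
```

After the overwrite, the estimate rises by one packet and equals the true free space (`capacity - used`), so the credit that was effectively lost with the dropped packet is reclaimed.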
Because the CPU and GPU may have different processing requirements, the CPU and GPU may be clocked at different speeds. For example, the CPU, which may control the entire computer system, may be clocked faster than the GPU. In some cases, the GPU may not be clocked as fast because it may utilize less expensive technology that runs at a slower clock speed. To account for differing clock speeds between the CPU and the GPU, the GPU may process data using an internal bus having a different dimension (e.g., a larger byte size) than the CPU internal bus used to carry the data packets. As an example, the CPU may send packets across an internal bus having an eight byte bus width and, because of the higher transmission rate of the CPU, the GPU may receive the packets on an internal bus having a sixteen byte bus width.
In such cases, methods and systems for credit based flow control between devices capable of transmitting and receiving data on internal busses having different widths are needed.