1. Field of the Invention
The invention relates to network devices. More particularly, this invention relates to a system, method, and apparatus for determining that credits in an end-to-end credit networking system are correctly transferred and, when they are not, accounting for the mismatched credits to mitigate network interruptions.
2. Description of the Related Art
In a typical closed loop credit system, the system generally ensures that no data units are lost due to congestion or processing. However, these systems are not immune from problems, such as line errors on the media for transmitting data between sender and receiver devices. Take, for example, the typical closed loop credit system 100 illustrated in FIG. 1. The network 100 is comprised of node 102, which is connected to device 104 (e.g., a switch). Device 104 is connected to device 106 (e.g., a switch), which in turn is connected to device 108 (e.g., a switch). Device 108 is connected to node 110. For purposes of illustration, packets flow from node 102 to node 110 in this example. To maintain efficient traffic flow, devices 104 and 108 monitor end-to-end credits between one another. Device 104 will initially have a number of starting credits, referred to as “initialized_credits.” Device 104 will also have a “tx_credits” counter value 112, which indicates how many credits remain available for sending data units from device 104. In this example, “tx_credits” 112 has a value of “1”, because all but one of the “initialized_credits” are outstanding. The number of available credits is calculated by decrementing the “tx_credits” value 112 by one for each unit of data sent from device 104. This decrementing continues for each data unit sent until the transmit credits are exhausted (i.e., “tx_credits” equals “0”), at which time device 104 must cease transmitting any further data units.
Device 108 includes sufficient allocated storage resources to store all the data units that device 104 is permitted to send to device 108, which again is based upon the “initialized_credits” value. After a data unit arrives at device 108, device 108 stores the data unit as needed until it can dispatch the data unit to node 110 and recover the storage space occupied by the data unit. Only after dispatching the data unit to node 110 will device 108 return a credit 114 back to device 104. Device 104 then uses the returned credit 114 to increment “tx_credits”, thereby allowing device 104 to send an additional data unit according to the same process.
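The credit loop described above can be illustrated with a minimal sketch. The class names Sender and Receiver, the method names, and the starting value of four credits are assumptions chosen for illustration only; they do not appear in FIG. 1 and are not a definitive implementation of devices 104 and 108.

```python
from collections import deque


class Sender:
    """Illustrative stand-in for device 104 (names are hypothetical)."""

    def __init__(self, initialized_credits):
        self.initialized_credits = initialized_credits
        self.tx_credits = initialized_credits

    def can_send(self):
        return self.tx_credits > 0

    def send(self):
        # One credit is consumed for each data unit sent.
        if not self.can_send():
            raise RuntimeError("tx_credits exhausted; sender must wait")
        self.tx_credits -= 1

    def credit_returned(self):
        # A returned credit allows one more data unit to be sent.
        self.tx_credits += 1


class Receiver:
    """Illustrative stand-in for device 108 (names are hypothetical)."""

    def __init__(self, buffer_slots):
        self.buffer_slots = buffer_slots
        self.buffer = deque()

    def receive(self, data_unit):
        if len(self.buffer) >= self.buffer_slots:
            raise OverflowError("receive buffer overflow")
        self.buffer.append(data_unit)

    def dispatch(self):
        # Forward the oldest data unit and free its storage; the caller
        # returns one credit to the sender only after this step.
        return self.buffer.popleft()


def run_demo():
    sender = Sender(initialized_credits=4)    # assumed starting value
    receiver = Receiver(buffer_slots=4)       # sized to match the credits
    sent = 0
    while sender.can_send():
        sender.send()
        receiver.receive(f"unit-{sent}")
        sent += 1
    stalled_at = sender.tx_credits            # sender has stopped here
    receiver.dispatch()                       # data unit leaves the receiver
    sender.credit_returned()                  # credit arrives back
    return sent, stalled_at, sender.tx_credits
```

In this sketch the sender stops after four data units with “tx_credits” at zero, and may send again only after the receiver dispatches a data unit and one credit is returned.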
The above depiction and the following embodiments are simplified by only illustrating unidirectional data flow, even though both devices 104 and 108 may have send and receive functions to allow full-duplex operation with bi-directional data flow and signaling.
As previously stated, this system is not immune from errors, predominantly due to line errors on the media between sender and receiver. Such errors cause two classes of problems. The first class of problems may be referred to as “loss of credits,” which is any problem that causes the total credits in the system to be lower than expected. Such errors cause reduction of throughput, or zero throughput in a worst case scenario. This can happen in two circumstances: (1) a credit return message is corrupted and not recognized by device 104; and/or (2) data units are lost or reduced in size as they travel across the path between devices 104 and 108.
The second class of problems may be referred to as “excess credits,” which is any problem that causes the total credits in the system to be greater than expected. Such errors create a buffer overflow at device 108. Such a buffer overflow situation may occur when: (1) framing errors cause the data unit size to increase, or spurious data units to appear at device 108; or (2) mutation of signaling causes spurious credit returns to appear at device 104.
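The two classes of problems can be expressed as a conservation check on the total credits in the system. The sketch below is illustrative only: the CreditState fields and the classify function are hypothetical names, and the model assumes a global view of in-flight data units and credit returns that neither device 104 nor device 108 has in a live system.

```python
from dataclasses import dataclass


@dataclass
class CreditState:
    """Hypothetical global snapshot of where every credit currently sits."""
    tx_credits: int         # credits held at the sender
    data_in_flight: int     # data units travelling toward the receiver
    buffered: int           # data units stored at the receiver
    credits_in_flight: int  # credit returns travelling toward the sender

    def total(self):
        return (self.tx_credits + self.data_in_flight
                + self.buffered + self.credits_in_flight)


def classify(state, initialized_credits):
    """Compare the total credits in the system with the expected total."""
    total = state.total()
    if total < initialized_credits:
        return "loss of credits"
    if total > initialized_credits:
        return "excess credits"
    return "consistent"


# Assumed example with initialized_credits = 4.
healthy = CreditState(tx_credits=1, data_in_flight=1, buffered=1, credits_in_flight=1)
# A credit return is corrupted in transit and never recognized by the sender.
lost_return = CreditState(tx_credits=1, data_in_flight=1, buffered=1, credits_in_flight=0)
# A spurious credit return is counted by the sender.
spurious_return = CreditState(tx_credits=2, data_in_flight=1, buffered=1, credits_in_flight=1)
```

Under these assumptions, classify reports the healthy snapshot as consistent, the corrupted credit return as a loss of credits, and the spurious credit return as excess credits.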
The typical method to detect a change in total system credits is to quiesce all traffic for a sufficient time so that all data units are allowed to be dispatched and all credit returns are allowed to arrive back at device 104. Under this method, the “tx_credits” value should return to the “initialized_credits” value in the absence of any errors. However, an interruption in service is required to perform this checking method, and such interruptions in service are generally unacceptable.
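The comparison performed after traffic has been quiesced can be sketched as follows. The function name check_after_quiesce and its return strings are hypothetical; the sketch assumes that all data units have been dispatched and all credit returns have arrived before it is called.

```python
def check_after_quiesce(tx_credits, initialized_credits):
    """Compare the sender's counter with its starting value once idle.

    Assumes traffic has been stopped long enough that no data units or
    credit returns remain in flight.
    """
    delta = tx_credits - initialized_credits
    if delta < 0:
        return f"loss of credits: {-delta} missing"
    if delta > 0:
        return f"excess credits: {delta} extra"
    return "credits consistent"
```

For example, with an assumed “initialized_credits” value of 8, a quiesced “tx_credits” value of 6 would indicate that two credits were lost.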
The Fibre Channel (“FC”) protocol defines a scheme, which is fully described in the FC standards document FC-FS3, Section 19.4.9, whereby the sender and receiver utilize a checkpoint system to identify every Nth data unit or credit (respectively). If the peer detects an error upon arrival of the Nth data unit or credit, adjustments can be made to correct any credit discrepancy. This scheme is complex in that it requires both sender and receiver to actively detect and manage the recovery of unidirectional data flow. Additionally, there are other complications that result from the potential corruption of the checkpoint signal itself. As such, a need exists for an improved system, method, and apparatus for verifying the accuracy of end-to-end credit systems and improving credit recovery when those systems yield errors.