It is the nature of the computer system industry to require an exponential performance advantage over the generations while maintaining or decreasing system costs. In particular, telecommunications and networking systems benefit from a reduction in system size and an increase in capabilities.
Computer system processors and peripherals continually benefit from the aforementioned generation over generation performance advantage. In order to realize a proportional system wide improvement in performance, the connection fabric between devices must improve along with the improvements in processors and peripherals.
A hierarchy of shared buses is a common fabric structure. Levels of performance required for the multiple devices in the system typically differentiate this hierarchy. Bus bridges connect the various buses. In this structure a low performance device does not burden a high performance device.
Providing a wider bus, increasing the bus frequency, pipelining the transactions on the bus, or completing the transactions in an out of order manner can provide additional performance. However, these techniques are well known, and further refinement results in diminishing returns. Further increases in bus width will reduce the maximum possible frequency due to skew effects, i.e., as the data path is widened to include a greater number of data bits, the skew between those individual bits, which originates in the transmission medium, becomes increasingly severe. A wider bus will also increase pin count, which affects cost and limits the number of interfaces on a device. Furthermore, the maximization of frequency and width is incompatible with a multi-device connection. Finally, it would be advantageous to increase the number of devices capable of direct communication.
Therefore, a point to point, packet switched, fabric architecture is displacing traditional memory mapped bus architecture for use in network equipment, storage subsystems and computing platforms capable of providing an interface for processors, memory modules and memory mapped I/O devices.
Modern digital data networks are increasingly employing such point to point, packet switched, fabric interconnect architectures to overcome bandwidth limitations. These networks transmit encapsulated address, control and data packets from the source ports across a series of routing switches or gateways to addressed destinations. The switches and gateways of the switching fabric are capable of determining from the address and control contents of a packet, what activities must be performed.
An efficient packet switching network will strive to meet certain characteristics. In general, high throughput is desirable. Throughput is a node-oriented measure of the rate of packet processing. Low latency is another desirable characteristic. Latency is a packet-oriented measure of the duration of processing for packets at a node. Because latency degrades system performance, it is desirable to limit the latency of individual packets entirely aside from throughput. Additionally, a network should be fair, i.e., it should not unduly favor one port over others in the system. However, an efficient system will respond to differences in traffic types, where special needs exist, in order to meet those needs.
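The three characteristics named above can be made concrete with a small calculation. The following is a minimal sketch, not part of the described system: the record layout `(port, arrival_time, departure_time)`, the function name `node_metrics`, and the observation window are illustrative assumptions, and Jain's fairness index is used here only as one common stand-in for "fairness", which the text does not define numerically.

```python
def node_metrics(records, window):
    """Summarize one node's behaviour over an observation window.

    records: list of (port, arrival_time, departure_time) tuples, one per
             packet processed by the node (hypothetical layout).
    window:  length of the observation window, in the same time units.
    Returns (throughput, mean_latency, fairness).
    """
    # Throughput is node-oriented: packets processed per unit time.
    throughput = len(records) / window

    # Latency is packet-oriented: time each packet spent at the node.
    latencies = [departure - arrival for _, arrival, departure in records]
    mean_latency = sum(latencies) / len(latencies)

    # Fairness (assumption): Jain's index over per-port packet counts.
    # It is 1.0 when every port gets an equal share and falls toward
    # 1/n as a single port dominates.
    per_port = {}
    for port, _, _ in records:
        per_port[port] = per_port.get(port, 0) + 1
    counts = list(per_port.values())
    fairness = sum(counts) ** 2 / (len(counts) * sum(c * c for c in counts))

    return throughput, mean_latency, fairness


# Two ports served equally, then one port taking three of four slots.
balanced = [("p0", 0, 2), ("p0", 1, 4), ("p1", 2, 5), ("p1", 3, 7)]
skewed = [("p0", 0, 2), ("p0", 1, 4), ("p0", 2, 5), ("p1", 3, 7)]
print(node_metrics(balanced, window=8))
print(node_metrics(skewed, window=8))
```

In this sketch both traces have the same throughput and mean latency, yet the fairness value differs, which illustrates why fairness has to be considered separately from the other two measures.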
Certain practices in the art are at odds with some of these goals:
In the absence of proper management, devices attached to a network can take control of bandwidth by using more than their fair share. This can result in unfairness. Where multiple devices behave unfairly, queueing delays may grow long and traffic classes that require low latency experience long latency, resulting in dropped packets and poor signal quality for isochronous traffic.
Where multiple traffic flows compete for access to a resource, additional latency can be introduced. Traditional solutions result in unfair or asymmetrical access, or in access vulnerable to ‘bandwidth-hogging’.
In the absence of a system isolation solution, a defective device may also become a rogue transmitter, functioning as a ‘bandwidth hog’. The resulting utilization can impact the latency of legitimate communications. Traditional methods would have to rely on the integrity of the communications channel, or the introduction of a dedicated back channel. Existing routing control access is flat, and does not allow for articulation of access by a port.
In First In First Out (FIFO) oriented nodes, there may not be a facility to route traffic based on the needs of that traffic. The latency of a packet through a node's buffer will increase with buffer utilization, i.e., as a buffer fills, the delay associated with passing that buffer rises. Some types of traffic (e.g., voice, video) are particularly sensitive to packet latency. Identifying and routing latency sensitive packets at high utilization nodes could reduce the detrimental effects of latency.
In FIFO oriented nodes, there may not be a facility to promote traffic based on a traffic stall. Packets may block the head of the buffer in a ‘cannot proceed’ condition while packets capable of communication wait behind. Traditionally, the waiting packets must simply wait for a proceed condition in the lead packet. This introduces unnecessary latency. Algorithms that are more sophisticated than FIFO, while maintaining compatibility with FIFO-like standards, may correct this deficiency in traditional systems.
Finally, certain features are absent in existing systems due to the cycle overhead they would require. Ancillary circuits need to be provided for without impacting the throughput and latency of the system. Any additional functional circuitry (e.g., debug) may increase latency, depending on its design. Prior debug ports, for instance, might fail to simultaneously mirror the output, might introduce latency, or alternatively might force a reduction in clock speed and therefore throughput.
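The head-of-buffer stall described above can be illustrated with a toy model. This is a sketch under stated assumptions and not the buffer management scheme itself: the function `drain`, the `blocked_until` map, and the one-packet-per-cycle output are all hypothetical simplifications. It compares a strict FIFO, in which only the lead packet may leave, with a policy that lets the oldest packet whose destination is ready leave first.

```python
def drain(packets, blocked_until, bypass):
    """Return {packet_id: departure_cycle} for a buffer that can emit
    at most one packet per cycle.

    packets:       list of (packet_id, dest) in arrival order; all are
                   assumed to be in the buffer at cycle 0.
    blocked_until: dict mapping dest to the first cycle at which that
                   destination can accept a packet (default 0).
    bypass:        False for strict FIFO (only the head may leave);
                   True to let the oldest ready packet leave instead.
    """
    queue = list(packets)
    departures = {}
    cycle = 0
    while queue:
        # Strict FIFO considers only the head; bypass scans in age order.
        candidates = queue if bypass else queue[:1]
        ready = next(
            (p for p in candidates if cycle >= blocked_until.get(p[1], 0)),
            None,
        )
        if ready is not None:
            departures[ready[0]] = cycle
            queue.remove(ready)
        cycle += 1
    return departures


# Packet A targets a destination that stalls until cycle 3;
# packets B and C target a destination that is always ready.
packets = [("A", "X"), ("B", "Y"), ("C", "Y")]
stall = {"X": 3}
fifo = drain(packets, stall, bypass=False)
promoted = drain(packets, stall, bypass=True)
print(fifo)      # A leaves at 3, B at 4, C at 5
print(promoted)  # B leaves at 0, C at 1, A at 3
```

In this toy case the stalled packet departs at the same cycle under both policies, while the packets behind it no longer inherit its delay. A real design would also have to preserve any ordering rules the interconnect standard imposes, which this sketch ignores.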
The end points of a packet switched architecture contain packet buffers, which are traditionally FIFO memories. These buffers can be a focal point for improved management, thus addressing the aforementioned deficiencies of the art.
There is a need to address all the abovementioned circumstances, and furthermore, a need to do so in an efficient manner, using minimal additional circuitry, and most importantly, adding little or no clock overhead to the operation of the buffers.
What is needed is a buffer management system that will have the greatest positive effect on throughput, latency and fairness, and in a manner supportive of ancillary functions.